* [PATCH AUTOSEL 5.10 008/114] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
[not found] <20250505231817.2697367-1-sashal@kernel.org>
@ 2025-05-05 23:16 ` Sasha Levin
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 009/114] SUNRPC: rpcbind should never reset the port to the value '0' Sasha Levin
` (23 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Benjamin Coddington, Sasha Levin,
chuck.lever, trondmy, anna, davem, edumazet, kuba, pabeni,
linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit bf9be373b830a3e48117da5d89bb6145a575f880 ]
The autobind setting was supposed to be determined in rpc_create(),
since commit c2866763b402 ("SUNRPC: use sockaddr + size when creating
remote transport endpoints").
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/clnt.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 7ec5b0bc48ebf..0a4b4870c4d99 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -273,9 +273,6 @@ static struct rpc_xprt *rpc_clnt_set_transport(struct rpc_clnt *clnt,
old = rcu_dereference_protected(clnt->cl_xprt,
lockdep_is_held(&clnt->cl_lock));
- if (!xprt_bound(xprt))
- clnt->cl_autobind = 1;
-
clnt->cl_timeout = timeout;
rcu_assign_pointer(clnt->cl_xprt, xprt);
spin_unlock(&clnt->cl_lock);
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 009/114] SUNRPC: rpcbind should never reset the port to the value '0'
[not found] <20250505231817.2697367-1-sashal@kernel.org>
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 008/114] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
@ 2025-05-05 23:16 ` Sasha Levin
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 027/114] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() Sasha Levin
` (22 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Benjamin Coddington, Sasha Levin,
trondmy, anna, chuck.lever, davem, edumazet, kuba, pabeni,
linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit 214c13e380ad7636631279f426387f9c4e3c14d9 ]
If we already had a valid port number for the RPC service, then we
should not allow the rpcbind client to set it to the invalid value '0'.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/rpcb_clnt.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 8fad45320e1b9..f1bb4fd2a2707 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -794,9 +794,10 @@ static void rpcb_getport_done(struct rpc_task *child, void *data)
}
trace_rpcb_setport(child, map->r_status, map->r_port);
- xprt->ops->set_port(xprt, map->r_port);
- if (map->r_port)
+ if (map->r_port) {
+ xprt->ops->set_port(xprt, map->r_port);
xprt_set_bound(xprt);
+ }
}
/*
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 027/114] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()
[not found] <20250505231817.2697367-1-sashal@kernel.org>
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 008/114] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 009/114] SUNRPC: rpcbind should never reset the port to the value '0' Sasha Levin
@ 2025-05-05 23:16 ` Sasha Levin
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 033/114] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
` (21 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ilpo Järvinen, Chia-Yu Chang, David S . Miller, Sasha Levin,
edumazet, ncardwell, dsahern, kuba, pabeni, netdev
From: Ilpo Järvinen <ij@kernel.org>
[ Upstream commit 149dfb31615e22271d2525f078c95ea49bc4db24 ]
- Move tcp_count_delivered() earlier and split tcp_count_delivered_ce()
out of it
- Move tcp_in_ack_event() later
- While at it, remove the inline from tcp_in_ack_event() and let
the compiler to decide
Accurate ECN's heuristics does not know if there is going
to be ACE field based CE counter increase or not until after
rtx queue has been processed. Only then the number of ACKed
bytes/pkts is available. As CE or not affects presence of
FLAG_ECE, that information for tcp_in_ack_event is not yet
available in the old location of the call to tcp_in_ack_event().
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/tcp_input.c | 56 +++++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 24 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6b926c71b6f31..7c2e714527f68 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -403,6 +403,20 @@ static bool tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr
return false;
}
+static void tcp_count_delivered_ce(struct tcp_sock *tp, u32 ecn_count)
+{
+ tp->delivered_ce += ecn_count;
+}
+
+/* Updates the delivered and delivered_ce counts */
+static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
+ bool ece_ack)
+{
+ tp->delivered += delivered;
+ if (ece_ack)
+ tcp_count_delivered_ce(tp, delivered);
+}
+
/* Buffer size and advertised window tuning.
*
* 1. Tuning sk->sk_sndbuf, when connection enters established state.
@@ -1102,15 +1116,6 @@ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
}
}
-/* Updates the delivered and delivered_ce counts */
-static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
- bool ece_ack)
-{
- tp->delivered += delivered;
- if (ece_ack)
- tp->delivered_ce += delivered;
-}
-
/* This procedure tags the retransmission queue when SACKs arrive.
*
* We have three tag bits: SACKED(S), RETRANS(R) and LOST(L).
@@ -3735,12 +3740,23 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
}
}
-static inline void tcp_in_ack_event(struct sock *sk, u32 flags)
+static void tcp_in_ack_event(struct sock *sk, int flag)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
- if (icsk->icsk_ca_ops->in_ack_event)
- icsk->icsk_ca_ops->in_ack_event(sk, flags);
+ if (icsk->icsk_ca_ops->in_ack_event) {
+ u32 ack_ev_flags = 0;
+
+ if (flag & FLAG_WIN_UPDATE)
+ ack_ev_flags |= CA_ACK_WIN_UPDATE;
+ if (flag & FLAG_SLOWPATH) {
+ ack_ev_flags |= CA_ACK_SLOWPATH;
+ if (flag & FLAG_ECE)
+ ack_ev_flags |= CA_ACK_ECE;
+ }
+
+ icsk->icsk_ca_ops->in_ack_event(sk, ack_ev_flags);
+ }
}
/* Congestion control has updated the cwnd already. So if we're in
@@ -3857,12 +3873,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_snd_una_update(tp, ack);
flag |= FLAG_WIN_UPDATE;
- tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
-
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPACKS);
} else {
- u32 ack_ev_flags = CA_ACK_SLOWPATH;
-
if (ack_seq != TCP_SKB_CB(skb)->end_seq)
flag |= FLAG_DATA;
else
@@ -3874,19 +3886,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
&sack_state);
- if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
+ if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb)))
flag |= FLAG_ECE;
- ack_ev_flags |= CA_ACK_ECE;
- }
if (sack_state.sack_delivered)
tcp_count_delivered(tp, sack_state.sack_delivered,
flag & FLAG_ECE);
-
- if (flag & FLAG_WIN_UPDATE)
- ack_ev_flags |= CA_ACK_WIN_UPDATE;
-
- tcp_in_ack_event(sk, ack_ev_flags);
}
/* This is a deviation from RFC3168 since it states that:
@@ -3913,6 +3918,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_rack_update_reo_wnd(sk, &rs);
+ tcp_in_ack_event(sk, flag);
+
if (tp->tlp_high_seq)
tcp_process_tlp_ack(sk, ack, flag);
@@ -3944,6 +3951,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
return 1;
no_queue:
+ tcp_in_ack_event(sk, flag);
/* If data was DSACKed, see if we can undo a cwnd reduction. */
if (flag & FLAG_DSACKING_ACK) {
tcp_fastretrans_alert(sk, prior_snd_una, num_dupack, &flag,
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 033/114] netfilter: conntrack: Bound nf_conntrack sysctl writes
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (2 preceding siblings ...)
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 027/114] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() Sasha Levin
@ 2025-05-05 23:16 ` Sasha Levin
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 036/114] ipv6: save dontfrag in cork Sasha Levin
` (20 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Nicolas Bouchinet, Pablo Neira Ayuso, Sasha Levin, kadlec, davem,
edumazet, kuba, pabeni, netfilter-devel, coreteam, netdev
From: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
[ Upstream commit 8b6861390ffee6b8ed78b9395e3776c16fec6579 ]
nf_conntrack_max and nf_conntrack_expect_max sysctls were authorized to
be written any negative value, which would then be stored in the
unsigned int variables nf_conntrack_max and nf_ct_expect_max variables.
While the do_proc_dointvec_conv function is supposed to limit writing
handled by proc_dointvec proc_handler to INT_MAX. Such a negative value
being written in an unsigned int leads to a very high value, exceeding
this limit.
Moreover, the nf_conntrack_expect_max sysctl documentation specifies the
minimum value is 1.
The proc_handlers have thus been updated to proc_dointvec_minmax in
order to specify the following write bounds :
* Bound nf_conntrack_max sysctl writings between SYSCTL_ZERO
and SYSCTL_INT_MAX.
* Bound nf_conntrack_expect_max sysctl writings between SYSCTL_ONE
and SYSCTL_INT_MAX as defined in the sysctl documentation.
With this patch applied, sysctl writes outside the defined in the bound
will thus lead to a write error :
```
sysctl -w net.netfilter.nf_conntrack_expect_max=-1
sysctl: setting key "net.netfilter.nf_conntrack_expect_max": Invalid argument
```
Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_conntrack_standalone.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 8498bf27a3531..abf558868db11 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -613,7 +613,9 @@ static struct ctl_table nf_ct_sysctl_table[] = {
.data = &nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_INT_MAX,
},
[NF_SYSCTL_CT_COUNT] = {
.procname = "nf_conntrack_count",
@@ -652,7 +654,9 @@ static struct ctl_table nf_ct_sysctl_table[] = {
.data = &nf_ct_expect_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ONE,
+ .extra2 = SYSCTL_INT_MAX,
},
[NF_SYSCTL_CT_ACCT] = {
.procname = "nf_conntrack_acct",
@@ -931,7 +935,9 @@ static struct ctl_table nf_ct_netfilter_table[] = {
.data = &nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_INT_MAX,
},
{ }
};
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 036/114] ipv6: save dontfrag in cork
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (3 preceding siblings ...)
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 033/114] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
@ 2025-05-05 23:16 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 039/114] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
` (19 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:16 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Willem de Bruijn, Eric Dumazet, Jakub Kicinski, Sasha Levin,
davem, dsahern, pabeni, netdev
From: Willem de Bruijn <willemb@google.com>
[ Upstream commit a18dfa9925b9ef6107ea3aa5814ca3c704d34a8a ]
When spanning datagram construction over multiple send calls using
MSG_MORE, per datagram settings are configured on the first send.
That is when ip(6)_setup_cork stores these settings for subsequent use
in __ip(6)_append_data and others.
The only flag that escaped this was dontfrag. As a result, a datagram
could be constructed with df=0 on the first sendmsg, but df=1 on a
next. Which is what cmsg_ip.sh does in an upcoming MSG_MORE test in
the "diff" scenario.
Changing datagram conditions in the middle of constructing an skb
makes this already complex code path even more convoluted. It is here
unintentional. Bring this flag in line with expected sockopt/cmsg
behavior.
And stop passing ipc6 to __ip6_append_data, to avoid such issues
in the future. This is already the case for __ip_append_data.
inet6_cork had a 6 byte hole, so the 1B flag has no impact.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250307033620.411611-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/ipv6.h | 1 +
net/ipv6/ip6_output.c | 9 +++++----
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index d758c131ed5e1..89b3527e03675 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -189,6 +189,7 @@ struct inet6_cork {
struct ipv6_txoptions *opt;
u8 hop_limit;
u8 tclass;
+ u8 dontfrag:1;
};
/**
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 426330b8dfa47..f024d89cb0f81 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1416,6 +1416,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
cork->fl.u.ip6 = *fl6;
v6_cork->hop_limit = ipc6->hlimit;
v6_cork->tclass = ipc6->tclass;
+ v6_cork->dontfrag = ipc6->dontfrag;
if (rt->dst.flags & DST_XFRM_TUNNEL)
mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
@@ -1450,7 +1451,7 @@ static int __ip6_append_data(struct sock *sk,
int getfrag(void *from, char *to, int offset,
int len, int odd, struct sk_buff *skb),
void *from, size_t length, int transhdrlen,
- unsigned int flags, struct ipcm6_cookie *ipc6)
+ unsigned int flags)
{
struct sk_buff *skb, *skb_prev = NULL;
unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
@@ -1507,7 +1508,7 @@ static int __ip6_append_data(struct sock *sk,
if (headersize + transhdrlen > mtu)
goto emsgsize;
- if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
+ if (cork->length + length > mtu - headersize && v6_cork->dontfrag &&
(sk->sk_protocol == IPPROTO_UDP ||
sk->sk_protocol == IPPROTO_RAW)) {
ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
@@ -1825,7 +1826,7 @@ int ip6_append_data(struct sock *sk,
return __ip6_append_data(sk, fl6, &sk->sk_write_queue, &inet->cork.base,
&np->cork, sk_page_frag(sk), getfrag,
- from, length, transhdrlen, flags, ipc6);
+ from, length, transhdrlen, flags);
}
EXPORT_SYMBOL_GPL(ip6_append_data);
@@ -2020,7 +2021,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
err = __ip6_append_data(sk, fl6, &queue, &cork->base, &v6_cork,
¤t->task_frag, getfrag, from,
length + exthdrlen, transhdrlen + exthdrlen,
- flags, ipc6);
+ flags);
if (err) {
__ip6_flush_pending_frames(sk, &queue, cork, &v6_cork);
return ERR_PTR(err);
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 039/114] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (4 preceding siblings ...)
2025-05-05 23:16 ` [PATCH AUTOSEL 5.10 036/114] ipv6: save dontfrag in cork Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 041/114] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
` (18 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eric Dumazet, Jason Xing, Kuniyuki Iwashima, Jakub Kicinski,
Sasha Levin, ncardwell, davem, dsahern, pabeni, netdev
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit f8ece40786c9342249aa0a1b55e148ee23b2a746 ]
We have platforms with 6 NUMA nodes and 480 cpus.
inet_ehash_locks_alloc() currently allocates a single 64KB page
to hold all ehash spinlocks. This adds more pressure on a single node.
Change inet_ehash_locks_alloc() to use vmalloc() to spread
the spinlocks on all online nodes, driven by NUMA policies.
At boot time, NUMA policy is interleave=all, meaning that
tcp_hashinfo.ehash_locks gets hash dispersion on all nodes.
Tested:
lack5:~# grep inet_ehash_locks_alloc /proc/vmallocinfo
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# echo 8192 >/proc/sys/net/ipv4/tcp_child_ehash_entries
lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x000000004e99d30c-0x00000000763f3279 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=1 N1=2 N2=2 N3=1 N4=1 N5=1
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# numactl --interleave=0,5 unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x00000000fd73a33e-0x0000000004b9a177 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=4 N5=4
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# echo 1024 >/proc/sys/net/ipv4/tcp_child_ehash_entries
lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x00000000db07d7a2-0x00000000ad697d29 8192 inet_ehash_locks_alloc+0x90/0x100 pages=1 vmalloc N2=1
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250305130550.1865988-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/inet_hashtables.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 0fb5d758264fe..fea74ab2a4bec 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -934,22 +934,37 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
{
unsigned int locksz = sizeof(spinlock_t);
unsigned int i, nblocks = 1;
+ spinlock_t *ptr = NULL;
- if (locksz != 0) {
- /* allocate 2 cache lines or at least one spinlock per cpu */
- nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U);
- nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
+ if (locksz == 0)
+ goto set_mask;
- /* no more locks than number of hash buckets */
- nblocks = min(nblocks, hashinfo->ehash_mask + 1);
+ /* Allocate 2 cache lines or at least one spinlock per cpu. */
+ nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U) * num_possible_cpus();
- hashinfo->ehash_locks = kvmalloc_array(nblocks, locksz, GFP_KERNEL);
- if (!hashinfo->ehash_locks)
- return -ENOMEM;
+ /* At least one page per NUMA node. */
+ nblocks = max(nblocks, num_online_nodes() * PAGE_SIZE / locksz);
+
+ nblocks = roundup_pow_of_two(nblocks);
+
+ /* No more locks than number of hash buckets. */
+ nblocks = min(nblocks, hashinfo->ehash_mask + 1);
- for (i = 0; i < nblocks; i++)
- spin_lock_init(&hashinfo->ehash_locks[i]);
+ if (num_online_nodes() > 1) {
+ /* Use vmalloc() to allow NUMA policy to spread pages
+ * on all available nodes if desired.
+ */
+ ptr = vmalloc_array(nblocks, locksz);
+ }
+ if (!ptr) {
+ ptr = kvmalloc_array(nblocks, locksz, GFP_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
}
+ for (i = 0; i < nblocks; i++)
+ spin_lock_init(&ptr[i]);
+ hashinfo->ehash_locks = ptr;
+set_mask:
hashinfo->ehash_locks_mask = nblocks - 1;
return 0;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 041/114] ieee802154: ca8210: Use proper setters and getters for bitwise types
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (5 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 039/114] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 049/114] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
` (17 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Andy Shevchenko, Miquel Raynal, Linus Walleij, Stefan Schmidt,
Sasha Levin, alex.aring, andrew+netdev, davem, edumazet, kuba,
pabeni, linux-wpan, netdev
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[ Upstream commit 169b2262205836a5d1213ff44dca2962276bece1 ]
Sparse complains that the driver doesn't respect the bitwise types:
drivers/net/ieee802154/ca8210.c:1796:27: warning: incorrect type in assignment (different base types)
drivers/net/ieee802154/ca8210.c:1796:27: expected restricted __le16 [addressable] [assigned] [usertype] pan_id
drivers/net/ieee802154/ca8210.c:1796:27: got unsigned short [usertype]
drivers/net/ieee802154/ca8210.c:1801:25: warning: incorrect type in assignment (different base types)
drivers/net/ieee802154/ca8210.c:1801:25: expected restricted __le16 [addressable] [assigned] [usertype] pan_id
drivers/net/ieee802154/ca8210.c:1801:25: got unsigned short [usertype]
drivers/net/ieee802154/ca8210.c:1928:28: warning: incorrect type in argument 3 (different base types)
drivers/net/ieee802154/ca8210.c:1928:28: expected unsigned short [usertype] dst_pan_id
drivers/net/ieee802154/ca8210.c:1928:28: got restricted __le16 [addressable] [usertype] pan_id
Use proper setters and getters for bitwise types.
Note, in accordance with [1] the protocol is little endian.
Link: https://www.cascoda.com/wp-content/uploads/2018/11/CA-8210_datasheet_0418.pdf [1]
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/20250305105656.2133487-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ieee802154/ca8210.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 9a082910ec59f..35e2b7eb30761 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -1488,8 +1488,7 @@ static u8 mcps_data_request(
command.pdata.data_req.src_addr_mode = src_addr_mode;
command.pdata.data_req.dst.mode = dst_address_mode;
if (dst_address_mode != MAC_MODE_NO_ADDR) {
- command.pdata.data_req.dst.pan_id[0] = LS_BYTE(dst_pan_id);
- command.pdata.data_req.dst.pan_id[1] = MS_BYTE(dst_pan_id);
+ put_unaligned_le16(dst_pan_id, command.pdata.data_req.dst.pan_id);
if (dst_address_mode == MAC_MODE_SHORT_ADDR) {
command.pdata.data_req.dst.address[0] = LS_BYTE(
dst_addr->short_address
@@ -1838,12 +1837,12 @@ static int ca8210_skb_rx(
}
hdr.source.mode = data_ind[0];
dev_dbg(&priv->spi->dev, "srcAddrMode: %#03x\n", hdr.source.mode);
- hdr.source.pan_id = *(u16 *)&data_ind[1];
+ hdr.source.pan_id = cpu_to_le16(get_unaligned_le16(&data_ind[1]));
dev_dbg(&priv->spi->dev, "srcPanId: %#06x\n", hdr.source.pan_id);
memcpy(&hdr.source.extended_addr, &data_ind[3], 8);
hdr.dest.mode = data_ind[11];
dev_dbg(&priv->spi->dev, "dstAddrMode: %#03x\n", hdr.dest.mode);
- hdr.dest.pan_id = *(u16 *)&data_ind[12];
+ hdr.dest.pan_id = cpu_to_le16(get_unaligned_le16(&data_ind[12]));
dev_dbg(&priv->spi->dev, "dstPanId: %#06x\n", hdr.dest.pan_id);
memcpy(&hdr.dest.extended_addr, &data_ind[14], 8);
@@ -1970,7 +1969,7 @@ static int ca8210_skb_tx(
status = mcps_data_request(
header.source.mode,
header.dest.mode,
- header.dest.pan_id,
+ le16_to_cpu(header.dest.pan_id),
(union macaddr *)&header.dest.extended_addr,
skb->len - mac_len,
&skb->data[mac_len],
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 049/114] net: ethernet: ti: cpsw_new: populate netdev of_node
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (6 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 041/114] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 050/114] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
` (16 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexander Sverdlin, Siddharth Vadapalli, Andrew Lunn,
Jakub Kicinski, Sasha Levin, andrew+netdev, davem, edumazet,
pabeni, alexander.sverdlin, hkallweit1, lorenzo, nicolas.dichtel,
aleksander.lobakin, linux-omap, netdev
From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
[ Upstream commit 7ff1c88fc89688c27f773ba956f65f0c11367269 ]
So that of_find_net_device_by_node() can find CPSW ports and other DSA
switches can be stacked downstream. Tested in conjunction with KSZ8873.
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/20250303074703.1758297-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/ti/cpsw_new.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index a1ee205d6a889..d6f8d3e757a25 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1427,6 +1427,7 @@ static int cpsw_create_ports(struct cpsw_common *cpsw)
ndev->netdev_ops = &cpsw_netdev_ops;
ndev->ethtool_ops = &cpsw_ethtool_ops;
SET_NETDEV_DEV(ndev, dev);
+ ndev->dev.of_node = slave_data->slave_node;
if (!napi_ndev) {
/* CPSW Host port CPDMA interface is shared between
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 050/114] net: pktgen: fix mpls maximum labels list parsing
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (7 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 049/114] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 051/114] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config() Sasha Levin
` (15 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Peter Seiderer, Simon Horman, Paolo Abeni, Sasha Levin, davem,
edumazet, kuba, netdev
From: Peter Seiderer <ps.report@gmx.net>
[ Upstream commit 2b15a0693f70d1e8119743ee89edbfb1271b3ea8 ]
Fix mpls maximum labels list parsing up to MAX_MPLS_LABELS entries (instead
of up to MAX_MPLS_LABELS - 1).
Addresses the following:
$ echo "mpls 00000f00,00000f01,00000f02,00000f03,00000f04,00000f05,00000f06,00000f07,00000f08,00000f09,00000f0a,00000f0b,00000f0c,00000f0d,00000f0e,00000f0f" > /proc/net/pktgen/lo\@0
-bash: echo: write error: Argument list too long
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/pktgen.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index c1e3d3bea1286..c2b3c454eddd9 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -805,6 +805,10 @@ static ssize_t get_labels(const char __user *buffer, struct pktgen_dev *pkt_dev)
pkt_dev->nr_labels = 0;
do {
__u32 tmp;
+
+ if (n >= MAX_MPLS_LABELS)
+ return -E2BIG;
+
len = hex32_arg(&buffer[i], 8, &tmp);
if (len <= 0)
return len;
@@ -816,8 +820,6 @@ static ssize_t get_labels(const char __user *buffer, struct pktgen_dev *pkt_dev)
return -EFAULT;
i++;
n++;
- if (n >= MAX_MPLS_LABELS)
- return -E2BIG;
} while (c == ',');
pkt_dev->nr_labels = n;
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 051/114] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (8 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 050/114] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 059/114] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
` (14 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuniyuki Iwashima, Eric Dumazet, David Ahern, Jakub Kicinski,
Sasha Levin, davem, pabeni, netdev
From: Kuniyuki Iwashima <kuniyu@amazon.com>
[ Upstream commit 254ba7e6032d3fc738050d500b0c1d8197af90ca ]
fib_valid_key_len() is called in the beginning of fib_table_insert()
or fib_table_delete() to check if the prefix length is valid.
fib_table_insert() and fib_table_delete() are called from 3 paths
- ip_rt_ioctl()
- inet_rtm_newroute() / inet_rtm_delroute()
- fib_magic()
In the first ioctl() path, rtentry_to_fib_config() checks the prefix
length with bad_mask(). Also, fib_magic() always passes the correct
prefix: 32 or ifa->ifa_prefixlen, which is already validated.
Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config().
While at it, 2 direct returns in rtm_to_fib_config() are changed to
goto to match other places in the same function
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/fib_frontend.c | 18 ++++++++++++++++--
net/ipv4/fib_trie.c | 22 ----------------------
2 files changed, 16 insertions(+), 24 deletions(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 5e2a003cd83c7..f902cd8cb852b 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -817,19 +817,33 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
}
}
+ if (cfg->fc_dst_len > 32) {
+ NL_SET_ERR_MSG(extack, "Invalid prefix length");
+ err = -EINVAL;
+ goto errout;
+ }
+
+ if (cfg->fc_dst_len < 32 && (ntohl(cfg->fc_dst) << cfg->fc_dst_len)) {
+ NL_SET_ERR_MSG(extack, "Invalid prefix for given prefix length");
+ err = -EINVAL;
+ goto errout;
+ }
+
if (cfg->fc_nh_id) {
if (cfg->fc_oif || cfg->fc_gw_family ||
cfg->fc_encap || cfg->fc_mp) {
NL_SET_ERR_MSG(extack,
"Nexthop specification and nexthop id are mutually exclusive");
- return -EINVAL;
+ err = -EINVAL;
+ goto errout;
}
}
if (has_gw && has_via) {
NL_SET_ERR_MSG(extack,
"Nexthop configuration can not contain both GATEWAY and VIA");
- return -EINVAL;
+ err = -EINVAL;
+ goto errout;
}
if (!cfg->fc_table)
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1bdcdc79d43f9..6c53381fa36f7 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1145,22 +1145,6 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
return 0;
}
-static bool fib_valid_key_len(u32 key, u8 plen, struct netlink_ext_ack *extack)
-{
- if (plen > KEYLENGTH) {
- NL_SET_ERR_MSG(extack, "Invalid prefix length");
- return false;
- }
-
- if ((plen < KEYLENGTH) && (key << plen)) {
- NL_SET_ERR_MSG(extack,
- "Invalid prefix for given prefix length");
- return false;
- }
-
- return true;
-}
-
static void fib_remove_alias(struct trie *t, struct key_vector *tp,
struct key_vector *l, struct fib_alias *old);
@@ -1181,9 +1165,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
key = ntohl(cfg->fc_dst);
- if (!fib_valid_key_len(key, plen, extack))
- return -EINVAL;
-
pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
fi = fib_create_info(cfg, extack);
@@ -1671,9 +1652,6 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
key = ntohl(cfg->fc_dst);
- if (!fib_valid_key_len(key, plen, extack))
- return -EINVAL;
-
l = fib_find_node(t, &tp, key);
if (!l)
return -ESRCH;
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 059/114] net/mlx5: Avoid report two health errors on same syndrome
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (9 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 051/114] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config() Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 061/114] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
` (13 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Moshe Shemesh, Shahar Shitrit, Tariq Toukan, Kalesh AP,
David S . Miller, Sasha Levin, saeedm, andrew+netdev, edumazet,
kuba, pabeni, netdev, linux-rdma
From: Moshe Shemesh <moshe@nvidia.com>
[ Upstream commit b5d7b2f04ebcff740f44ef4d295b3401aeb029f4 ]
In case health counter has not increased for few polling intervals, miss
counter will reach max misses threshold and health report will be
triggered for FW health reporter. In case syndrome found on same health
poll another health report will be triggered.
Avoid two health reports on same syndrome by marking this syndrome as
already known.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/health.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index f42e118f32901..d48912d7afe54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -733,6 +733,7 @@ static void poll_health(struct timer_list *t)
health->prev = count;
if (health->miss_counter == MAX_MISSES) {
mlx5_core_err(dev, "device's health compromised - reached miss count\n");
+ health->synd = ioread8(&h->synd);
print_health_info(dev);
queue_work(health->wq, &health->report_work);
}
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 061/114] net: xgene-v2: remove incorrect ACPI_PTR annotation
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (10 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 059/114] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 062/114] bonding: report duplicate MAC address in all situations Sasha Levin
` (12 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Arnd Bergmann, Paolo Abeni, Sasha Levin, iyappan, keyur,
andrew+netdev, davem, edumazet, kuba, netdev
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit 01358e8fe922f716c05d7864ac2213b2440026e7 ]
Building with W=1 shows a warning about xge_acpi_match being unused when
CONFIG_ACPI is disabled:
drivers/net/ethernet/apm/xgene-v2/main.c:723:36: error: unused variable 'xge_acpi_match' [-Werror,-Wunused-const-variable]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250225163341.4168238-2-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/apm/xgene-v2/main.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c b/drivers/net/ethernet/apm/xgene-v2/main.c
index 80399c8980bd3..627f860141002 100644
--- a/drivers/net/ethernet/apm/xgene-v2/main.c
+++ b/drivers/net/ethernet/apm/xgene-v2/main.c
@@ -9,8 +9,6 @@
#include "main.h"
-static const struct acpi_device_id xge_acpi_match[];
-
static int xge_get_resources(struct xge_pdata *pdata)
{
struct platform_device *pdev;
@@ -733,7 +731,7 @@ MODULE_DEVICE_TABLE(acpi, xge_acpi_match);
static struct platform_driver xge_driver = {
.driver = {
.name = "xgene-enet-v2",
- .acpi_match_table = ACPI_PTR(xge_acpi_match),
+ .acpi_match_table = xge_acpi_match,
},
.probe = xge_probe,
.remove = xge_remove,
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 062/114] bonding: report duplicate MAC address in all situations
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (11 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 061/114] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 075/114] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
` (11 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Hangbin Liu, Nikolay Aleksandrov, Jakub Kicinski, Sasha Levin, jv,
andrew+netdev, davem, edumazet, pabeni, netdev
From: Hangbin Liu <liuhangbin@gmail.com>
[ Upstream commit 28d68d396a1cd21591e8c6d74afbde33a7ea107e ]
Normally, a bond uses the MAC address of the first added slave as the bond’s
MAC address. And the bond will set active slave’s MAC address to bond’s
address if fail_over_mac is set to none (0) or follow (2).
When the first slave is removed, the bond will still use the removed slave’s
MAC address, which can lead to a duplicate MAC address and potentially cause
issues with the switch. To avoid confusion, let's warn the user in all
situations, including when fail_over_mac is set to 2 or not in active-backup
mode.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250225033914.18617-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7caaf5b49c7b5..c5ccd42af528e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2239,7 +2239,7 @@ static int __bond_release_one(struct net_device *bond_dev,
RCU_INIT_POINTER(bond->current_arp_slave, NULL);
- if (!all && (!bond->params.fail_over_mac ||
+ if (!all && (bond->params.fail_over_mac != BOND_FOM_ACTIVE ||
BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) {
if (ether_addr_equal_64bits(bond_dev->dev_addr, slave->perm_hwaddr) &&
bond_has_slaves(bond))
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 075/114] net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (12 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 062/114] bonding: report duplicate MAC address in all situations Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 077/114] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
` (10 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Peter Seiderer, Simon Horman, Jakub Kicinski, Sasha Levin, davem,
edumazet, pabeni, netdev
From: Peter Seiderer <ps.report@gmx.net>
[ Upstream commit 425e64440ad0a2f03bdaf04be0ae53dededbaa77 ]
Honour the user given buffer size for the strn_len() calls (otherwise
strn_len() will access memory outside of the user given buffer).
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-8-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/pktgen.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index c2b3c454eddd9..57502e8628462 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1770,8 +1770,8 @@ static ssize_t pktgen_thread_write(struct file *file,
i = len;
/* Read variable name */
-
- len = strn_len(&user_buffer[i], sizeof(name) - 1);
+ max = min(sizeof(name) - 1, count - i);
+ len = strn_len(&user_buffer[i], max);
if (len < 0)
return len;
@@ -1801,7 +1801,8 @@ static ssize_t pktgen_thread_write(struct file *file,
if (!strcmp(name, "add_device")) {
char f[32];
memset(f, 0, 32);
- len = strn_len(&user_buffer[i], sizeof(f) - 1);
+ max = min(sizeof(f) - 1, count - i);
+ len = strn_len(&user_buffer[i], max);
if (len < 0) {
ret = len;
goto out;
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 077/114] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (13 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 075/114] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 079/114] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
` (9 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jason Xing, Martin KaFai Lau, Sasha Levin, ast, daniel, andrii,
edumazet, ncardwell, davem, kuba, pabeni, martin.lau, dsahern,
bpf, netdev
From: Jason Xing <kerneljasonxing@gmail.com>
[ Upstream commit fd93eaffb3f977b23bc0a48d4c8616e654fcf133 ]
The subsequent patch will implement BPF TX timestamping. It will
call the sockops BPF program without holding the sock lock.
This breaks the current assumption that all sock ops programs will
hold the sock lock. The sock's fields of the uapi's bpf_sock_ops
requires this assumption.
To address this, a new "u8 is_locked_tcp_sock;" field is added. This
patch sets it in the current sock_ops callbacks. The "is_fullsock"
test is then replaced by the "is_locked_tcp_sock" test during
sock_ops_convert_ctx_access().
The new TX timestamping callbacks added in the subsequent patch will
not have this set. This will prevent unsafe access from the new
timestamping callbacks.
Potentially, we could allow read-only access. However, this would
require identifying which callback is read-safe-only and also requires
additional BPF instruction rewrites in the covert_ctx. Since the BPF
program can always read everything from a socket (e.g., by using
bpf_core_cast), this patch keeps it simple and disables all read
and write access to any socket fields through the bpf_sock_ops
UAPI from the new TX timestamping callback.
Moreover, note that some of the fields in bpf_sock_ops are specific
to tcp_sock, and sock_ops currently only supports tcp_sock. In
the future, UDP timestamping will be added, which will also break
this assumption. The same idea used in this patch will be reused.
Considering that the current sock_ops only supports tcp_sock, the
variable is named is_locked_"tcp"_sock.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/filter.h | 1 +
include/net/tcp.h | 1 +
net/core/filter.c | 8 ++++----
net/ipv4/tcp_input.c | 2 ++
net/ipv4/tcp_output.c | 2 ++
5 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e3aca0dc7d9c6..a963a4495b0d0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1277,6 +1277,7 @@ struct bpf_sock_ops_kern {
void *skb_data_end;
u8 op;
u8 is_fullsock;
+ u8 is_locked_tcp_sock;
u8 remaining_opt_len;
u64 temp; /* temp and everything after is not
* initialized to 0 before calling
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2aad2e79ac6ad..02e8ef3a49192 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2311,6 +2311,7 @@ static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
if (sk_fullsock(sk)) {
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_owned_by_me(sk);
}
diff --git a/net/core/filter.c b/net/core/filter.c
index b262cad02bad9..73df612426a2a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9194,10 +9194,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
} \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
- is_fullsock), \
+ is_locked_tcp_sock), \
fullsock_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
+ is_locked_tcp_sock)); \
*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp); \
if (si->dst_reg == si->src_reg) \
*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
@@ -9282,10 +9282,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
temp)); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
- is_fullsock), \
+ is_locked_tcp_sock), \
reg, si->dst_reg, \
offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
+ is_locked_tcp_sock)); \
*insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, sk),\
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7c2e714527f68..5b751f9c6fd16 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -167,6 +167,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
@@ -183,6 +184,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = bpf_op;
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
/* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
if (skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 32e38ac5ee2bd..ae4f23455f985 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -506,6 +506,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
}
@@ -551,6 +552,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 079/114] eth: mlx4: don't try to complete XDP frames in netpoll
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (14 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 077/114] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 082/114] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
` (8 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Kicinski, Tariq Toukan, Sasha Levin, andrew+netdev, davem,
edumazet, pabeni, ast, daniel, hawk, john.fastabend, netdev,
linux-rdma, bpf
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 8fdeafd66edaf420ea0063a1f13442fe3470fe70 ]
mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
using page pool until now, so it could run XDP completions
in netpoll (NAPI budget == 0) just fine. Page pool has calling
context requirements, make sure we don't try to call it from
what is potentially HW IRQ context.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 59b097cda3278..6c52ddef88a62 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -445,6 +445,8 @@ int mlx4_en_process_tx_cq(struct net_device *dev,
if (unlikely(!priv->port_up))
return 0;
+ if (unlikely(!napi_budget) && cq->type == TX_XDP)
+ return 0;
netdev_txq_bql_complete_prefetchw(ring->tx_queue);
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 082/114] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (15 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 079/114] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 083/114] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
` (7 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Shahar Shitrit, Tariq Toukan, Mateusz Polchlopek, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Shahar Shitrit <shshitrit@nvidia.com>
[ Upstream commit 633f16d7e07c129a36b882c05379e01ce5bdb542 ]
In the sensor_count field of the MTEWE register, bits 1-62 are
supported only for unmanaged switches, not for NICs, and bit 63
is reserved for internal use.
To prevent confusing output that may include set bits that are
not relevant to NIC sensors, we update the bitmask to retain only
the first bit, which corresponds to the sensor ASIC.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20250213094641.226501-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/events.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index 3ce17c3d7a001..9d7b0a4cc48a9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -156,6 +156,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
u64 value_msb;
value_lsb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_lsb);
+ /* bit 1-63 are not supported for NICs,
+ * hence read only bit 0 (asic) from lsb.
+ */
+ value_lsb &= 0x1;
value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
mlx5_core_warn(events->dev,
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 083/114] net/mlx5: Apply rate-limiting to high temperature warning
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (16 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 082/114] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 090/114] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
` (6 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Shahar Shitrit, Tariq Toukan, Mateusz Polchlopek, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Shahar Shitrit <shshitrit@nvidia.com>
[ Upstream commit 9dd3d5d258aceb37bdf09c8b91fa448f58ea81f0 ]
Wrap the high temperature warning in a temperature event with
a call to net_ratelimit() to prevent flooding the kernel log
with repeated warning messages when temperature exceeds the
threshold multiple times within a short duration.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20250213094641.226501-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/events.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index 9d7b0a4cc48a9..5e8db7a6185a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -162,9 +162,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
value_lsb &= 0x1;
value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
- mlx5_core_warn(events->dev,
- "High temperature on sensors with bit set %llx %llx",
- value_msb, value_lsb);
+ if (net_ratelimit())
+ mlx5_core_warn(events->dev,
+ "High temperature on sensors with bit set %llx %llx",
+ value_msb, value_lsb);
return NOTIFY_OK;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 090/114] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (17 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 083/114] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 093/114] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
` (5 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kees Cook, Jakub Kicinski, Sasha Levin, tariqt, andrew+netdev,
davem, edumazet, pabeni, yishaih, netdev, linux-rdma
From: Kees Cook <kees@kernel.org>
[ Upstream commit 4a6f18f28627e121bd1f74b5fcc9f945d6dbeb1e ]
GCC can see that the value range for "order" is capped, but this leads
it to consider that it might be negative, leading to a false positive
warning (with GCC 15 with -Warray-bounds -fdiagnostics-details):
../drivers/net/ethernet/mellanox/mlx4/alloc.c:691:47: error: array subscript -1 is below array bounds of 'long unsigned int *[2]' [-Werror=array-bounds=]
691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o);
| ~~~~~~~~~~~^~~
'mlx4_alloc_db_from_pgdir': events 1-2
691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | | | | (2) out of array bounds here
| (1) when the condition is evaluated to true In file included from ../drivers/net/ethernet/mellanox/mlx4/mlx4.h:53,
from ../drivers/net/ethernet/mellanox/mlx4/alloc.c:42:
../include/linux/mlx4/device.h:664:33: note: while referencing 'bits'
664 | unsigned long *bits[2];
| ^~~~
Switch the argument to unsigned int, which removes the compiler needing
to consider negative values.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250210174504.work.075-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/alloc.c | 6 +++---
include/linux/mlx4/device.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index b330020dc0d67..f2bded847e61d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -682,9 +682,9 @@ static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device)
}
static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir,
- struct mlx4_db *db, int order)
+ struct mlx4_db *db, unsigned int order)
{
- int o;
+ unsigned int o;
int i;
for (o = order; o <= 1; ++o) {
@@ -712,7 +712,7 @@ static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir,
return 0;
}
-int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order)
+int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, unsigned int order)
{
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_db_pgdir *pgdir;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index eb8169c03d899..0eb04d537038c 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1116,7 +1116,7 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
struct mlx4_buf *buf);
-int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order);
+int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, unsigned int order);
void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db);
int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres,
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 093/114] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (18 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 090/114] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 094/114] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
` (4 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexei Lazar, Dragos Tatulea, Tariq Toukan, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Alexei Lazar <alazar@nvidia.com>
[ Upstream commit 95b9606b15bb3ce1198d28d2393dd0e1f0a5f3e9 ]
Current loopback test validation ignores non-linear SKB case in
the SKB access, which can lead to failures in scenarios such as
when HW GRO is enabled.
Linearize the SKB so both cases will be handled.
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-15-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index ce8ab1f018769..c380340b81665 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -188,6 +188,9 @@ mlx5e_test_loopback_validate(struct sk_buff *skb,
struct udphdr *udph;
struct iphdr *iph;
+ if (skb_linearize(skb))
+ goto out;
+
/* We are only going to peek, no need to clone the SKB */
if (MLX5E_TEST_PKT_SIZE - ETH_HLEN > skb_headlen(skb))
goto out;
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 094/114] net/mlx5e: set the tx_queue_len for pfifo_fast
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (19 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 093/114] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 095/114] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
` (3 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Daniel Jurgens, Tariq Toukan, Michal Swiatkowski,
Jakub Kicinski, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, pabeni, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit a38cc5706fb9f7dc4ee3a443f61de13ce1e410ed ]
By default, the mq netdev creates a pfifo_fast qdisc. On a
system with 16 core, the pfifo_fast with 3 bands consumes
16 * 3 * 8 (size of pointer) * 1024 (default tx queue len)
= 393KB. The patch sets the tx qlen to representor default
value, 128 (1<<MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE), which
consumes 16 * 3 * 8 * 128 = 49KB, saving 344KB for each
representor at ECPF.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 6512bb231e7e0..8eb7288f820a4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -740,6 +740,8 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev)
}
netdev->watchdog_timeo = 15 * HZ;
+ if (mlx5_core_is_ecpf(mdev))
+ netdev->tx_queue_len = 1 << MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE;
netdev->features |= NETIF_F_NETNS_LOCAL;
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 095/114] net/mlx5e: reduce rep rxq depth to 256 for ECPF
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (20 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 094/114] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 096/114] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
` (2 subsequent siblings)
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Bodong Wang, Saeed Mahameed, Tariq Toukan,
Michal Swiatkowski, Jakub Kicinski, Sasha Levin, andrew+netdev,
davem, edumazet, pabeni, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit b9cc8f9d700867aaa77aedddfea85e53d5e5d584 ]
By experiments, a single queue representor netdev consumes kernel
memory around 2.8MB, and 1.8MB out of the 2.8MB is due to page
pool for the RXQ. Scaling to a thousand representors consumes 2.8GB,
which becomes a memory pressure issue for embedded devices such as
BlueField-2 16GB / BlueField-3 32GB memory.
Since representor netdevs mostly handles miss traffic, and ideally,
most of the traffic will be offloaded, reduce the default non-uplink
rep netdev's RXQ default depth from 1024 to 256 if mdev is ecpf eswitch
manager. This saves around 1MB of memory per regular RQ,
(1024 - 256) * 2KB, allocated from page pool.
With rxq depth of 256, the netlink page pool tool reports
$./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump page-pool-get
{'id': 277,
'ifindex': 9,
'inflight': 128,
'inflight-mem': 786432,
'napi-id': 775}]
This is due to mtu 1500 + headroom consumes half pages, so 256 rxq
entries consumes around 128 pages (thus create a page pool with
size 128), shown above at inflight.
Note that each netdev has multiple types of RQs, including
Regular RQ, XSK, PTP, Drop, Trap RQ. Since non-uplink representor
only supports regular rq, this patch only changes the regular RQ's
default depth.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 8eb7288f820a4..2fb1fe9a8eee1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -53,6 +53,7 @@
#define MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE \
max(0x7, MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE)
#define MLX5E_REP_PARAMS_DEF_NUM_CHANNELS 1
+#define MLX5E_REP_PARAMS_DEF_LOG_RQ_SIZE 0x8
static const char mlx5e_rep_driver_name[] = "mlx5e_rep";
@@ -702,6 +703,8 @@ static void mlx5e_build_rep_params(struct net_device *netdev)
/* RQ */
mlx5e_build_rq_params(mdev, params);
+ if (!mlx5e_is_uplink_rep(priv) && mlx5_core_is_ecpf(mdev))
+ params->log_rq_mtu_frames = MLX5E_REP_PARAMS_DEF_LOG_RQ_SIZE;
/* CQ moderation params */
params->rx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 096/114] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (21 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 095/114] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
@ 2025-05-05 23:17 ` Sasha Levin
2025-05-05 23:18 ` [PATCH AUTOSEL 5.10 100/114] vxlan: Annotate FDB data races Sasha Levin
2025-05-05 23:18 ` [PATCH AUTOSEL 5.10 101/114] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:17 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuniyuki Iwashima, Eric Dumazet, Ido Schimmel, Jakub Kicinski,
Sasha Levin, davem, dsahern, pabeni, netdev
From: Kuniyuki Iwashima <kuniyu@amazon.com>
[ Upstream commit 5a1ccffd30a08f5a2428cd5fbb3ab03e8eb6c66d ]
The following patch will not set skb->sk from VRF path.
Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk)
in fib[46]_rule_configure().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/fib_rules.c | 4 ++--
net/ipv6/fib6_rules.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index d279cb8ac1584..a270951386e19 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -226,9 +226,9 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct nlattr **tb,
struct netlink_ext_ack *extack)
{
- struct net *net = sock_net(skb->sk);
+ struct fib4_rule *rule4 = (struct fib4_rule *)rule;
+ struct net *net = rule->fr_net;
int err = -EINVAL;
- struct fib4_rule *rule4 = (struct fib4_rule *) rule;
if (frh->tos & ~IPTOS_TOS_MASK) {
NL_SET_ERR_MSG(extack, "Invalid tos");
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index cf9a44fb8243d..0d4e82744921f 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -353,9 +353,9 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct nlattr **tb,
struct netlink_ext_ack *extack)
{
+ struct fib6_rule *rule6 = (struct fib6_rule *)rule;
+ struct net *net = rule->fr_net;
int err = -EINVAL;
- struct net *net = sock_net(skb->sk);
- struct fib6_rule *rule6 = (struct fib6_rule *) rule;
if (rule->action == FR_ACT_TO_TBL && !rule->l3mdev) {
if (rule->table == RT6_TABLE_UNSPEC) {
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 100/114] vxlan: Annotate FDB data races
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (22 preceding siblings ...)
2025-05-05 23:17 ` [PATCH AUTOSEL 5.10 096/114] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
@ 2025-05-05 23:18 ` Sasha Levin
2025-05-05 23:18 ` [PATCH AUTOSEL 5.10 101/114] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:18 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ido Schimmel, Petr Machata, Eric Dumazet, Nikolay Aleksandrov,
Jakub Kicinski, Sasha Levin, andrew+netdev, davem, pabeni,
menglong8.dong, gnault, netdev
From: Ido Schimmel <idosch@nvidia.com>
[ Upstream commit f6205f8215f12a96518ac9469ff76294ae7bd612 ]
The 'used' and 'updated' fields in the FDB entry structure can be
accessed concurrently by multiple threads, leading to reports such as
[1]. Can be reproduced using [2].
Suppress these reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().
[1]
BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit
write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
vxlan_xmit+0xb29/0x2380
dev_hard_start_xmit+0x84/0x2f0
__dev_queue_xmit+0x45a/0x1650
packet_xmit+0x100/0x150
packet_sendmsg+0x2114/0x2ac0
__sys_sendto+0x318/0x330
__x64_sys_sendto+0x76/0x90
x64_sys_call+0x14e8/0x1c00
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
vxlan_xmit+0xadf/0x2380
dev_hard_start_xmit+0x84/0x2f0
__dev_queue_xmit+0x45a/0x1650
packet_xmit+0x100/0x150
packet_sendmsg+0x2114/0x2ac0
__sys_sendto+0x318/0x330
__x64_sys_sendto+0x76/0x90
x64_sys_call+0x14e8/0x1c00
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f
Reported by Kernel Concurrency Sanitizer on:
CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[2]
#!/bin/bash
set +H
echo whitelist > /sys/kernel/debug/kcsan
echo !vxlan_xmit > /sys/kernel/debug/kcsan
ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/vxlan/vxlan_core.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index ec67d2eb05ecd..7d7aa7d768804 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -333,9 +333,9 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
be32_to_cpu(fdb->vni)))
goto nla_put_failure;
- ci.ndm_used = jiffies_to_clock_t(now - fdb->used);
+ ci.ndm_used = jiffies_to_clock_t(now - READ_ONCE(fdb->used));
ci.ndm_confirmed = 0;
- ci.ndm_updated = jiffies_to_clock_t(now - fdb->updated);
+ ci.ndm_updated = jiffies_to_clock_t(now - READ_ONCE(fdb->updated));
ci.ndm_refcnt = 0;
if (nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
@@ -541,8 +541,8 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
struct vxlan_fdb *f;
f = __vxlan_find_mac(vxlan, mac, vni);
- if (f && f->used != jiffies)
- f->used = jiffies;
+ if (f && READ_ONCE(f->used) != jiffies)
+ WRITE_ONCE(f->used, jiffies);
return f;
}
@@ -1072,12 +1072,12 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
!(f->flags & NTF_VXLAN_ADDED_BY_USER)) {
if (f->state != state) {
f->state = state;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
notify = 1;
}
if (f->flags != fdb_flags) {
f->flags = fdb_flags;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
notify = 1;
}
}
@@ -1111,7 +1111,7 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
}
if (ndm_flags & NTF_USE)
- f->used = jiffies;
+ WRITE_ONCE(f->used, jiffies);
if (notify) {
if (rd == NULL)
@@ -1524,7 +1524,7 @@ static bool vxlan_snoop(struct net_device *dev,
src_mac, &rdst->remote_ip.sa, &src_ip->sa);
rdst->remote_ip = *src_ip;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
vxlan_fdb_notify(vxlan, f, rdst, RTM_NEWNEIGH, true, NULL);
} else {
u32 hash_index = fdb_head_index(vxlan, src_mac, vni);
@@ -2999,7 +2999,7 @@ static void vxlan_cleanup(struct timer_list *t)
if (f->flags & NTF_EXT_LEARNED)
continue;
- timeout = f->used + vxlan->cfg.age_interval * HZ;
+ timeout = READ_ONCE(f->used) + vxlan->cfg.age_interval * HZ;
if (time_before_eq(timeout, jiffies)) {
netdev_dbg(vxlan->dev,
"garbage collect %pM\n",
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread* [PATCH AUTOSEL 5.10 101/114] net-sysfs: prevent uncleared queues from being re-added
[not found] <20250505231817.2697367-1-sashal@kernel.org>
` (23 preceding siblings ...)
2025-05-05 23:18 ` [PATCH AUTOSEL 5.10 100/114] vxlan: Annotate FDB data races Sasha Levin
@ 2025-05-05 23:18 ` Sasha Levin
24 siblings, 0 replies; 25+ messages in thread
From: Sasha Levin @ 2025-05-05 23:18 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Antoine Tenart, Jakub Kicinski, Sasha Levin, davem, edumazet,
pabeni, sdf, jdamato, netdev
From: Antoine Tenart <atenart@kernel.org>
[ Upstream commit 7e54f85c60828842be27e0149f3533357225090e ]
With the (upcoming) removal of the rtnl_trylock/restart_syscall logic
and because of how Tx/Rx queues are implemented (and their
requirements), it might happen that a queue is re-added before having
the chance to be cleared. In such rare case, do not complete the queue
addition operation.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-4-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/net-sysfs.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 99303897b7bb7..bbcff7925f03f 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1009,6 +1009,22 @@ static int rx_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
+ /* Rx queues are cleared in rx_queue_release to allow later
+ * re-registration. This is triggered when their kobj refcount is
+ * dropped.
+ *
+ * If a queue is removed while both a read (or write) operation and a
+ * the re-addition of the same queue are pending (waiting on rntl_lock)
+ * it might happen that the re-addition will execute before the read,
+ * making the initial removal to never happen (queue's kobj refcount
+ * won't drop enough because of the pending read). In such rare case,
+ * return to allow the removal operation to complete.
+ */
+ if (unlikely(kobj->state_initialized)) {
+ netdev_warn_once(dev, "Cannot re-add rx queues before their removal completed");
+ return -EAGAIN;
+ }
+
/* Kobject_put later will trigger rx_queue_release call which
* decreases dev refcount: Take that reference here
*/
@@ -1640,6 +1656,22 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
+ /* Tx queues are cleared in netdev_queue_release to allow later
+ * re-registration. This is triggered when their kobj refcount is
+ * dropped.
+ *
+ * If a queue is removed while both a read (or write) operation and a
+ * the re-addition of the same queue are pending (waiting on rntl_lock)
+ * it might happen that the re-addition will execute before the read,
+ * making the initial removal to never happen (queue's kobj refcount
+ * won't drop enough because of the pending read). In such rare case,
+ * return to allow the removal operation to complete.
+ */
+ if (unlikely(kobj->state_initialized)) {
+ netdev_warn_once(dev, "Cannot re-add tx queues before their removal completed");
+ return -EAGAIN;
+ }
+
/* Kobject_put later will trigger netdev_queue_release call
* which decreases dev refcount: Take that reference here
*/
--
2.39.5
^ permalink raw reply related [flat|nested] 25+ messages in thread