* [PATCH AUTOSEL 6.6 010/294] SUNRPC: Don't allow waiting for exiting tasks
[not found] <20250505225634.2688578-1-sashal@kernel.org>
@ 2025-05-05 22:51 ` Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 023/294] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
` (42 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:51 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Sasha Levin, chuck.lever, trondmy,
anna, davem, edumazet, kuba, pabeni, linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit 14e41b16e8cb677bb440dca2edba8b041646c742 ]
Once a task calls exit_signals() it can no longer be signalled. So do
not allow it to do killable waits.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/sched.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 9b45fbdc90cab..73bc39281ef5f 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -276,6 +276,8 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
{
+ if (unlikely(current->flags & PF_EXITING))
+ return -EINTR;
schedule();
if (signal_pending_state(mode, current))
return -ERESTARTSYS;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 023/294] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
[not found] <20250505225634.2688578-1-sashal@kernel.org>
2025-05-05 22:51 ` [PATCH AUTOSEL 6.6 010/294] SUNRPC: Don't allow waiting for exiting tasks Sasha Levin
@ 2025-05-05 22:52 ` Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 024/294] SUNRPC: rpcbind should never reset the port to the value '0' Sasha Levin
` (41 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:52 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Benjamin Coddington, Sasha Levin,
chuck.lever, trondmy, anna, davem, edumazet, kuba, pabeni,
linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit bf9be373b830a3e48117da5d89bb6145a575f880 ]
The autobind setting was supposed to be determined in rpc_create(),
since commit c2866763b402 ("SUNRPC: use sockaddr + size when creating
remote transport endpoints").
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/clnt.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 142ee6554848a..4ffb2bcaf3648 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -275,9 +275,6 @@ static struct rpc_xprt *rpc_clnt_set_transport(struct rpc_clnt *clnt,
old = rcu_dereference_protected(clnt->cl_xprt,
lockdep_is_held(&clnt->cl_lock));
- if (!xprt_bound(xprt))
- clnt->cl_autobind = 1;
-
clnt->cl_timeout = timeout;
rcu_assign_pointer(clnt->cl_xprt, xprt);
spin_unlock(&clnt->cl_lock);
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 024/294] SUNRPC: rpcbind should never reset the port to the value '0'
[not found] <20250505225634.2688578-1-sashal@kernel.org>
2025-05-05 22:51 ` [PATCH AUTOSEL 6.6 010/294] SUNRPC: Don't allow waiting for exiting tasks Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 023/294] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
@ 2025-05-05 22:52 ` Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 063/294] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() Sasha Levin
` (40 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:52 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Trond Myklebust, Jeff Layton, Benjamin Coddington, Sasha Levin,
trondmy, anna, chuck.lever, davem, edumazet, kuba, pabeni,
linux-nfs, netdev
From: Trond Myklebust <trond.myklebust@hammerspace.com>
[ Upstream commit 214c13e380ad7636631279f426387f9c4e3c14d9 ]
If we already had a valid port number for the RPC service, then we
should not allow the rpcbind client to set it to the invalid value '0'.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/sunrpc/rpcb_clnt.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 102c3818bc54d..53bcca365fb1c 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -820,9 +820,10 @@ static void rpcb_getport_done(struct rpc_task *child, void *data)
}
trace_rpcb_setport(child, map->r_status, map->r_port);
- xprt->ops->set_port(xprt, map->r_port);
- if (map->r_port)
+ if (map->r_port) {
+ xprt->ops->set_port(xprt, map->r_port);
xprt_set_bound(xprt);
+ }
}
/*
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 063/294] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered()
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (2 preceding siblings ...)
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 024/294] SUNRPC: rpcbind should never reset the port to the value '0' Sasha Levin
@ 2025-05-05 22:52 ` Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 072/294] net/smc: use the correct ndev to find pnetid by pnetid table Sasha Levin
` (39 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:52 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ilpo Järvinen, Chia-Yu Chang, David S . Miller, Sasha Levin,
edumazet, ncardwell, dsahern, kuba, pabeni, netdev
From: Ilpo Järvinen <ij@kernel.org>
[ Upstream commit 149dfb31615e22271d2525f078c95ea49bc4db24 ]
- Move tcp_count_delivered() earlier and split tcp_count_delivered_ce()
out of it
- Move tcp_in_ack_event() later
- While at it, remove the inline from tcp_in_ack_event() and let
the compiler to decide
Accurate ECN's heuristics does not know if there is going
to be ACE field based CE counter increase or not until after
rtx queue has been processed. Only then the number of ACKed
bytes/pkts is available. As CE or not affects presence of
FLAG_ECE, that information for tcp_in_ack_event is not yet
available in the old location of the call to tcp_in_ack_event().
Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/tcp_input.c | 56 +++++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 24 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 10d38ec0ff5ac..a172248b66783 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -425,6 +425,20 @@ static bool tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr
return false;
}
+static void tcp_count_delivered_ce(struct tcp_sock *tp, u32 ecn_count)
+{
+ tp->delivered_ce += ecn_count;
+}
+
+/* Updates the delivered and delivered_ce counts */
+static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
+ bool ece_ack)
+{
+ tp->delivered += delivered;
+ if (ece_ack)
+ tcp_count_delivered_ce(tp, delivered);
+}
+
/* Buffer size and advertised window tuning.
*
* 1. Tuning sk->sk_sndbuf, when connection enters established state.
@@ -1137,15 +1151,6 @@ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb)
}
}
-/* Updates the delivered and delivered_ce counts */
-static void tcp_count_delivered(struct tcp_sock *tp, u32 delivered,
- bool ece_ack)
-{
- tp->delivered += delivered;
- if (ece_ack)
- tp->delivered_ce += delivered;
-}
-
/* This procedure tags the retransmission queue when SACKs arrive.
*
* We have three tag bits: SACKED(S), RETRANS(R) and LOST(L).
@@ -3816,12 +3821,23 @@ static void tcp_process_tlp_ack(struct sock *sk, u32 ack, int flag)
}
}
-static inline void tcp_in_ack_event(struct sock *sk, u32 flags)
+static void tcp_in_ack_event(struct sock *sk, int flag)
{
const struct inet_connection_sock *icsk = inet_csk(sk);
- if (icsk->icsk_ca_ops->in_ack_event)
- icsk->icsk_ca_ops->in_ack_event(sk, flags);
+ if (icsk->icsk_ca_ops->in_ack_event) {
+ u32 ack_ev_flags = 0;
+
+ if (flag & FLAG_WIN_UPDATE)
+ ack_ev_flags |= CA_ACK_WIN_UPDATE;
+ if (flag & FLAG_SLOWPATH) {
+ ack_ev_flags |= CA_ACK_SLOWPATH;
+ if (flag & FLAG_ECE)
+ ack_ev_flags |= CA_ACK_ECE;
+ }
+
+ icsk->icsk_ca_ops->in_ack_event(sk, ack_ev_flags);
+ }
}
/* Congestion control has updated the cwnd already. So if we're in
@@ -3938,12 +3954,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_snd_una_update(tp, ack);
flag |= FLAG_WIN_UPDATE;
- tcp_in_ack_event(sk, CA_ACK_WIN_UPDATE);
-
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPHPACKS);
} else {
- u32 ack_ev_flags = CA_ACK_SLOWPATH;
-
if (ack_seq != TCP_SKB_CB(skb)->end_seq)
flag |= FLAG_DATA;
else
@@ -3955,19 +3967,12 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
flag |= tcp_sacktag_write_queue(sk, skb, prior_snd_una,
&sack_state);
- if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb))) {
+ if (tcp_ecn_rcv_ecn_echo(tp, tcp_hdr(skb)))
flag |= FLAG_ECE;
- ack_ev_flags |= CA_ACK_ECE;
- }
if (sack_state.sack_delivered)
tcp_count_delivered(tp, sack_state.sack_delivered,
flag & FLAG_ECE);
-
- if (flag & FLAG_WIN_UPDATE)
- ack_ev_flags |= CA_ACK_WIN_UPDATE;
-
- tcp_in_ack_event(sk, ack_ev_flags);
}
/* This is a deviation from RFC3168 since it states that:
@@ -3994,6 +3999,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
tcp_rack_update_reo_wnd(sk, &rs);
+ tcp_in_ack_event(sk, flag);
+
if (tp->tlp_high_seq)
tcp_process_tlp_ack(sk, ack, flag);
@@ -4025,6 +4032,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
return 1;
no_queue:
+ tcp_in_ack_event(sk, flag);
/* If data was DSACKed, see if we can undo a cwnd reduction. */
if (flag & FLAG_DSACKING_ACK) {
tcp_fastretrans_alert(sk, prior_snd_una, num_dupack, &flag,
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 072/294] net/smc: use the correct ndev to find pnetid by pnetid table
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (3 preceding siblings ...)
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 063/294] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() Sasha Levin
@ 2025-05-05 22:52 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 084/294] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
` (38 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:52 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Guangguan Wang, Wenjia Zhang, Halil Pasic, David S . Miller,
Sasha Levin, jaka, edumazet, kuba, pabeni, linux-rdma, linux-s390,
netdev
From: Guangguan Wang <guangguan.wang@linux.alibaba.com>
[ Upstream commit bfc6c67ec2d64d0ca4e5cc3e1ac84298a10b8d62 ]
When using smc_pnet in SMC, it will only search the pnetid in the
base_ndev of the netdev hierarchy(both HW PNETID and User-defined
sw pnetid). This may not work for some scenarios when using SMC in
container on cloud environment.
In container, there have choices of different container network,
such as directly using host network, virtual network IPVLAN, veth,
etc. Different choices of container network have different netdev
hierarchy. Examples of netdev hierarchy show below. (eth0 and eth1
in host below is the netdev directly related to the physical device).
_______________________________
| _________________ |
| |POD | |
| | | |
| | eth0_________ | |
| |____| |__| |
| | | |
| | | |
| eth1|base_ndev| eth0_______ |
| | | | RDMA ||
| host |_________| |_______||
---------------------------------
netdev hierarchy if directly using host network
________________________________
| _________________ |
| |POD __________ | |
| | |upper_ndev| | |
| |eth0|__________| | |
| |_______|_________| |
| |lower netdev |
| __|______ |
| eth1| | eth0_______ |
| |base_ndev| | RDMA ||
| host |_________| |_______||
---------------------------------
netdev hierarchy if using IPVLAN
_______________________________
| _____________________ |
| |POD _________ | |
| | |base_ndev|| |
| |eth0(veth)|_________|| |
| |____________|________| |
| |pairs |
| _______|_ |
| | | eth0_______ |
| veth|base_ndev| | RDMA ||
| |_________| |_______||
| _________ |
| eth1|base_ndev| |
| host |_________| |
---------------------------------
netdev hierarchy if using veth
Due to some reasons, the eth1 in host is not RDMA attached netdevice,
pnetid is needed to map the eth1(in host) with RDMA device so that POD
can do SMC-R. Because the eth1(in host) is managed by CNI plugin(such
as Terway, network management plugin in container environment), and in
cloud environment the eth(in host) can dynamically be inserted by CNI
when POD create and dynamically be removed by CNI when POD destroy and
no POD related to the eth(in host) anymore. It is hard to config the
pnetid to the eth1(in host). But it is easy to config the pnetid to the
netdevice which can be seen in POD. When do SMC-R, both the container
directly using host network and the container using veth network can
successfully match the RDMA device, because the configured pnetid netdev
is a base_ndev. But the container using IPVLAN can not successfully
match the RDMA device and 0x03030000 fallback happens, because the
configured pnetid netdev is not a base_ndev. Additionally, if config
pnetid to the eth1(in host) also can not work for matching RDMA device
when using veth network and doing SMC-R in POD.
To resolve the problems list above, this patch extends to search user
-defined sw pnetid in the clc handshake ndev when no pnetid can be found
in the base_ndev, and the base_ndev take precedence over ndev for backward
compatibility. This patch also can unify the pnetid setup of different
network choices list above in container(Config user-defined sw pnetid in
the netdevice can be seen in POD).
Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/smc/smc_pnet.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/smc/smc_pnet.c b/net/smc/smc_pnet.c
index dbcc72b43d0c0..d44d7f427fc94 100644
--- a/net/smc/smc_pnet.c
+++ b/net/smc/smc_pnet.c
@@ -1084,14 +1084,16 @@ static void smc_pnet_find_roce_by_pnetid(struct net_device *ndev,
struct smc_init_info *ini)
{
u8 ndev_pnetid[SMC_MAX_PNETID_LEN];
+ struct net_device *base_ndev;
struct net *net;
- ndev = pnet_find_base_ndev(ndev);
+ base_ndev = pnet_find_base_ndev(ndev);
net = dev_net(ndev);
- if (smc_pnetid_by_dev_port(ndev->dev.parent, ndev->dev_port,
+ if (smc_pnetid_by_dev_port(base_ndev->dev.parent, base_ndev->dev_port,
ndev_pnetid) &&
+ smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid) &&
smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid)) {
- smc_pnet_find_rdma_dev(ndev, ini);
+ smc_pnet_find_rdma_dev(base_ndev, ini);
return; /* pnetid could not be determined */
}
_smc_pnet_find_roce_by_pnetid(ndev_pnetid, ini, NULL, net);
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 084/294] netfilter: conntrack: Bound nf_conntrack sysctl writes
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (4 preceding siblings ...)
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 072/294] net/smc: use the correct ndev to find pnetid by pnetid table Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 092/294] ipv6: save dontfrag in cork Sasha Levin
` (37 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Nicolas Bouchinet, Pablo Neira Ayuso, Sasha Levin, kadlec, davem,
edumazet, kuba, pabeni, netfilter-devel, coreteam, netdev
From: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
[ Upstream commit 8b6861390ffee6b8ed78b9395e3776c16fec6579 ]
nf_conntrack_max and nf_conntrack_expect_max sysctls were authorized to
be written any negative value, which would then be stored in the
unsigned int variables nf_conntrack_max and nf_ct_expect_max variables.
While the do_proc_dointvec_conv function is supposed to limit writing
handled by proc_dointvec proc_handler to INT_MAX. Such a negative value
being written in an unsigned int leads to a very high value, exceeding
this limit.
Moreover, the nf_conntrack_expect_max sysctl documentation specifies the
minimum value is 1.
The proc_handlers have thus been updated to proc_dointvec_minmax in
order to specify the following write bounds :
* Bound nf_conntrack_max sysctl writings between SYSCTL_ZERO
and SYSCTL_INT_MAX.
* Bound nf_conntrack_expect_max sysctl writings between SYSCTL_ONE
and SYSCTL_INT_MAX as defined in the sysctl documentation.
With this patch applied, sysctl writes outside the defined in the bound
will thus lead to a write error :
```
sysctl -w net.netfilter.nf_conntrack_expect_max=-1
sysctl: setting key "net.netfilter.nf_conntrack_expect_max": Invalid argument
```
Signed-off-by: Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_conntrack_standalone.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 559665467b04d..1d5de1d9f008d 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -621,7 +621,9 @@ static struct ctl_table nf_ct_sysctl_table[] = {
.data = &nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_INT_MAX,
},
[NF_SYSCTL_CT_COUNT] = {
.procname = "nf_conntrack_count",
@@ -657,7 +659,9 @@ static struct ctl_table nf_ct_sysctl_table[] = {
.data = &nf_ct_expect_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ONE,
+ .extra2 = SYSCTL_INT_MAX,
},
[NF_SYSCTL_CT_ACCT] = {
.procname = "nf_conntrack_acct",
@@ -951,7 +955,9 @@ static struct ctl_table nf_ct_netfilter_table[] = {
.data = &nf_conntrack_max,
.maxlen = sizeof(int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_INT_MAX,
},
{ }
};
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 092/294] ipv6: save dontfrag in cork
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (5 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 084/294] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 108/294] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
` (36 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Willem de Bruijn, Eric Dumazet, Jakub Kicinski, Sasha Levin,
davem, dsahern, pabeni, netdev
From: Willem de Bruijn <willemb@google.com>
[ Upstream commit a18dfa9925b9ef6107ea3aa5814ca3c704d34a8a ]
When spanning datagram construction over multiple send calls using
MSG_MORE, per datagram settings are configured on the first send.
That is when ip(6)_setup_cork stores these settings for subsequent use
in __ip(6)_append_data and others.
The only flag that escaped this was dontfrag. As a result, a datagram
could be constructed with df=0 on the first sendmsg, but df=1 on a
next. Which is what cmsg_ip.sh does in an upcoming MSG_MORE test in
the "diff" scenario.
Changing datagram conditions in the middle of constructing an skb
makes this already complex code path even more convoluted. It is here
unintentional. Bring this flag in line with expected sockopt/cmsg
behavior.
And stop passing ipc6 to __ip6_append_data, to avoid such issues
in the future. This is already the case for __ip_append_data.
inet6_cork had a 6 byte hole, so the 1B flag has no impact.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250307033620.411611-3-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/ipv6.h | 1 +
net/ipv6/ip6_output.c | 9 +++++----
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index af8a771a053c5..d79851c5fabd8 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -199,6 +199,7 @@ struct inet6_cork {
struct ipv6_txoptions *opt;
u8 hop_limit;
u8 tclass;
+ u8 dontfrag:1;
};
/* struct ipv6_pinfo - ipv6 private area */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index cd89a2b35dfb5..b840021f79e98 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1452,6 +1452,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
}
v6_cork->hop_limit = ipc6->hlimit;
v6_cork->tclass = ipc6->tclass;
+ v6_cork->dontfrag = ipc6->dontfrag;
if (rt->dst.flags & DST_XFRM_TUNNEL)
mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
@@ -1485,7 +1486,7 @@ static int __ip6_append_data(struct sock *sk,
int getfrag(void *from, char *to, int offset,
int len, int odd, struct sk_buff *skb),
void *from, size_t length, int transhdrlen,
- unsigned int flags, struct ipcm6_cookie *ipc6)
+ unsigned int flags)
{
struct sk_buff *skb, *skb_prev = NULL;
struct inet_cork *cork = &cork_full->base;
@@ -1541,7 +1542,7 @@ static int __ip6_append_data(struct sock *sk,
if (headersize + transhdrlen > mtu)
goto emsgsize;
- if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
+ if (cork->length + length > mtu - headersize && v6_cork->dontfrag &&
(sk->sk_protocol == IPPROTO_UDP ||
sk->sk_protocol == IPPROTO_ICMPV6 ||
sk->sk_protocol == IPPROTO_RAW)) {
@@ -1913,7 +1914,7 @@ int ip6_append_data(struct sock *sk,
return __ip6_append_data(sk, &sk->sk_write_queue, &inet->cork,
&np->cork, sk_page_frag(sk), getfrag,
- from, length, transhdrlen, flags, ipc6);
+ from, length, transhdrlen, flags);
}
EXPORT_SYMBOL_GPL(ip6_append_data);
@@ -2118,7 +2119,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
err = __ip6_append_data(sk, &queue, cork, &v6_cork,
¤t->task_frag, getfrag, from,
length + exthdrlen, transhdrlen + exthdrlen,
- flags, ipc6);
+ flags);
if (err) {
__ip6_flush_pending_frames(sk, &queue, cork, &v6_cork);
return ERR_PTR(err);
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 108/294] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc()
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (6 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 092/294] ipv6: save dontfrag in cork Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 110/294] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
` (35 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eric Dumazet, Jason Xing, Kuniyuki Iwashima, Jakub Kicinski,
Sasha Levin, ncardwell, davem, dsahern, pabeni, netdev
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit f8ece40786c9342249aa0a1b55e148ee23b2a746 ]
We have platforms with 6 NUMA nodes and 480 cpus.
inet_ehash_locks_alloc() currently allocates a single 64KB page
to hold all ehash spinlocks. This adds more pressure on a single node.
Change inet_ehash_locks_alloc() to use vmalloc() to spread
the spinlocks on all online nodes, driven by NUMA policies.
At boot time, NUMA policy is interleave=all, meaning that
tcp_hashinfo.ehash_locks gets hash dispersion on all nodes.
Tested:
lack5:~# grep inet_ehash_locks_alloc /proc/vmallocinfo
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# echo 8192 >/proc/sys/net/ipv4/tcp_child_ehash_entries
lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x000000004e99d30c-0x00000000763f3279 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=1 N1=2 N2=2 N3=1 N4=1 N5=1
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# numactl --interleave=0,5 unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x00000000fd73a33e-0x0000000004b9a177 36864 inet_ehash_locks_alloc+0x90/0x100 pages=8 vmalloc N0=4 N5=4
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
lack5:~# echo 1024 >/proc/sys/net/ipv4/tcp_child_ehash_entries
lack5:~# numactl --interleave=all unshare -n bash -c "grep inet_ehash_locks_alloc /proc/vmallocinfo"
0x00000000db07d7a2-0x00000000ad697d29 8192 inet_ehash_locks_alloc+0x90/0x100 pages=1 vmalloc N2=1
0x00000000d9aec4d1-0x00000000a828b652 69632 inet_ehash_locks_alloc+0x90/0x100 pages=16 vmalloc N0=2 N1=3 N2=3 N3=3 N4=3 N5=2
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250305130550.1865988-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/inet_hashtables.c | 37 ++++++++++++++++++++++++++-----------
1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 7967ff7e02f79..60e81f6b1c6d4 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -1231,22 +1231,37 @@ int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo)
{
unsigned int locksz = sizeof(spinlock_t);
unsigned int i, nblocks = 1;
+ spinlock_t *ptr = NULL;
- if (locksz != 0) {
- /* allocate 2 cache lines or at least one spinlock per cpu */
- nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U);
- nblocks = roundup_pow_of_two(nblocks * num_possible_cpus());
+ if (locksz == 0)
+ goto set_mask;
- /* no more locks than number of hash buckets */
- nblocks = min(nblocks, hashinfo->ehash_mask + 1);
+ /* Allocate 2 cache lines or at least one spinlock per cpu. */
+ nblocks = max(2U * L1_CACHE_BYTES / locksz, 1U) * num_possible_cpus();
- hashinfo->ehash_locks = kvmalloc_array(nblocks, locksz, GFP_KERNEL);
- if (!hashinfo->ehash_locks)
- return -ENOMEM;
+ /* At least one page per NUMA node. */
+ nblocks = max(nblocks, num_online_nodes() * PAGE_SIZE / locksz);
+
+ nblocks = roundup_pow_of_two(nblocks);
+
+ /* No more locks than number of hash buckets. */
+ nblocks = min(nblocks, hashinfo->ehash_mask + 1);
- for (i = 0; i < nblocks; i++)
- spin_lock_init(&hashinfo->ehash_locks[i]);
+ if (num_online_nodes() > 1) {
+ /* Use vmalloc() to allow NUMA policy to spread pages
+ * on all available nodes if desired.
+ */
+ ptr = vmalloc_array(nblocks, locksz);
+ }
+ if (!ptr) {
+ ptr = kvmalloc_array(nblocks, locksz, GFP_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
}
+ for (i = 0; i < nblocks; i++)
+ spin_lock_init(&ptr[i]);
+ hashinfo->ehash_locks = ptr;
+set_mask:
hashinfo->ehash_locks_mask = nblocks - 1;
return 0;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 110/294] ieee802154: ca8210: Use proper setters and getters for bitwise types
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (7 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 108/294] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 116/294] net: phylink: use pl->link_interface in phylink_expects_phy() Sasha Levin
` (34 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Andy Shevchenko, Miquel Raynal, Linus Walleij, Stefan Schmidt,
Sasha Levin, alex.aring, andrew+netdev, davem, edumazet, kuba,
pabeni, linux-wpan, netdev
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[ Upstream commit 169b2262205836a5d1213ff44dca2962276bece1 ]
Sparse complains that the driver doesn't respect the bitwise types:
drivers/net/ieee802154/ca8210.c:1796:27: warning: incorrect type in assignment (different base types)
drivers/net/ieee802154/ca8210.c:1796:27: expected restricted __le16 [addressable] [assigned] [usertype] pan_id
drivers/net/ieee802154/ca8210.c:1796:27: got unsigned short [usertype]
drivers/net/ieee802154/ca8210.c:1801:25: warning: incorrect type in assignment (different base types)
drivers/net/ieee802154/ca8210.c:1801:25: expected restricted __le16 [addressable] [assigned] [usertype] pan_id
drivers/net/ieee802154/ca8210.c:1801:25: got unsigned short [usertype]
drivers/net/ieee802154/ca8210.c:1928:28: warning: incorrect type in argument 3 (different base types)
drivers/net/ieee802154/ca8210.c:1928:28: expected unsigned short [usertype] dst_pan_id
drivers/net/ieee802154/ca8210.c:1928:28: got restricted __le16 [addressable] [usertype] pan_id
Use proper setters and getters for bitwise types.
Note, in accordance with [1] the protocol is little endian.
Link: https://www.cascoda.com/wp-content/uploads/2018/11/CA-8210_datasheet_0418.pdf [1]
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/20250305105656.2133487-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ieee802154/ca8210.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 0a0ad3d77557f..587643a371de3 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -1446,8 +1446,7 @@ static u8 mcps_data_request(
command.pdata.data_req.src_addr_mode = src_addr_mode;
command.pdata.data_req.dst.mode = dst_address_mode;
if (dst_address_mode != MAC_MODE_NO_ADDR) {
- command.pdata.data_req.dst.pan_id[0] = LS_BYTE(dst_pan_id);
- command.pdata.data_req.dst.pan_id[1] = MS_BYTE(dst_pan_id);
+ put_unaligned_le16(dst_pan_id, command.pdata.data_req.dst.pan_id);
if (dst_address_mode == MAC_MODE_SHORT_ADDR) {
command.pdata.data_req.dst.address[0] = LS_BYTE(
dst_addr->short_address
@@ -1795,12 +1794,12 @@ static int ca8210_skb_rx(
}
hdr.source.mode = data_ind[0];
dev_dbg(&priv->spi->dev, "srcAddrMode: %#03x\n", hdr.source.mode);
- hdr.source.pan_id = *(u16 *)&data_ind[1];
+ hdr.source.pan_id = cpu_to_le16(get_unaligned_le16(&data_ind[1]));
dev_dbg(&priv->spi->dev, "srcPanId: %#06x\n", hdr.source.pan_id);
memcpy(&hdr.source.extended_addr, &data_ind[3], 8);
hdr.dest.mode = data_ind[11];
dev_dbg(&priv->spi->dev, "dstAddrMode: %#03x\n", hdr.dest.mode);
- hdr.dest.pan_id = *(u16 *)&data_ind[12];
+ hdr.dest.pan_id = cpu_to_le16(get_unaligned_le16(&data_ind[12]));
dev_dbg(&priv->spi->dev, "dstPanId: %#06x\n", hdr.dest.pan_id);
memcpy(&hdr.dest.extended_addr, &data_ind[14], 8);
@@ -1927,7 +1926,7 @@ static int ca8210_skb_tx(
status = mcps_data_request(
header.source.mode,
header.dest.mode,
- header.dest.pan_id,
+ le16_to_cpu(header.dest.pan_id),
(union macaddr *)&header.dest.extended_addr,
skb->len - mac_len,
&skb->data[mac_len],
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 116/294] net: phylink: use pl->link_interface in phylink_expects_phy()
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (8 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 110/294] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 122/294] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
` (33 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Choong Yong Liang, Russell King, Jakub Kicinski, Sasha Levin,
linux, andrew, hkallweit1, davem, edumazet, pabeni, netdev
From: Choong Yong Liang <yong.liang.choong@linux.intel.com>
[ Upstream commit b63263555eaafbf9ab1a82f2020bbee872d83759 ]
The phylink_expects_phy() function allows MAC drivers to check if they are
expecting a PHY to attach. The checking condition in phylink_expects_phy()
aims to achieve the same result as the checking condition in
phylink_attach_phy().
However, the checking condition in phylink_expects_phy() uses
pl->link_config.interface, while phylink_attach_phy() uses
pl->link_interface.
Initially, both pl->link_interface and pl->link_config.interface are set
to SGMII, and pl->cfg_link_an_mode is set to MLO_AN_INBAND.
When the interface switches from SGMII to 2500BASE-X,
pl->link_config.interface is updated by phylink_major_config().
At this point, pl->cfg_link_an_mode remains MLO_AN_INBAND, and
pl->link_config.interface is set to 2500BASE-X.
Subsequently, when the STMMAC interface is taken down
administratively and brought back up, it is blocked by
phylink_expects_phy().
Since phylink_expects_phy() and phylink_attach_phy() aim to achieve the
same result, phylink_expects_phy() should check pl->link_interface,
which never changes, instead of pl->link_config.interface, which is
updated by phylink_major_config().
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Choong Yong Liang <yong.liang.choong@linux.intel.com>
Link: https://patch.msgid.link/20250227121522.1802832-2-yong.liang.choong@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/phy/phylink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index b5f012619e42d..800e8b9eb4532 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1718,7 +1718,7 @@ bool phylink_expects_phy(struct phylink *pl)
{
if (pl->cfg_link_an_mode == MLO_AN_FIXED ||
(pl->cfg_link_an_mode == MLO_AN_INBAND &&
- phy_interface_mode_is_8023z(pl->link_config.interface)))
+ phy_interface_mode_is_8023z(pl->link_interface)))
return false;
return true;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 122/294] net: ethernet: ti: cpsw_new: populate netdev of_node
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (9 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 116/294] net: phylink: use pl->link_interface in phylink_expects_phy() Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 123/294] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
` (32 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexander Sverdlin, Siddharth Vadapalli, Andrew Lunn,
Jakub Kicinski, Sasha Levin, andrew+netdev, davem, edumazet,
pabeni, alexander.sverdlin, u.kleine-koenig, aleksander.lobakin,
lorenzo, hkallweit1, nicolas.dichtel, linux-omap, netdev
From: Alexander Sverdlin <alexander.sverdlin@siemens.com>
[ Upstream commit 7ff1c88fc89688c27f773ba956f65f0c11367269 ]
So that of_find_net_device_by_node() can find CPSW ports and other DSA
switches can be stacked downstream. Tested in conjunction with KSZ8873.
Reviewed-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com>
Link: https://patch.msgid.link/20250303074703.1758297-1-alexander.sverdlin@siemens.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/ti/cpsw_new.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 9061dca97fcbf..1c1d4806c119b 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1416,6 +1416,7 @@ static int cpsw_create_ports(struct cpsw_common *cpsw)
ndev->netdev_ops = &cpsw_netdev_ops;
ndev->ethtool_ops = &cpsw_ethtool_ops;
SET_NETDEV_DEV(ndev, dev);
+ ndev->dev.of_node = slave_data->slave_node;
if (!napi_ndev) {
/* CPSW Host port CPDMA interface is shared between
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 123/294] net: pktgen: fix mpls maximum labels list parsing
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (10 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 122/294] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 126/294] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config() Sasha Levin
` (31 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Peter Seiderer, Simon Horman, Paolo Abeni, Sasha Levin, davem,
edumazet, kuba, netdev
From: Peter Seiderer <ps.report@gmx.net>
[ Upstream commit 2b15a0693f70d1e8119743ee89edbfb1271b3ea8 ]
Fix mpls maximum labels list parsing up to MAX_MPLS_LABELS entries (instead
of up to MAX_MPLS_LABELS - 1).
Addresses the following:
$ echo "mpls 00000f00,00000f01,00000f02,00000f03,00000f04,00000f05,00000f06,00000f07,00000f08,00000f09,00000f0a,00000f0b,00000f0c,00000f0d,00000f0e,00000f0f" > /proc/net/pktgen/lo\@0
-bash: echo: write error: Argument list too long
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/pktgen.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 359e24c3f22ca..1decd6300f34c 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -897,6 +897,10 @@ static ssize_t get_labels(const char __user *buffer, struct pktgen_dev *pkt_dev)
pkt_dev->nr_labels = 0;
do {
__u32 tmp;
+
+ if (n >= MAX_MPLS_LABELS)
+ return -E2BIG;
+
len = hex32_arg(&buffer[i], 8, &tmp);
if (len <= 0)
return len;
@@ -908,8 +912,6 @@ static ssize_t get_labels(const char __user *buffer, struct pktgen_dev *pkt_dev)
return -EFAULT;
i++;
n++;
- if (n >= MAX_MPLS_LABELS)
- return -E2BIG;
} while (c == ',');
pkt_dev->nr_labels = n;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 126/294] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config().
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (11 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 123/294] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
@ 2025-05-05 22:53 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 144/294] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
` (30 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:53 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuniyuki Iwashima, Eric Dumazet, David Ahern, Jakub Kicinski,
Sasha Levin, davem, pabeni, netdev
From: Kuniyuki Iwashima <kuniyu@amazon.com>
[ Upstream commit 254ba7e6032d3fc738050d500b0c1d8197af90ca ]
fib_valid_key_len() is called in the beginning of fib_table_insert()
or fib_table_delete() to check if the prefix length is valid.
fib_table_insert() and fib_table_delete() are called from 3 paths
- ip_rt_ioctl()
- inet_rtm_newroute() / inet_rtm_delroute()
- fib_magic()
In the first ioctl() path, rtentry_to_fib_config() checks the prefix
length with bad_mask(). Also, fib_magic() always passes the correct
prefix: 32 or ifa->ifa_prefixlen, which is already validated.
Let's move fib_valid_key_len() to the rtnetlink path, rtm_to_fib_config().
While at it, 2 direct returns in rtm_to_fib_config() are changed to
goto to match other places in the same function
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20250228042328.96624-12-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/fib_frontend.c | 18 ++++++++++++++++--
net/ipv4/fib_trie.c | 22 ----------------------
2 files changed, 16 insertions(+), 24 deletions(-)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 90ce87ffed461..7993ff46de23c 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -829,19 +829,33 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
}
}
+ if (cfg->fc_dst_len > 32) {
+ NL_SET_ERR_MSG(extack, "Invalid prefix length");
+ err = -EINVAL;
+ goto errout;
+ }
+
+ if (cfg->fc_dst_len < 32 && (ntohl(cfg->fc_dst) << cfg->fc_dst_len)) {
+ NL_SET_ERR_MSG(extack, "Invalid prefix for given prefix length");
+ err = -EINVAL;
+ goto errout;
+ }
+
if (cfg->fc_nh_id) {
if (cfg->fc_oif || cfg->fc_gw_family ||
cfg->fc_encap || cfg->fc_mp) {
NL_SET_ERR_MSG(extack,
"Nexthop specification and nexthop id are mutually exclusive");
- return -EINVAL;
+ err = -EINVAL;
+ goto errout;
}
}
if (has_gw && has_via) {
NL_SET_ERR_MSG(extack,
"Nexthop configuration can not contain both GATEWAY and VIA");
- return -EINVAL;
+ err = -EINVAL;
+ goto errout;
}
if (!cfg->fc_table)
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 77b97c48da5ea..fa54b36b241ac 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1192,22 +1192,6 @@ static int fib_insert_alias(struct trie *t, struct key_vector *tp,
return 0;
}
-static bool fib_valid_key_len(u32 key, u8 plen, struct netlink_ext_ack *extack)
-{
- if (plen > KEYLENGTH) {
- NL_SET_ERR_MSG(extack, "Invalid prefix length");
- return false;
- }
-
- if ((plen < KEYLENGTH) && (key << plen)) {
- NL_SET_ERR_MSG(extack,
- "Invalid prefix for given prefix length");
- return false;
- }
-
- return true;
-}
-
static void fib_remove_alias(struct trie *t, struct key_vector *tp,
struct key_vector *l, struct fib_alias *old);
@@ -1228,9 +1212,6 @@ int fib_table_insert(struct net *net, struct fib_table *tb,
key = ntohl(cfg->fc_dst);
- if (!fib_valid_key_len(key, plen, extack))
- return -EINVAL;
-
pr_debug("Insert table=%u %08x/%d\n", tb->tb_id, key, plen);
fi = fib_create_info(cfg, extack);
@@ -1723,9 +1704,6 @@ int fib_table_delete(struct net *net, struct fib_table *tb,
key = ntohl(cfg->fc_dst);
- if (!fib_valid_key_len(key, plen, extack))
- return -EINVAL;
-
l = fib_find_node(t, &tp, key);
if (!l)
return -ESRCH;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 144/294] net/mlx5: Avoid report two health errors on same syndrome
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (12 preceding siblings ...)
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 126/294] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config() Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 145/294] selftests/net: have `gro.sh -t` return a correct exit code Sasha Levin
` (29 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Moshe Shemesh, Shahar Shitrit, Tariq Toukan, Kalesh AP,
David S . Miller, Sasha Levin, saeedm, andrew+netdev, edumazet,
kuba, pabeni, netdev, linux-rdma
From: Moshe Shemesh <moshe@nvidia.com>
[ Upstream commit b5d7b2f04ebcff740f44ef4d295b3401aeb029f4 ]
In case health counter has not increased for few polling intervals, miss
counter will reach max misses threshold and health report will be
triggered for FW health reporter. In case syndrome found on same health
poll another health report will be triggered.
Avoid two health reports on same syndrome by marking this syndrome as
already known.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/health.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index d798834c4e755..3ac8043f76dac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -833,6 +833,7 @@ static void poll_health(struct timer_list *t)
health->prev = count;
if (health->miss_counter == MAX_MISSES) {
mlx5_core_err(dev, "device's health compromised - reached miss count\n");
+ health->synd = ioread8(&h->synd);
print_health_info(dev);
queue_work(health->wq, &health->report_work);
}
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 145/294] selftests/net: have `gro.sh -t` return a correct exit code
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (13 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 144/294] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 148/294] net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only Sasha Levin
` (28 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kevin Krakauer, Willem de Bruijn, Jakub Kicinski, Sasha Levin,
davem, edumazet, pabeni, shuah, netdev, linux-kselftest
From: Kevin Krakauer <krakauer@google.com>
[ Upstream commit 784e6abd99f24024a8998b5916795f0bec9d2fd9 ]
Modify gro.sh to return a useful exit code when the -t flag is used. It
formerly returned 0 no matter what.
Tested: Ran `gro.sh -t large` and verified that test failures return 1.
Signed-off-by: Kevin Krakauer <krakauer@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250226192725.621969-2-krakauer@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/testing/selftests/net/gro.sh | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 342ad27f631b1..e771f5f7faa26 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -95,5 +95,6 @@ trap cleanup EXIT
if [[ "${test}" == "all" ]]; then
run_all_tests
else
- run_test "${proto}" "${test}"
+ exit_code=$(run_test "${proto}" "${test}")
+ exit $exit_code
fi;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 148/294] net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (14 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 145/294] selftests/net: have `gro.sh -t` return a correct exit code Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 149/294] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
` (27 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eric Woudstra, Paolo Abeni, Sasha Levin, nbd, sean.wang, lorenzo,
andrew+netdev, davem, edumazet, kuba, matthias.bgg,
angelogioacchino.delregno, netdev, linux-arm-kernel,
linux-mediatek
From: Eric Woudstra <ericwouds@gmail.com>
[ Upstream commit 7fe0353606d77a32c4c7f2814833dd1c043ebdd2 ]
mtk_foe_entry_set_vlan() in mtk_ppe.c already supports double vlan
tagging, but mtk_flow_offload_replace() in mtk_ppe_offload.c only allows
for 1 vlan tag, optionally in combination with pppoe and dsa tags.
However, mtk_foe_entry_set_vlan() only allows for setting the vlan id.
The protocol cannot be set, it is always ETH_P_8021Q, for inner and outer
tag. This patch adds QinQ support to mtk_flow_offload_replace(), only in
the case that both inner and outer tags are ETH_P_8021Q.
Only PPPoE-in-Q (as before) and Q-in-Q are allowed. A combination
of PPPoE and Q-in-Q is not allowed.
Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
Link: https://patch.msgid.link/20250225201509.20843-1-ericwouds@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/ethernet/mediatek/mtk_ppe_offload.c | 22 +++++++++----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
index a4efbeb162084..889fd26843e60 100644
--- a/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
+++ b/drivers/net/ethernet/mediatek/mtk_ppe_offload.c
@@ -34,8 +34,10 @@ struct mtk_flow_data {
u16 vlan_in;
struct {
- u16 id;
- __be16 proto;
+ struct {
+ u16 id;
+ __be16 proto;
+ } vlans[2];
u8 num;
} vlan;
struct {
@@ -330,18 +332,19 @@ mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f,
case FLOW_ACTION_CSUM:
break;
case FLOW_ACTION_VLAN_PUSH:
- if (data.vlan.num == 1 ||
+ if (data.vlan.num + data.pppoe.num == 2 ||
act->vlan.proto != htons(ETH_P_8021Q))
return -EOPNOTSUPP;
- data.vlan.id = act->vlan.vid;
- data.vlan.proto = act->vlan.proto;
+ data.vlan.vlans[data.vlan.num].id = act->vlan.vid;
+ data.vlan.vlans[data.vlan.num].proto = act->vlan.proto;
data.vlan.num++;
break;
case FLOW_ACTION_VLAN_POP:
break;
case FLOW_ACTION_PPPOE_PUSH:
- if (data.pppoe.num == 1)
+ if (data.pppoe.num == 1 ||
+ data.vlan.num == 2)
return -EOPNOTSUPP;
data.pppoe.sid = act->pppoe.sid;
@@ -431,12 +434,9 @@ mtk_flow_offload_replace(struct mtk_eth *eth, struct flow_cls_offload *f,
if (offload_type == MTK_PPE_PKT_TYPE_BRIDGE)
foe.bridge.vlan = data.vlan_in;
- if (data.vlan.num == 1) {
- if (data.vlan.proto != htons(ETH_P_8021Q))
- return -EOPNOTSUPP;
+ for (i = 0; i < data.vlan.num; i++)
+ mtk_foe_entry_set_vlan(eth, &foe, data.vlan.vlans[i].id);
- mtk_foe_entry_set_vlan(eth, &foe, data.vlan.id);
- }
if (data.pppoe.num == 1)
mtk_foe_entry_set_pppoe(eth, &foe, data.pppoe.sid);
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 149/294] net: xgene-v2: remove incorrect ACPI_PTR annotation
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (15 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 148/294] net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 150/294] bonding: report duplicate MAC address in all situations Sasha Levin
` (26 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Arnd Bergmann, Paolo Abeni, Sasha Levin, iyappan, keyur,
andrew+netdev, davem, edumazet, kuba, netdev
From: Arnd Bergmann <arnd@arndb.de>
[ Upstream commit 01358e8fe922f716c05d7864ac2213b2440026e7 ]
Building with W=1 shows a warning about xge_acpi_match being unused when
CONFIG_ACPI is disabled:
drivers/net/ethernet/apm/xgene-v2/main.c:723:36: error: unused variable 'xge_acpi_match' [-Werror,-Wunused-const-variable]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20250225163341.4168238-2-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/apm/xgene-v2/main.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c b/drivers/net/ethernet/apm/xgene-v2/main.c
index 379d19d18dbed..5808e3c73a8f4 100644
--- a/drivers/net/ethernet/apm/xgene-v2/main.c
+++ b/drivers/net/ethernet/apm/xgene-v2/main.c
@@ -9,8 +9,6 @@
#include "main.h"
-static const struct acpi_device_id xge_acpi_match[];
-
static int xge_get_resources(struct xge_pdata *pdata)
{
struct platform_device *pdev;
@@ -733,7 +731,7 @@ MODULE_DEVICE_TABLE(acpi, xge_acpi_match);
static struct platform_driver xge_driver = {
.driver = {
.name = "xgene-enet-v2",
- .acpi_match_table = ACPI_PTR(xge_acpi_match),
+ .acpi_match_table = xge_acpi_match,
},
.probe = xge_probe,
.remove = xge_remove,
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 150/294] bonding: report duplicate MAC address in all situations
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (16 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 149/294] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 153/294] Octeontx2-af: RPM: Register driver with PCI subsys IDs Sasha Levin
` (25 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Hangbin Liu, Nikolay Aleksandrov, Jakub Kicinski, Sasha Levin, jv,
andrew+netdev, davem, edumazet, pabeni, netdev
From: Hangbin Liu <liuhangbin@gmail.com>
[ Upstream commit 28d68d396a1cd21591e8c6d74afbde33a7ea107e ]
Normally, a bond uses the MAC address of the first added slave as the bond’s
MAC address. And the bond will set active slave’s MAC address to bond’s
address if fail_over_mac is set to none (0) or follow (2).
When the first slave is removed, the bond will still use the removed slave’s
MAC address, which can lead to a duplicate MAC address and potentially cause
issues with the switch. To avoid confusion, let's warn the user in all
situations, including when fail_over_mac is set to 2 or not in active-backup
mode.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250225033914.18617-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7eb62fe55947f..56c241246d1af 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2469,7 +2469,7 @@ static int __bond_release_one(struct net_device *bond_dev,
RCU_INIT_POINTER(bond->current_arp_slave, NULL);
- if (!all && (!bond->params.fail_over_mac ||
+ if (!all && (bond->params.fail_over_mac != BOND_FOM_ACTIVE ||
BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)) {
if (ether_addr_equal_64bits(bond_dev->dev_addr, slave->perm_hwaddr) &&
bond_has_slaves(bond))
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 153/294] Octeontx2-af: RPM: Register driver with PCI subsys IDs
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (17 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 150/294] bonding: report duplicate MAC address in all situations Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 160/294] vhost-scsi: Return queue full for page alloc failures during copy Sasha Levin
` (24 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Hariprasad Kelam, Jakub Kicinski, Sasha Levin, sgoutham, lcherian,
gakula, jerinj, sbhatta, andrew+netdev, davem, edumazet, pabeni,
netdev
From: Hariprasad Kelam <hkelam@marvell.com>
[ Upstream commit fc9167192f29485be5621e2e9c8208b717b65753 ]
Although the PCI device ID and Vendor ID for the RPM (MAC) block
have remained the same across Octeon CN10K and the next-generation
CN20K silicon, Hardware architecture has changed (NIX mapped RPMs
and RFOE Mapped RPMs).
Add PCI Subsystem IDs to the device table to ensure that this driver
can be probed from NIX mapped RPM devices only.
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Link: https://patch.msgid.link/20250224035603.1220913-1-hkelam@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 14 ++++++++++++--
drivers/net/ethernet/marvell/octeontx2/af/rvu.h | 2 ++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
index 52792546fe00d..fc771e495836e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cgx.c
@@ -66,8 +66,18 @@ static int cgx_fwi_link_change(struct cgx *cgx, int lmac_id, bool en);
/* Supported devices */
static const struct pci_device_id cgx_id_table[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_OCTEONTX2_CGX) },
- { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM) },
- { PCI_DEVICE(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CN10K_A) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CNF10K_A) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CNF10K_B) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CN10K_B) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CN20KA) },
+ { PCI_DEVICE_SUB(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10KB_RPM,
+ PCI_ANY_ID, PCI_SUBSYS_DEVID_CNF20KA) },
{ 0, } /* end of table */
};
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index a607c7294b0c5..9fbc071ef29b0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -30,6 +30,8 @@
#define PCI_SUBSYS_DEVID_CNF10K_A 0xBA00
#define PCI_SUBSYS_DEVID_CNF10K_B 0xBC00
#define PCI_SUBSYS_DEVID_CN10K_B 0xBD00
+#define PCI_SUBSYS_DEVID_CN20KA 0xC220
+#define PCI_SUBSYS_DEVID_CNF20KA 0xC320
/* PCI BAR nos */
#define PCI_AF_REG_BAR_NUM 0
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 160/294] vhost-scsi: Return queue full for page alloc failures during copy
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (18 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 153/294] Octeontx2-af: RPM: Register driver with PCI subsys IDs Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 168/294] net/mlx5: Change POOL_NEXT_SIZE define value and make it global Sasha Levin
` (23 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Mike Christie, Michael S . Tsirkin, Stefan Hajnoczi, Sasha Levin,
jasowang, virtualization, kvm, netdev
From: Mike Christie <michael.christie@oracle.com>
[ Upstream commit 891b99eab0f89dbe08d216f4ab71acbeaf7a3102 ]
This has us return queue full if we can't allocate a page during the
copy operation so the initiator can retry.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20241203191705.19431-5-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/vhost/scsi.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 8d8a22504d71f..9a62372bdac32 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -746,7 +746,7 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
size_t len = iov_iter_count(iter);
unsigned int nbytes = 0;
struct page *page;
- int i;
+ int i, ret;
if (cmd->tvc_data_direction == DMA_FROM_DEVICE) {
cmd->saved_iter_addr = dup_iter(&cmd->saved_iter, iter,
@@ -759,6 +759,7 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
page = alloc_page(GFP_KERNEL);
if (!page) {
i--;
+ ret = -ENOMEM;
goto err;
}
@@ -766,8 +767,10 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
sg_set_page(&sg[i], page, nbytes, 0);
if (cmd->tvc_data_direction == DMA_TO_DEVICE &&
- copy_page_from_iter(page, 0, nbytes, iter) != nbytes)
+ copy_page_from_iter(page, 0, nbytes, iter) != nbytes) {
+ ret = -EFAULT;
goto err;
+ }
len -= nbytes;
}
@@ -782,7 +785,7 @@ vhost_scsi_copy_iov_to_sgl(struct vhost_scsi_cmd *cmd, struct iov_iter *iter,
for (; i >= 0; i--)
__free_page(sg_page(&sg[i]));
kfree(cmd->saved_iter_addr);
- return -ENOMEM;
+ return ret;
}
static int
@@ -1221,9 +1224,9 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
" %d\n", cmd, exp_data_len, prot_bytes, data_direction);
if (data_direction != DMA_NONE) {
- if (unlikely(vhost_scsi_mapal(cmd, prot_bytes,
- &prot_iter, exp_data_len,
- &data_iter))) {
+ ret = vhost_scsi_mapal(cmd, prot_bytes, &prot_iter,
+ exp_data_len, &data_iter);
+ if (unlikely(ret)) {
vq_err(vq, "Failed to map iov to sgl\n");
vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd);
goto err;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 168/294] net/mlx5: Change POOL_NEXT_SIZE define value and make it global
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (19 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 160/294] vhost-scsi: Return queue full for page alloc failures during copy Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 181/294] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
` (22 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Patrisious Haddad, Maor Gottlieb, Mark Bloch, Tariq Toukan,
Leon Romanovsky, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, kuba, pabeni, cratiu, bpoirier, horms, vulab,
michal.swiatkowski, netdev, linux-rdma
From: Patrisious Haddad <phaddad@nvidia.com>
[ Upstream commit 80df31f384b4146a62a01b3d4beb376cc7b9a89e ]
Change POOL_NEXT_SIZE define value from 0 to BIT(30), since this define
is used to request the available maximum sized flow table, and zero doesn't
make sense for it, whereas some places in the driver use zero explicitly
expecting the smallest table size possible but instead due to this
define they end up allocating the biggest table size unawarely.
In addition move the definition to "include/linux/mlx5/fs.h" to expose the
define to IB driver as well, while appropriately renaming it.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250219085808.349923-3-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c | 6 ++++--
drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h | 2 --
drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c | 3 ++-
include/linux/mlx5/fs.h | 2 ++
5 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c
index 8587cd572da53..bdb825aa87268 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/legacy.c
@@ -96,7 +96,7 @@ static int esw_create_legacy_fdb_table(struct mlx5_eswitch *esw)
if (!flow_group_in)
return -ENOMEM;
- ft_attr.max_fte = POOL_NEXT_SIZE;
+ ft_attr.max_fte = MLX5_FS_MAX_POOL_SIZE;
ft_attr.prio = LEGACY_FDB_PRIO;
fdb = mlx5_create_flow_table(root_ns, &ft_attr);
if (IS_ERR(fdb)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c
index c14590acc7726..f6abfd00d7e68 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.c
@@ -50,10 +50,12 @@ mlx5_ft_pool_get_avail_sz(struct mlx5_core_dev *dev, enum fs_flow_table_type tab
int i, found_i = -1;
for (i = ARRAY_SIZE(FT_POOLS) - 1; i >= 0; i--) {
- if (dev->priv.ft_pool->ft_left[i] && FT_POOLS[i] >= desired_size &&
+ if (dev->priv.ft_pool->ft_left[i] &&
+ (FT_POOLS[i] >= desired_size ||
+ desired_size == MLX5_FS_MAX_POOL_SIZE) &&
FT_POOLS[i] <= max_ft_size) {
found_i = i;
- if (desired_size != POOL_NEXT_SIZE)
+ if (desired_size != MLX5_FS_MAX_POOL_SIZE)
break;
}
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h
index 25f4274b372b5..173e312db7204 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_ft_pool.h
@@ -7,8 +7,6 @@
#include <linux/mlx5/driver.h>
#include "fs_core.h"
-#define POOL_NEXT_SIZE 0
-
int mlx5_ft_pool_init(struct mlx5_core_dev *dev);
void mlx5_ft_pool_destroy(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
index 711d14dea2485..d313cb7f0ed88 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/fs_chains.c
@@ -161,7 +161,8 @@ mlx5_chains_create_table(struct mlx5_fs_chains *chains,
ft_attr.flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_REFORMAT |
MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
- sz = (chain == mlx5_chains_get_nf_ft_chain(chains)) ? FT_TBL_SZ : POOL_NEXT_SIZE;
+ sz = (chain == mlx5_chains_get_nf_ft_chain(chains)) ?
+ FT_TBL_SZ : MLX5_FS_MAX_POOL_SIZE;
ft_attr.max_fte = sz;
/* We use chains_default_ft(chains) as the table's next_ft till
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 3fb428ce7d1c7..b6b9a4dfa4fb9 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -40,6 +40,8 @@
#define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
+#define MLX5_FS_MAX_POOL_SIZE BIT(30)
+
enum mlx5_flow_destination_type {
MLX5_FLOW_DESTINATION_TYPE_NONE,
MLX5_FLOW_DESTINATION_TYPE_VPORT,
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 181/294] net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (20 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 168/294] net/mlx5: Change POOL_NEXT_SIZE define value and make it global Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 183/294] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
` (21 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Peter Seiderer, Simon Horman, Jakub Kicinski, Sasha Levin, davem,
edumazet, pabeni, netdev
From: Peter Seiderer <ps.report@gmx.net>
[ Upstream commit 425e64440ad0a2f03bdaf04be0ae53dededbaa77 ]
Honour the user given buffer size for the strn_len() calls (otherwise
strn_len() will access memory outside of the user given buffer).
Signed-off-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250219084527.20488-8-ps.report@gmx.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/pktgen.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 1decd6300f34c..6afea369ca213 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -1877,8 +1877,8 @@ static ssize_t pktgen_thread_write(struct file *file,
i = len;
/* Read variable name */
-
- len = strn_len(&user_buffer[i], sizeof(name) - 1);
+ max = min(sizeof(name) - 1, count - i);
+ len = strn_len(&user_buffer[i], max);
if (len < 0)
return len;
@@ -1908,7 +1908,8 @@ static ssize_t pktgen_thread_write(struct file *file,
if (!strcmp(name, "add_device")) {
char f[32];
memset(f, 0, 32);
- len = strn_len(&user_buffer[i], sizeof(f) - 1);
+ max = min(sizeof(f) - 1, count - i);
+ len = strn_len(&user_buffer[i], max);
if (len < 0) {
ret = len;
goto out;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 183/294] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (21 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 181/294] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 191/294] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
` (20 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jason Xing, Martin KaFai Lau, Sasha Levin, ast, daniel, andrii,
edumazet, ncardwell, davem, kuba, pabeni, martin.lau, dsahern,
bpf, netdev
From: Jason Xing <kerneljasonxing@gmail.com>
[ Upstream commit fd93eaffb3f977b23bc0a48d4c8616e654fcf133 ]
The subsequent patch will implement BPF TX timestamping. It will
call the sockops BPF program without holding the sock lock.
This breaks the current assumption that all sock ops programs will
hold the sock lock. The sock's fields of the uapi's bpf_sock_ops
requires this assumption.
To address this, a new "u8 is_locked_tcp_sock;" field is added. This
patch sets it in the current sock_ops callbacks. The "is_fullsock"
test is then replaced by the "is_locked_tcp_sock" test during
sock_ops_convert_ctx_access().
The new TX timestamping callbacks added in the subsequent patch will
not have this set. This will prevent unsafe access from the new
timestamping callbacks.
Potentially, we could allow read-only access. However, this would
require identifying which callback is read-safe-only and also requires
additional BPF instruction rewrites in the covert_ctx. Since the BPF
program can always read everything from a socket (e.g., by using
bpf_core_cast), this patch keeps it simple and disables all read
and write access to any socket fields through the bpf_sock_ops
UAPI from the new TX timestamping callback.
Moreover, note that some of the fields in bpf_sock_ops are specific
to tcp_sock, and sock_ops currently only supports tcp_sock. In
the future, UDP timestamping will be added, which will also break
this assumption. The same idea used in this patch will be reused.
Considering that the current sock_ops only supports tcp_sock, the
variable is named is_locked_"tcp"_sock.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250220072940.99994-4-kerneljasonxing@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/filter.h | 1 +
include/net/tcp.h | 1 +
net/core/filter.c | 8 ++++----
net/ipv4/tcp_input.c | 2 ++
net/ipv4/tcp_output.c | 2 ++
5 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 5090e940ba3e4..fa43d343b298b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1303,6 +1303,7 @@ struct bpf_sock_ops_kern {
void *skb_data_end;
u8 op;
u8 is_fullsock;
+ u8 is_locked_tcp_sock;
u8 remaining_opt_len;
u64 temp; /* temp and everything after is not
* initialized to 0 before calling
diff --git a/include/net/tcp.h b/include/net/tcp.h
index a6def0aab3ed3..658551a64e2a5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2462,6 +2462,7 @@ static inline int tcp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
if (sk_fullsock(sk)) {
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_owned_by_me(sk);
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 39eef3370d800..79165e181d418 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10304,10 +10304,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
} \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
- is_fullsock), \
+ is_locked_tcp_sock), \
fullsock_reg, si->src_reg, \
offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
+ is_locked_tcp_sock)); \
*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp); \
if (si->dst_reg == si->src_reg) \
*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg, \
@@ -10392,10 +10392,10 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
temp)); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, \
- is_fullsock), \
+ is_locked_tcp_sock), \
reg, si->dst_reg, \
offsetof(struct bpf_sock_ops_kern, \
- is_fullsock)); \
+ is_locked_tcp_sock)); \
*insn++ = BPF_JMP_IMM(BPF_JEQ, reg, 0, 2); \
*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF( \
struct bpf_sock_ops_kern, sk),\
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a172248b66783..4bed6f9923059 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -168,6 +168,7 @@ static void bpf_skops_parse_hdr(struct sock *sk, struct sk_buff *skb)
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = BPF_SOCK_OPS_PARSE_HDR_OPT_CB;
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
bpf_skops_init_skb(&sock_ops, skb, tcp_hdrlen(skb));
@@ -184,6 +185,7 @@ static void bpf_skops_established(struct sock *sk, int bpf_op,
memset(&sock_ops, 0, offsetof(struct bpf_sock_ops_kern, temp));
sock_ops.op = bpf_op;
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
/* sk with TCP_REPAIR_ON does not have skb in tcp_finish_connect */
if (skb)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3771ed22c2f56..2cd96fdeeefdb 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -522,6 +522,7 @@ static void bpf_skops_hdr_opt_len(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
}
@@ -567,6 +568,7 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
sock_owned_by_me(sk);
sock_ops.is_fullsock = 1;
+ sock_ops.is_locked_tcp_sock = 1;
sock_ops.sk = sk;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 191/294] eth: mlx4: don't try to complete XDP frames in netpoll
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (22 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 183/294] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 194/294] vxlan: Join / leave MC group after remote changes Sasha Levin
` (19 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jakub Kicinski, Tariq Toukan, Sasha Levin, andrew+netdev, davem,
edumazet, pabeni, ast, daniel, hawk, john.fastabend, netdev,
linux-rdma, bpf
From: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 8fdeafd66edaf420ea0063a1f13442fe3470fe70 ]
mlx4 doesn't support ndo_xdp_xmit / XDP_REDIRECT and wasn't
using page pool until now, so it could run XDP completions
in netpoll (NAPI budget == 0) just fine. Page pool has calling
context requirements, make sure we don't try to call it from
what is potentially HW IRQ context.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250213010635.1354034-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/en_tx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 65cb63f6c4658..61a0fd8424a2c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -450,6 +450,8 @@ int mlx4_en_process_tx_cq(struct net_device *dev,
if (unlikely(!priv->port_up))
return 0;
+ if (unlikely(!napi_budget) && cq->type == TX_XDP)
+ return 0;
netdev_txq_bql_complete_prefetchw(ring->tx_queue);
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 194/294] vxlan: Join / leave MC group after remote changes
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (23 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 191/294] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 196/294] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
` (18 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Petr Machata, Ido Schimmel, Nikolay Aleksandrov, Paolo Abeni,
Sasha Levin, andrew+netdev, davem, edumazet, kuba, menglong8.dong,
gnault, netdev
From: Petr Machata <petrm@nvidia.com>
[ Upstream commit d42d543368343c0449a4e433b5f02e063a86209c ]
When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.
Therefore when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. Similarly when the
interface used for endpoint communication is changed in a situation when
multicast remote is configured. This is currently not done.
Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the new
group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.
One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):
# ip link add name br up type bridge vlan_filtering 1
# ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
# bridge vni add dev vx1 vni 200 group 224.0.0.1
# tcpdump -i lo &
# bridge vni add dev vx1 vni 200 group 224.0.0.2
18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
# bridge vni
dev vni group/remote
vx1 200 224.0.0.2
Having two different modes of operation for conceptually the same interface
is silly, so in this patch, just do what the vnifilter code does and deal
with the errors by crossing fingers real hard.
The vnifilter code leaves old before joining new, and in case of join /
leave failures does not roll back the configuration changes that have
already been applied, but bails out of joining if it could not leave. Do
the same here: leave before join, apply changes unconditionally and do not
attempt to join if we couldn't leave.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/vxlan/vxlan_core.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 64db3e98a1b66..822cf49d82676 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -4240,6 +4240,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
struct netlink_ext_ack *extack)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
+ bool rem_ip_changed, change_igmp;
struct net_device *lowerdev;
struct vxlan_config conf;
struct vxlan_rdst *dst;
@@ -4263,8 +4264,13 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
if (err)
return err;
+ rem_ip_changed = !vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip);
+ change_igmp = vxlan->dev->flags & IFF_UP &&
+ (rem_ip_changed ||
+ dst->remote_ifindex != conf.remote_ifindex);
+
/* handle default dst entry */
- if (!vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip)) {
+ if (rem_ip_changed) {
u32 hash_index = fdb_head_index(vxlan, all_zeros_mac, conf.vni);
spin_lock_bh(&vxlan->hash_lock[hash_index]);
@@ -4308,6 +4314,9 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
}
}
+ if (change_igmp && vxlan_addr_multicast(&dst->remote_ip))
+ err = vxlan_multicast_leave(vxlan);
+
if (conf.age_interval != vxlan->cfg.age_interval)
mod_timer(&vxlan->age_timer, jiffies);
@@ -4315,7 +4324,12 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
if (lowerdev && lowerdev != dst->remote_dev)
dst->remote_dev = lowerdev;
vxlan_config_apply(dev, &conf, lowerdev, vxlan->net, true);
- return 0;
+
+ if (!err && change_igmp &&
+ vxlan_addr_multicast(&dst->remote_ip))
+ err = vxlan_multicast_join(vxlan);
+
+ return err;
}
static void vxlan_dellink(struct net_device *dev, struct list_head *head)
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 196/294] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (24 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 194/294] vxlan: Join / leave MC group after remote changes Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 197/294] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
` (17 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Shahar Shitrit, Tariq Toukan, Mateusz Polchlopek, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Shahar Shitrit <shshitrit@nvidia.com>
[ Upstream commit 633f16d7e07c129a36b882c05379e01ce5bdb542 ]
In the sensor_count field of the MTEWE register, bits 1-62 are
supported only for unmanaged switches, not for NICs, and bit 63
is reserved for internal use.
To prevent confusing output that may include set bits that are
not relevant to NIC sensors, we update the bitmask to retain only
the first bit, which corresponds to the sensor ASIC.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20250213094641.226501-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/events.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index 3ec892d51f57d..0f4763dab5d25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -163,6 +163,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
u64 value_msb;
value_lsb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_lsb);
+ /* bit 1-63 are not supported for NICs,
+ * hence read only bit 0 (asic) from lsb.
+ */
+ value_lsb &= 0x1;
value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
mlx5_core_warn(events->dev,
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 197/294] net/mlx5: Apply rate-limiting to high temperature warning
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (25 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 196/294] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
@ 2025-05-05 22:54 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 214/294] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
` (16 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:54 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Shahar Shitrit, Tariq Toukan, Mateusz Polchlopek, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Shahar Shitrit <shshitrit@nvidia.com>
[ Upstream commit 9dd3d5d258aceb37bdf09c8b91fa448f58ea81f0 ]
Wrap the high temperature warning in a temperature event with
a call to net_ratelimit() to prevent flooding the kernel log
with repeated warning messages when temperature exceeds the
threshold multiple times within a short duration.
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Link: https://patch.msgid.link/20250213094641.226501-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/events.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/events.c b/drivers/net/ethernet/mellanox/mlx5/core/events.c
index 0f4763dab5d25..e7143d32b2211 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/events.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/events.c
@@ -169,9 +169,10 @@ static int temp_warn(struct notifier_block *nb, unsigned long type, void *data)
value_lsb &= 0x1;
value_msb = be64_to_cpu(eqe->data.temp_warning.sensor_warning_msb);
- mlx5_core_warn(events->dev,
- "High temperature on sensors with bit set %llx %llx",
- value_msb, value_lsb);
+ if (net_ratelimit())
+ mlx5_core_warn(events->dev,
+ "High temperature on sensors with bit set %llx %llx",
+ value_msb, value_lsb);
return NOTIFY_OK;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 214/294] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (26 preceding siblings ...)
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 197/294] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 232/294] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
` (15 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kees Cook, Jakub Kicinski, Sasha Levin, tariqt, andrew+netdev,
davem, edumazet, pabeni, yishaih, netdev, linux-rdma
From: Kees Cook <kees@kernel.org>
[ Upstream commit 4a6f18f28627e121bd1f74b5fcc9f945d6dbeb1e ]
GCC can see that the value range for "order" is capped, but this leads
it to consider that it might be negative, leading to a false positive
warning (with GCC 15 with -Warray-bounds -fdiagnostics-details):
../drivers/net/ethernet/mellanox/mlx4/alloc.c:691:47: error: array subscript -1 is below array bounds of 'long unsigned int *[2]' [-Werror=array-bounds=]
691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o);
| ~~~~~~~~~~~^~~
'mlx4_alloc_db_from_pgdir': events 1-2
691 | i = find_first_bit(pgdir->bits[o], MLX4_DB_PER_PAGE >> o); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | | | | (2) out of array bounds here
| (1) when the condition is evaluated to true In file included from ../drivers/net/ethernet/mellanox/mlx4/mlx4.h:53,
from ../drivers/net/ethernet/mellanox/mlx4/alloc.c:42:
../include/linux/mlx4/device.h:664:33: note: while referencing 'bits'
664 | unsigned long *bits[2];
| ^~~~
Switch the argument to unsigned int, which removes the compiler needing
to consider negative values.
Signed-off-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20250210174504.work.075-kees@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx4/alloc.c | 6 +++---
include/linux/mlx4/device.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index b330020dc0d67..f2bded847e61d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -682,9 +682,9 @@ static struct mlx4_db_pgdir *mlx4_alloc_db_pgdir(struct device *dma_device)
}
static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir,
- struct mlx4_db *db, int order)
+ struct mlx4_db *db, unsigned int order)
{
- int o;
+ unsigned int o;
int i;
for (o = order; o <= 1; ++o) {
@@ -712,7 +712,7 @@ static int mlx4_alloc_db_from_pgdir(struct mlx4_db_pgdir *pgdir,
return 0;
}
-int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order)
+int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, unsigned int order)
{
struct mlx4_priv *priv = mlx4_priv(dev);
struct mlx4_db_pgdir *pgdir;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 27f42f713c891..86f0f2a25a3d6 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -1135,7 +1135,7 @@ int mlx4_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
int mlx4_buf_write_mtt(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
struct mlx4_buf *buf);
-int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, int order);
+int mlx4_db_alloc(struct mlx4_dev *dev, struct mlx4_db *db, unsigned int order);
void mlx4_db_free(struct mlx4_dev *dev, struct mlx4_db *db);
int mlx4_alloc_hwq_res(struct mlx4_dev *dev, struct mlx4_hwq_resources *wqres,
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 232/294] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (27 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 214/294] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 233/294] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
` (14 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Alexei Lazar, Dragos Tatulea, Tariq Toukan, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
netdev, linux-rdma
From: Alexei Lazar <alazar@nvidia.com>
[ Upstream commit 95b9606b15bb3ce1198d28d2393dd0e1f0a5f3e9 ]
Current loopback test validation ignores non-linear SKB case in
the SKB access, which can lead to failures in scenarios such as
when HW GRO is enabled.
Linearize the SKB so both cases will be handled.
Signed-off-by: Alexei Lazar <alazar@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250209101716.112774-15-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
index 08a75654f5f18..c170503b3aace 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_selftest.c
@@ -165,6 +165,9 @@ mlx5e_test_loopback_validate(struct sk_buff *skb,
struct udphdr *udph;
struct iphdr *iph;
+ if (skb_linearize(skb))
+ goto out;
+
/* We are only going to peek, no need to clone the SKB */
if (MLX5E_TEST_PKT_SIZE - ETH_HLEN > skb_headlen(skb))
goto out;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 233/294] net/mlx5e: set the tx_queue_len for pfifo_fast
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (28 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 232/294] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 234/294] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
` (13 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Daniel Jurgens, Tariq Toukan, Michal Swiatkowski,
Jakub Kicinski, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, pabeni, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit a38cc5706fb9f7dc4ee3a443f61de13ce1e410ed ]
By default, the mq netdev creates a pfifo_fast qdisc. On a
system with 16 core, the pfifo_fast with 3 bands consumes
16 * 3 * 8 (size of pointer) * 1024 (default tx queue len)
= 393KB. The patch sets the tx qlen to representor default
value, 128 (1<<MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE), which
consumes 16 * 3 * 8 * 128 = 49KB, saving 344KB for each
representor at ECPF.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 751d3ffcd2f6c..39d8e63e8856d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -829,6 +829,8 @@ static void mlx5e_build_rep_netdev(struct net_device *netdev,
netdev->ethtool_ops = &mlx5e_rep_ethtool_ops;
netdev->watchdog_timeo = 15 * HZ;
+ if (mlx5_core_is_ecpf(mdev))
+ netdev->tx_queue_len = 1 << MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE;
#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
netdev->hw_features |= NETIF_F_HW_TC;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 234/294] net/mlx5e: reduce rep rxq depth to 256 for ECPF
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (29 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 233/294] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 235/294] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps Sasha Levin
` (12 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Bodong Wang, Saeed Mahameed, Tariq Toukan,
Michal Swiatkowski, Jakub Kicinski, Sasha Levin, andrew+netdev,
davem, edumazet, pabeni, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit b9cc8f9d700867aaa77aedddfea85e53d5e5d584 ]
By experiments, a single queue representor netdev consumes kernel
memory around 2.8MB, and 1.8MB out of the 2.8MB is due to page
pool for the RXQ. Scaling to a thousand representors consumes 2.8GB,
which becomes a memory pressure issue for embedded devices such as
BlueField-2 16GB / BlueField-3 32GB memory.
Since representor netdevs mostly handles miss traffic, and ideally,
most of the traffic will be offloaded, reduce the default non-uplink
rep netdev's RXQ default depth from 1024 to 256 if mdev is ecpf eswitch
manager. This saves around 1MB of memory per regular RQ,
(1024 - 256) * 2KB, allocated from page pool.
With rxq depth of 256, the netlink page pool tool reports
$./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump page-pool-get
{'id': 277,
'ifindex': 9,
'inflight': 128,
'inflight-mem': 786432,
'napi-id': 775}]
This is due to mtu 1500 + headroom consumes half pages, so 256 rxq
entries consumes around 128 pages (thus create a page pool with
size 128), shown above at inflight.
Note that each netdev has multiple types of RQs, including
Regular RQ, XSK, PTP, Drop, Trap RQ. Since non-uplink representor
only supports regular rq, this patch only changes the regular RQ's
default depth.
Signed-off-by: William Tu <witu@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 39d8e63e8856d..851c499faa795 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -63,6 +63,7 @@
#define MLX5E_REP_PARAMS_DEF_LOG_SQ_SIZE \
max(0x7, MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE)
#define MLX5E_REP_PARAMS_DEF_NUM_CHANNELS 1
+#define MLX5E_REP_PARAMS_DEF_LOG_RQ_SIZE 0x8
static const char mlx5e_rep_driver_name[] = "mlx5e_rep";
@@ -798,6 +799,8 @@ static void mlx5e_build_rep_params(struct net_device *netdev)
/* RQ */
mlx5e_build_rq_params(mdev, params);
+ if (!mlx5e_is_uplink_rep(priv) && mlx5_core_is_ecpf(mdev))
+ params->log_rq_mtu_frames = MLX5E_REP_PARAMS_DEF_LOG_RQ_SIZE;
/* If netdev is already registered (e.g. move from nic profile to uplink,
* RTNL lock must be held before triggering netdev notifiers.
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 235/294] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (30 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 234/294] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 239/294] net: fec: Refactor MAC reset to function Sasha Levin
` (11 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: William Tu, Tariq Toukan, Michal Swiatkowski, Jakub Kicinski,
Sasha Levin, saeedm, andrew+netdev, davem, edumazet, pabeni,
dtatulea, alazar, lkayal, yorayz, netdev, linux-rdma
From: William Tu <witu@nvidia.com>
[ Upstream commit e1d68ea58c7e9ebacd9ad7a99b25a3578fa62182 ]
For the ECPF and representors, reduce the max MPWRQ size from 256KB (18)
to 128KB (17). This prepares the later patch for saving representor
memory.
With Striding RQ, there is a minimum of 4 MPWQEs. So with 128KB of max
MPWRQ size, the minimal memory is 4 * 128KB = 512KB. When creating page
pool, consider 1500 mtu, the minimal page pool size will be 512KB/4KB =
128 pages = 256 rx ring entries (2 entries per page).
Before this patch, setting RX ringsize (ethtool -G rx) to 256 causes
driver to allocate page pool size more than it needs due to max MPWRQ
is 256KB (18). Ex: 4 * 256KB = 1MB, 1MB/4KB = 256 pages, but actually
128 pages is good enough. Reducing the max MPWRQ to 128KB fixes the
limitation.
Signed-off-by: William Tu <witu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20250209101716.112774-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 --
.../net/ethernet/mellanox/mlx5/core/en/params.c | 15 +++++++++++----
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 20a6bc1a234f4..9cf33ae48c216 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -93,8 +93,6 @@ struct page_pool;
#define MLX5_MPWRQ_DEF_LOG_STRIDE_SZ(mdev) \
MLX5_MPWRQ_LOG_STRIDE_SZ(mdev, order_base_2(MLX5E_RX_MAX_HEAD))
-#define MLX5_MPWRQ_MAX_LOG_WQE_SZ 18
-
/* Keep in sync with mlx5e_mpwrq_log_wqe_sz.
* These are theoretical maximums, which can be further restricted by
* capabilities. These values are used for static resource allocations and
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 775010e94cb7c..dcd5db907f102 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -9,6 +9,9 @@
#include <net/page_pool/types.h>
#include <net/xdp_sock_drv.h>
+#define MLX5_MPWRQ_MAX_LOG_WQE_SZ 18
+#define MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ 17
+
static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)
{
u8 min_page_shift = MLX5_CAP_GEN_2(mdev, log_min_mkey_entity_size);
@@ -102,18 +105,22 @@ u8 mlx5e_mpwrq_log_wqe_sz(struct mlx5_core_dev *mdev, u8 page_shift,
enum mlx5e_mpwrq_umr_mode umr_mode)
{
u8 umr_entry_size = mlx5e_mpwrq_umr_entry_size(umr_mode);
- u8 max_pages_per_wqe, max_log_mpwqe_size;
+ u8 max_pages_per_wqe, max_log_wqe_size_calc;
+ u8 max_log_wqe_size_cap;
u16 max_wqe_size;
/* Keep in sync with MLX5_MPWRQ_MAX_PAGES_PER_WQE. */
max_wqe_size = mlx5e_get_max_sq_aligned_wqebbs(mdev) * MLX5_SEND_WQE_BB;
max_pages_per_wqe = ALIGN_DOWN(max_wqe_size - sizeof(struct mlx5e_umr_wqe),
MLX5_UMR_FLEX_ALIGNMENT) / umr_entry_size;
- max_log_mpwqe_size = ilog2(max_pages_per_wqe) + page_shift;
+ max_log_wqe_size_calc = ilog2(max_pages_per_wqe) + page_shift;
+
+ WARN_ON_ONCE(max_log_wqe_size_calc < MLX5E_ORDER2_MAX_PACKET_MTU);
- WARN_ON_ONCE(max_log_mpwqe_size < MLX5E_ORDER2_MAX_PACKET_MTU);
+ max_log_wqe_size_cap = mlx5_core_is_ecpf(mdev) ?
+ MLX5_REP_MPWRQ_MAX_LOG_WQE_SZ : MLX5_MPWRQ_MAX_LOG_WQE_SZ;
- return min_t(u8, max_log_mpwqe_size, MLX5_MPWRQ_MAX_LOG_WQE_SZ);
+ return min_t(u8, max_log_wqe_size_calc, max_log_wqe_size_cap);
}
u8 mlx5e_mpwrq_pages_per_wqe(struct mlx5_core_dev *mdev, u8 page_shift,
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 239/294] net: fec: Refactor MAC reset to function
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (31 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 235/294] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 242/294] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
` (10 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Csókás, Bence, Michal Swiatkowski, Jacob Keller,
Simon Horman, Paolo Abeni, Sasha Levin, wei.fang, andrew+netdev,
davem, edumazet, kuba, imx, netdev
From: Csókás, Bence <csokas.bence@prolan.hu>
[ Upstream commit 67800d296191d0a9bde0a7776f99ca1ddfa0fc26 ]
The core is reset both in `fec_restart()` (called on link-up) and
`fec_stop()` (going to sleep, driver remove etc.). These two functions
had their separate implementations, which was at first only a register
write and a `udelay()` (and the accompanying block comment). However,
since then we got soft-reset (MAC disable) and Wake-on-LAN support, which
meant that these implementations diverged, often causing bugs.
For instance, as of now, `fec_stop()` does not check for
`FEC_QUIRK_NO_HARD_RESET`, meaning the MII/RMII mode is cleared on eg.
a PM power-down event; and `fec_restart()` missed the refactor renaming
the "magic" constant `1` to `FEC_ECR_RESET`.
To harmonize current implementations, and eliminate this source of
potential future bugs, refactor implementation to a common function.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu>
Link: https://patch.msgid.link/20250207121255.161146-2-csokas.bence@prolan.hu
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/freescale/fec_main.c | 52 +++++++++++------------
1 file changed, 25 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 2d6b50903c923..419df6559492e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1074,6 +1074,29 @@ static void fec_enet_enable_ring(struct net_device *ndev)
}
}
+/* Whack a reset. We should wait for this.
+ * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
+ * instead of reset MAC itself.
+ */
+static void fec_ctrl_reset(struct fec_enet_private *fep, bool allow_wol)
+{
+ u32 val;
+
+ if (!allow_wol || !(fep->wol_flag & FEC_WOL_FLAG_SLEEP_ON)) {
+ if (fep->quirks & FEC_QUIRK_HAS_MULTI_QUEUES ||
+ ((fep->quirks & FEC_QUIRK_NO_HARD_RESET) && fep->link)) {
+ writel(0, fep->hwp + FEC_ECNTRL);
+ } else {
+ writel(FEC_ECR_RESET, fep->hwp + FEC_ECNTRL);
+ udelay(10);
+ }
+ } else {
+ val = readl(fep->hwp + FEC_ECNTRL);
+ val |= (FEC_ECR_MAGICEN | FEC_ECR_SLEEP);
+ writel(val, fep->hwp + FEC_ECNTRL);
+ }
+}
+
/*
* This function is called to start or restart the FEC during a link
* change, transmit timeout, or to reconfigure the FEC. The network
@@ -1090,17 +1113,7 @@ fec_restart(struct net_device *ndev)
if (fep->bufdesc_ex)
fec_ptp_save_state(fep);
- /* Whack a reset. We should wait for this.
- * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
- * instead of reset MAC itself.
- */
- if (fep->quirks & FEC_QUIRK_HAS_MULTI_QUEUES ||
- ((fep->quirks & FEC_QUIRK_NO_HARD_RESET) && fep->link)) {
- writel(0, fep->hwp + FEC_ECNTRL);
- } else {
- writel(1, fep->hwp + FEC_ECNTRL);
- udelay(10);
- }
+ fec_ctrl_reset(fep, false);
/*
* enet-mac reset will reset mac address registers too,
@@ -1354,22 +1367,7 @@ fec_stop(struct net_device *ndev)
if (fep->bufdesc_ex)
fec_ptp_save_state(fep);
- /* Whack a reset. We should wait for this.
- * For i.MX6SX SOC, enet use AXI bus, we use disable MAC
- * instead of reset MAC itself.
- */
- if (!(fep->wol_flag & FEC_WOL_FLAG_SLEEP_ON)) {
- if (fep->quirks & FEC_QUIRK_HAS_MULTI_QUEUES) {
- writel(0, fep->hwp + FEC_ECNTRL);
- } else {
- writel(FEC_ECR_RESET, fep->hwp + FEC_ECNTRL);
- udelay(10);
- }
- } else {
- val = readl(fep->hwp + FEC_ECNTRL);
- val |= (FEC_ECR_MAGICEN | FEC_ECR_SLEEP);
- writel(val, fep->hwp + FEC_ECNTRL);
- }
+ fec_ctrl_reset(fep, true);
writel(fep->phy_speed, fep->hwp + FEC_MII_SPEED);
writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 242/294] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure().
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (32 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 239/294] net: fec: Refactor MAC reset to function Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 243/294] r8152: add vendor/device ID pair for Dell Alienware AW1022z Sasha Levin
` (9 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Kuniyuki Iwashima, Eric Dumazet, Ido Schimmel, Jakub Kicinski,
Sasha Levin, davem, dsahern, pabeni, netdev
From: Kuniyuki Iwashima <kuniyu@amazon.com>
[ Upstream commit 5a1ccffd30a08f5a2428cd5fbb3ab03e8eb6c66d ]
The following patch will not set skb->sk from VRF path.
Let's fetch net from fib_rule->fr_net instead of sock_net(skb->sk)
in fib[46]_rule_configure().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20250207072502.87775-5-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/fib_rules.c | 4 ++--
net/ipv6/fib6_rules.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 513f475c6a534..298a9944a3d1e 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -222,9 +222,9 @@ static int fib4_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct nlattr **tb,
struct netlink_ext_ack *extack)
{
- struct net *net = sock_net(skb->sk);
+ struct fib4_rule *rule4 = (struct fib4_rule *)rule;
+ struct net *net = rule->fr_net;
int err = -EINVAL;
- struct fib4_rule *rule4 = (struct fib4_rule *) rule;
if (!inet_validate_dscp(frh->tos)) {
NL_SET_ERR_MSG(extack,
diff --git a/net/ipv6/fib6_rules.c b/net/ipv6/fib6_rules.c
index 6eeab21512ba9..e0f0c5f8cccda 100644
--- a/net/ipv6/fib6_rules.c
+++ b/net/ipv6/fib6_rules.c
@@ -350,9 +350,9 @@ static int fib6_rule_configure(struct fib_rule *rule, struct sk_buff *skb,
struct nlattr **tb,
struct netlink_ext_ack *extack)
{
+ struct fib6_rule *rule6 = (struct fib6_rule *)rule;
+ struct net *net = rule->fr_net;
int err = -EINVAL;
- struct net *net = sock_net(skb->sk);
- struct fib6_rule *rule6 = (struct fib6_rule *) rule;
if (!inet_validate_dscp(frh->tos)) {
NL_SET_ERR_MSG(extack,
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 243/294] r8152: add vendor/device ID pair for Dell Alienware AW1022z
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (33 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 242/294] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 255/294] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled Sasha Levin
` (8 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Aleksander Jan Bajkowski, Jakub Kicinski, Sasha Levin,
andrew+netdev, davem, edumazet, pabeni, gregkh, hayeswang, horms,
dianders, ste3ls, phahn-oss, linux-usb, netdev
From: Aleksander Jan Bajkowski <olek2@wp.pl>
[ Upstream commit 848b09d53d923b4caee5491f57a5c5b22d81febc ]
The Dell AW1022z is an RTL8156B based 2.5G Ethernet controller.
Add the vendor and product ID values to the driver. This makes Ethernet
work with the adapter.
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Link: https://patch.msgid.link/20250206224033.980115-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/usb/r8152.c | 1 +
include/linux/usb/r8152.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index bbcefcc7ef8f0..1e85cfe524e87 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -10032,6 +10032,7 @@ static const struct usb_device_id rtl8152_table[] = {
{ USB_DEVICE(VENDOR_ID_NVIDIA, 0x09ff) },
{ USB_DEVICE(VENDOR_ID_TPLINK, 0x0601) },
{ USB_DEVICE(VENDOR_ID_DLINK, 0xb301) },
+ { USB_DEVICE(VENDOR_ID_DELL, 0xb097) },
{ USB_DEVICE(VENDOR_ID_ASUS, 0x1976) },
{}
};
diff --git a/include/linux/usb/r8152.h b/include/linux/usb/r8152.h
index 33a4c146dc19c..2ca60828f28bb 100644
--- a/include/linux/usb/r8152.h
+++ b/include/linux/usb/r8152.h
@@ -30,6 +30,7 @@
#define VENDOR_ID_NVIDIA 0x0955
#define VENDOR_ID_TPLINK 0x2357
#define VENDOR_ID_DLINK 0x2001
+#define VENDOR_ID_DELL 0x413c
#define VENDOR_ID_ASUS 0x0b05
#if IS_REACHABLE(CONFIG_USB_RTL8152)
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 255/294] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (34 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 243/294] r8152: add vendor/device ID pair for Dell Alienware AW1022z Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 256/294] vxlan: Annotate FDB data races Sasha Levin
` (7 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Carolina Jubran, Yael Chemla, Cosmin Ratiu, Tariq Toukan,
Kalesh AP, Paolo Abeni, Sasha Levin, saeedm, andrew+netdev, davem,
edumazet, kuba, netdev, linux-rdma
From: Carolina Jubran <cjubran@nvidia.com>
[ Upstream commit 689805dcc474c2accb5cffbbcea1c06ee4a54570 ]
When attempting to enable MQPRIO while HTB offload is already
configured, the driver currently returns `-EINVAL` and triggers a
`WARN_ON`, leading to an unnecessary call trace.
Update the code to handle this case more gracefully by returning
`-EOPNOTSUPP` instead, while also providing a helpful user message.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Yael Chemla <ychemla@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8a892614015cd..8e2cd50899ea1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3627,8 +3627,11 @@ static int mlx5e_setup_tc_mqprio(struct mlx5e_priv *priv,
/* MQPRIO is another toplevel qdisc that can't be attached
* simultaneously with the offloaded HTB.
*/
- if (WARN_ON(mlx5e_selq_is_htb_enabled(&priv->selq)))
- return -EINVAL;
+ if (mlx5e_selq_is_htb_enabled(&priv->selq)) {
+ NL_SET_ERR_MSG_MOD(mqprio->extack,
+ "MQPRIO cannot be configured when HTB offload is enabled.");
+ return -EOPNOTSUPP;
+ }
switch (mqprio->mode) {
case TC_MQPRIO_MODE_DCB:
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 256/294] vxlan: Annotate FDB data races
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (35 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 255/294] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 257/294] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only Sasha Levin
` (6 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Ido Schimmel, Petr Machata, Eric Dumazet, Nikolay Aleksandrov,
Jakub Kicinski, Sasha Levin, andrew+netdev, davem, pabeni,
menglong8.dong, gnault, netdev
From: Ido Schimmel <idosch@nvidia.com>
[ Upstream commit f6205f8215f12a96518ac9469ff76294ae7bd612 ]
The 'used' and 'updated' fields in the FDB entry structure can be
accessed concurrently by multiple threads, leading to reports such as
[1]. Can be reproduced using [2].
Suppress these reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().
[1]
BUG: KCSAN: data-race in vxlan_xmit / vxlan_xmit
write to 0xffff942604d263a8 of 8 bytes by task 286 on cpu 0:
vxlan_xmit+0xb29/0x2380
dev_hard_start_xmit+0x84/0x2f0
__dev_queue_xmit+0x45a/0x1650
packet_xmit+0x100/0x150
packet_sendmsg+0x2114/0x2ac0
__sys_sendto+0x318/0x330
__x64_sys_sendto+0x76/0x90
x64_sys_call+0x14e8/0x1c00
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffff942604d263a8 of 8 bytes by task 287 on cpu 2:
vxlan_xmit+0xadf/0x2380
dev_hard_start_xmit+0x84/0x2f0
__dev_queue_xmit+0x45a/0x1650
packet_xmit+0x100/0x150
packet_sendmsg+0x2114/0x2ac0
__sys_sendto+0x318/0x330
__x64_sys_sendto+0x76/0x90
x64_sys_call+0x14e8/0x1c00
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0x00000000fffbac6e -> 0x00000000fffbac6f
Reported by Kernel Concurrency Sanitizer on:
CPU: 2 UID: 0 PID: 287 Comm: mausezahn Not tainted 6.13.0-rc7-01544-gb4b270f11a02 #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[2]
#!/bin/bash
set +H
echo whitelist > /sys/kernel/debug/kcsan
echo !vxlan_xmit > /sys/kernel/debug/kcsan
ip link add name vx0 up type vxlan id 10010 dstport 4789 local 192.0.2.1
bridge fdb add 00:11:22:33:44:55 dev vx0 self static dst 198.51.100.1
taskset -c 0 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
taskset -c 2 mausezahn vx0 -a own -b 00:11:22:33:44:55 -c 0 -q &
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250204145549.1216254-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/vxlan/vxlan_core.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 822cf49d82676..2ed879a0abc6c 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -227,9 +227,9 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
be32_to_cpu(fdb->vni)))
goto nla_put_failure;
- ci.ndm_used = jiffies_to_clock_t(now - fdb->used);
+ ci.ndm_used = jiffies_to_clock_t(now - READ_ONCE(fdb->used));
ci.ndm_confirmed = 0;
- ci.ndm_updated = jiffies_to_clock_t(now - fdb->updated);
+ ci.ndm_updated = jiffies_to_clock_t(now - READ_ONCE(fdb->updated));
ci.ndm_refcnt = 0;
if (nla_put(skb, NDA_CACHEINFO, sizeof(ci), &ci))
@@ -435,8 +435,8 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
struct vxlan_fdb *f;
f = __vxlan_find_mac(vxlan, mac, vni);
- if (f && f->used != jiffies)
- f->used = jiffies;
+ if (f && READ_ONCE(f->used) != jiffies)
+ WRITE_ONCE(f->used, jiffies);
return f;
}
@@ -1010,12 +1010,12 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
!(f->flags & NTF_VXLAN_ADDED_BY_USER)) {
if (f->state != state) {
f->state = state;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
notify = 1;
}
if (f->flags != fdb_flags) {
f->flags = fdb_flags;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
notify = 1;
}
}
@@ -1049,7 +1049,7 @@ static int vxlan_fdb_update_existing(struct vxlan_dev *vxlan,
}
if (ndm_flags & NTF_USE)
- f->used = jiffies;
+ WRITE_ONCE(f->used, jiffies);
if (notify) {
if (rd == NULL)
@@ -1478,7 +1478,7 @@ static bool vxlan_snoop(struct net_device *dev,
src_mac, &rdst->remote_ip.sa, &src_ip->sa);
rdst->remote_ip = *src_ip;
- f->updated = jiffies;
+ WRITE_ONCE(f->updated, jiffies);
vxlan_fdb_notify(vxlan, f, rdst, RTM_NEWNEIGH, true, NULL);
} else {
u32 hash_index = fdb_head_index(vxlan, src_mac, vni);
@@ -2920,7 +2920,7 @@ static void vxlan_cleanup(struct timer_list *t)
if (f->flags & NTF_EXT_LEARNED)
continue;
- timeout = f->used + vxlan->cfg.age_interval * HZ;
+ timeout = READ_ONCE(f->used) + vxlan->cfg.age_interval * HZ;
if (time_before_eq(timeout, jiffies)) {
netdev_dbg(vxlan->dev,
"garbage collect %pM\n",
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 257/294] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (36 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 256/294] vxlan: Annotate FDB data races Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 258/294] r8169: don't scan PHY addresses > 0 Sasha Levin
` (5 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Geert Uytterhoeven, kernel test robot, Simon Horman,
Jakub Kicinski, Sasha Levin, davem, dsahern, edumazet, pabeni,
netdev
From: Geert Uytterhoeven <geert@linux-m68k.org>
[ Upstream commit 50f37fc2a39c4a8cc4813629b4cf239b71c6097d ]
if CONFIG_NET_IPGRE is enabled, but CONFIG_IPV6 is disabled:
net/ipv4/ip_gre.c: In function ‘ipgre_err’:
net/ipv4/ip_gre.c:144:22: error: variable ‘data_len’ set but not used [-Werror=unused-but-set-variable]
144 | unsigned int data_len = 0;
| ^~~~~~~~
Fix this by moving all data_len processing inside the IPV6-only section
that uses its result.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202501121007.2GofXmh5-lkp@intel.com/
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/d09113cfe2bfaca02f3dddf832fb5f48dd20958b.1738704881.git.geert@linux-m68k.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv4/ip_gre.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 890c15510b421..f261e29adc7c2 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -140,7 +140,6 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
const struct iphdr *iph;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
- unsigned int data_len = 0;
struct ip_tunnel *t;
if (tpi->proto == htons(ETH_P_TEB))
@@ -181,7 +180,6 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
case ICMP_TIME_EXCEEDED:
if (code != ICMP_EXC_TTL)
return 0;
- data_len = icmp_hdr(skb)->un.reserved[1] * 4; /* RFC 4884 4.1 */
break;
case ICMP_REDIRECT:
@@ -189,10 +187,16 @@ static int ipgre_err(struct sk_buff *skb, u32 info,
}
#if IS_ENABLED(CONFIG_IPV6)
- if (tpi->proto == htons(ETH_P_IPV6) &&
- !ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
- type, data_len))
- return 0;
+ if (tpi->proto == htons(ETH_P_IPV6)) {
+ unsigned int data_len = 0;
+
+ if (type == ICMP_TIME_EXCEEDED)
+ data_len = icmp_hdr(skb)->un.reserved[1] * 4; /* RFC 4884 4.1 */
+
+ if (!ip6_err_gen_icmpv6_unreach(skb, iph->ihl * 4 + tpi->hdr_len,
+ type, data_len))
+ return 0;
+ }
#endif
if (t->parms.iph.daddr == 0 ||
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 258/294] r8169: don't scan PHY addresses > 0
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (37 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 257/294] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 259/294] bridge: mdb: Allow replace of a host-joined group Sasha Levin
` (4 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Heiner Kallweit, Andrew Lunn, Jakub Kicinski, Sasha Levin,
nic_swsd, andrew+netdev, davem, edumazet, pabeni, netdev
From: Heiner Kallweit <hkallweit1@gmail.com>
[ Upstream commit faac69a4ae5abb49e62c79c66b51bb905c9aa5ec ]
The PHY address is a dummy, because r8169 PHY access registers
don't support a PHY address. Therefore scan address 0 only.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/830637dd-4016-4a68-92b3-618fcac6589d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/realtek/r8169_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 7e5258b2c4290..5af932a5e70c4 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -5125,6 +5125,7 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
new_bus->priv = tp;
new_bus->parent = &pdev->dev;
new_bus->irq[0] = PHY_MAC_INTERRUPT;
+ new_bus->phy_mask = GENMASK(31, 1);
snprintf(new_bus->id, MII_BUS_ID_SIZE, "r8169-%x-%x",
pci_domain_nr(pdev->bus), pci_dev_id(pdev));
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 259/294] bridge: mdb: Allow replace of a host-joined group
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (38 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 258/294] r8169: don't scan PHY addresses > 0 Sasha Levin
@ 2025-05-05 22:55 ` Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 260/294] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
` (3 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:55 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Petr Machata, Ido Schimmel, Nikolay Aleksandrov, Jakub Kicinski,
Sasha Levin, davem, edumazet, pabeni, shuah, bridge, netdev,
linux-kselftest
From: Petr Machata <petrm@nvidia.com>
[ Upstream commit d9e9f6d7b7d0c520bb87f19d2cbc57aeeb2091d5 ]
Attempts to replace an MDB group membership of the host itself are
currently bounced:
# ip link add name br up type bridge vlan_filtering 1
# bridge mdb replace dev br port br grp 239.0.0.1 vid 2
# bridge mdb replace dev br port br grp 239.0.0.1 vid 2
Error: bridge: Group is already joined by host.
A similar operation done on a member port would succeed. Ignore the check
for replacement of host group memberships as well.
The bit of code that this enables is br_multicast_host_join(), which, for
already-joined groups only refreshes the MC group expiration timer, which
is desirable; and a userspace notification, also desirable.
Change a selftest that exercises this code path from expecting a rejection
to expecting a pass. The rest of MDB selftests pass without modification.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/e5c5188b9787ae806609e7ca3aa2a0a501b9b5c4.1738685648.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/bridge/br_mdb.c | 2 +-
tools/testing/selftests/net/forwarding/bridge_mdb.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 7305f5f8215ca..96bea0c8408fe 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -1030,7 +1030,7 @@ static int br_mdb_add_group(const struct br_mdb_config *cfg,
/* host join */
if (!port) {
- if (mp->host_joined) {
+ if (mp->host_joined && !(cfg->nlflags & NLM_F_REPLACE)) {
NL_SET_ERR_MSG_MOD(extack, "Group is already joined by host");
return -EEXIST;
}
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index a3678dfe5848a..c151374ddf040 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -149,7 +149,7 @@ cfg_test_host_common()
check_err $? "Failed to add $name host entry"
bridge mdb replace dev br0 port br0 grp $grp $state vid 10 &> /dev/null
- check_fail $? "Managed to replace $name host entry"
+ check_err $? "Failed to replace $name host entry"
bridge mdb del dev br0 port br0 grp $grp $state vid 10
bridge mdb get dev br0 grp $grp vid 10 &> /dev/null
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 260/294] net-sysfs: prevent uncleared queues from being re-added
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (39 preceding siblings ...)
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 259/294] bridge: mdb: Allow replace of a host-joined group Sasha Levin
@ 2025-05-05 22:56 ` Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 261/294] ice: treat dyn_allowed only as suggestion Sasha Levin
` (2 subsequent siblings)
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Antoine Tenart, Jakub Kicinski, Sasha Levin, davem, edumazet,
pabeni, sdf, jdamato, netdev
From: Antoine Tenart <atenart@kernel.org>
[ Upstream commit 7e54f85c60828842be27e0149f3533357225090e ]
With the (upcoming) removal of the rtnl_trylock/restart_syscall logic
and because of how Tx/Rx queues are implemented (and their
requirements), it might happen that a queue is re-added before having
the chance to be cleared. In such rare case, do not complete the queue
addition operation.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-4-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/net-sysfs.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index f7404bc679746..d88682ae0e126 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1077,6 +1077,22 @@ static int rx_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
+ /* Rx queues are cleared in rx_queue_release to allow later
+ * re-registration. This is triggered when their kobj refcount is
+ * dropped.
+ *
+ * If a queue is removed while both a read (or write) operation and a
+ * the re-addition of the same queue are pending (waiting on rntl_lock)
+ * it might happen that the re-addition will execute before the read,
+ * making the initial removal to never happen (queue's kobj refcount
+ * won't drop enough because of the pending read). In such rare case,
+ * return to allow the removal operation to complete.
+ */
+ if (unlikely(kobj->state_initialized)) {
+ netdev_warn_once(dev, "Cannot re-add rx queues before their removal completed");
+ return -EAGAIN;
+ }
+
/* Kobject_put later will trigger rx_queue_release call which
* decreases dev refcount: Take that reference here
*/
@@ -1684,6 +1700,22 @@ static int netdev_queue_add_kobject(struct net_device *dev, int index)
struct kobject *kobj = &queue->kobj;
int error = 0;
+ /* Tx queues are cleared in netdev_queue_release to allow later
+ * re-registration. This is triggered when their kobj refcount is
+ * dropped.
+ *
+ * If a queue is removed while both a read (or write) operation and a
+ * the re-addition of the same queue are pending (waiting on rntl_lock)
+ * it might happen that the re-addition will execute before the read,
+ * making the initial removal to never happen (queue's kobj refcount
+ * won't drop enough because of the pending read). In such rare case,
+ * return to allow the removal operation to complete.
+ */
+ if (unlikely(kobj->state_initialized)) {
+ netdev_warn_once(dev, "Cannot re-add tx queues before their removal completed");
+ return -EAGAIN;
+ }
+
/* Kobject_put later will trigger netdev_queue_release call
* which decreases dev refcount: Take that reference here
*/
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 261/294] ice: treat dyn_allowed only as suggestion
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (40 preceding siblings ...)
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 260/294] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
@ 2025-05-05 22:56 ` Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 266/294] ice: count combined queues using Rx/Tx count Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 267/294] net/mana: fix warning in the writer of client oob Sasha Levin
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Michal Swiatkowski, Jacob Keller, Wojciech Drewek,
Pucha Himasekhar Reddy, Tony Nguyen, Sasha Levin,
przemyslaw.kitszel, andrew+netdev, davem, edumazet, kuba, pabeni,
intel-wired-lan, netdev
From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
[ Upstream commit a8c2d3932c1106af2764cc6869b29bcf3cb5bc47 ]
It can be needed to have some MSI-X allocated as static and rest as
dynamic. For example on PF VSI. We want to always have minimum one MSI-X
on it, because of that it is allocated as a static one, rest can be
dynamic if it is supported.
Change the ice_get_irq_res() to allow using static entries if they are
free even if caller wants dynamic one.
Adjust limit values to the new approach. Min and max in limit means the
values that are valid, so decrease max and num_static by one.
Set vsi::irq_dyn_alloc if dynamic allocation is supported.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_irq.c | 25 ++++++++++++------------
drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++
2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index ad82ff7d19957..09f9c7ba52795 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -45,7 +45,7 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
/**
* ice_get_irq_res - get an interrupt resource
* @pf: board private structure
- * @dyn_only: force entry to be dynamically allocated
+ * @dyn_allowed: allow entry to be dynamically allocated
*
* Allocate new irq entry in the free slot of the tracker. Since xarray
* is used, always allocate new entry at the lowest possible index. Set
@@ -53,11 +53,12 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
*
* Returns allocated irq entry or NULL on failure.
*/
-static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
+static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf,
+ bool dyn_allowed)
{
- struct xa_limit limit = { .max = pf->irq_tracker.num_entries,
+ struct xa_limit limit = { .max = pf->irq_tracker.num_entries - 1,
.min = 0 };
- unsigned int num_static = pf->irq_tracker.num_static;
+ unsigned int num_static = pf->irq_tracker.num_static - 1;
struct ice_irq_entry *entry;
unsigned int index;
int ret;
@@ -66,9 +67,9 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
if (!entry)
return NULL;
- /* skip preallocated entries if the caller says so */
- if (dyn_only)
- limit.min = num_static;
+ /* only already allocated if the caller says so */
+ if (!dyn_allowed)
+ limit.max = num_static;
ret = xa_alloc(&pf->irq_tracker.entries, &index, entry, limit,
GFP_KERNEL);
@@ -78,7 +79,7 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
entry = NULL;
} else {
entry->index = index;
- entry->dynamic = index >= num_static;
+ entry->dynamic = index > num_static;
}
return entry;
@@ -272,7 +273,7 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
/**
* ice_alloc_irq - Allocate new interrupt vector
* @pf: board private structure
- * @dyn_only: force dynamic allocation of the interrupt
+ * @dyn_allowed: allow dynamic allocation of the interrupt
*
* Allocate new interrupt vector for a given owner id.
* return struct msi_map with interrupt details and track
@@ -285,20 +286,20 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
* interrupt will be allocated with pci_msix_alloc_irq_at.
*
* Some callers may only support dynamically allocated interrupts.
- * This is indicated with dyn_only flag.
+ * This is indicated with dyn_allowed flag.
*
* On failure, return map with negative .index. The caller
* is expected to check returned map index.
*
*/
-struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_only)
+struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
{
int sriov_base_vector = pf->sriov_base_vector;
struct msi_map map = { .index = -ENOENT };
struct device *dev = ice_pf_to_dev(pf);
struct ice_irq_entry *entry;
- entry = ice_get_irq_res(pf, dyn_only);
+ entry = ice_get_irq_res(pf, dyn_allowed);
if (!entry)
return map;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 1fc4805353eb5..a6a290514e548 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -587,6 +587,8 @@ ice_vsi_alloc_def(struct ice_vsi *vsi, struct ice_channel *ch)
return -ENOMEM;
}
+ vsi->irq_dyn_alloc = pci_msix_can_alloc_dyn(vsi->back->pdev);
+
switch (vsi->type) {
case ICE_VSI_SWITCHDEV_CTRL:
/* Setup eswitch MSIX irq handler for VSI */
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 266/294] ice: count combined queues using Rx/Tx count
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (41 preceding siblings ...)
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 261/294] ice: treat dyn_allowed only as suggestion Sasha Levin
@ 2025-05-05 22:56 ` Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 267/294] net/mana: fix warning in the writer of client oob Sasha Levin
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Michal Swiatkowski, Jacob Keller, Pucha Himasekhar Reddy,
Tony Nguyen, Sasha Levin, przemyslaw.kitszel, andrew+netdev,
davem, edumazet, kuba, pabeni, intel-wired-lan, netdev
From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
[ Upstream commit c3a392bdd31adc474f1009ee85c13fdd01fe800d ]
Previous implementation assumes that there is 1:1 matching between
vectors and queues. It isn't always true.
Get minimum value from Rx/Tx queues to determine combined queues number.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 39b5f24be7e4f..dd58b2372dc0c 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3329,8 +3329,7 @@ static u32 ice_get_combined_cnt(struct ice_vsi *vsi)
ice_for_each_q_vector(vsi, q_idx) {
struct ice_q_vector *q_vector = vsi->q_vectors[q_idx];
- if (q_vector->rx.rx_ring && q_vector->tx.tx_ring)
- combined++;
+ combined += min(q_vector->num_ring_tx, q_vector->num_ring_rx);
}
return combined;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH AUTOSEL 6.6 267/294] net/mana: fix warning in the writer of client oob
[not found] <20250505225634.2688578-1-sashal@kernel.org>
` (42 preceding siblings ...)
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 266/294] ice: count combined queues using Rx/Tx count Sasha Levin
@ 2025-05-05 22:56 ` Sasha Levin
43 siblings, 0 replies; 44+ messages in thread
From: Sasha Levin @ 2025-05-05 22:56 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Konstantin Taranov, Shiraz Saleem, Long Li, Leon Romanovsky,
Sasha Levin, kys, haiyangz, wei.liu, decui, andrew+netdev, davem,
edumazet, kuba, pabeni, shradhagupta, mlevitsk, peterz, ernis,
linux-hyperv, netdev
From: Konstantin Taranov <kotaranov@microsoft.com>
[ Upstream commit 5ec7e1c86c441c46a374577bccd9488abea30037 ]
Do not warn on missing pad_data when oob is in sgl.
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1737394039-28772-9-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/microsoft/mana/gdma_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index ae014e21eb605..9ed965d61e355 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1036,7 +1036,7 @@ static u32 mana_gd_write_client_oob(const struct gdma_wqe_request *wqe_req,
header->inline_oob_size_div4 = client_oob_size / sizeof(u32);
if (oob_in_sgl) {
- WARN_ON_ONCE(!pad_data || wqe_req->num_sge < 2);
+ WARN_ON_ONCE(wqe_req->num_sge < 2);
header->client_oob_in_sgl = 1;
--
2.39.5
^ permalink raw reply related [flat|nested] 44+ messages in thread
end of thread, other threads:[~2025-05-05 23:05 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250505225634.2688578-1-sashal@kernel.org>
2025-05-05 22:51 ` [PATCH AUTOSEL 6.6 010/294] SUNRPC: Don't allow waiting for exiting tasks Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 023/294] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 024/294] SUNRPC: rpcbind should never reset the port to the value '0' Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 063/294] tcp: reorganize tcp_in_ack_event() and tcp_count_delivered() Sasha Levin
2025-05-05 22:52 ` [PATCH AUTOSEL 6.6 072/294] net/smc: use the correct ndev to find pnetid by pnetid table Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 084/294] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 092/294] ipv6: save dontfrag in cork Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 108/294] tcp: bring back NUMA dispersion in inet_ehash_locks_alloc() Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 110/294] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 116/294] net: phylink: use pl->link_interface in phylink_expects_phy() Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 122/294] net: ethernet: ti: cpsw_new: populate netdev of_node Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 123/294] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
2025-05-05 22:53 ` [PATCH AUTOSEL 6.6 126/294] ipv4: fib: Move fib_valid_key_len() to rtm_to_fib_config() Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 144/294] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 145/294] selftests/net: have `gro.sh -t` return a correct exit code Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 148/294] net: ethernet: mtk_ppe_offload: Allow QinQ, double ETH_P_8021Q only Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 149/294] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 150/294] bonding: report duplicate MAC address in all situations Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 153/294] Octeontx2-af: RPM: Register driver with PCI subsys IDs Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 160/294] vhost-scsi: Return queue full for page alloc failures during copy Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 168/294] net/mlx5: Change POOL_NEXT_SIZE define value and make it global Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 181/294] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 183/294] bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 191/294] eth: mlx4: don't try to complete XDP frames in netpoll Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 194/294] vxlan: Join / leave MC group after remote changes Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 196/294] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
2025-05-05 22:54 ` [PATCH AUTOSEL 6.6 197/294] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 214/294] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 232/294] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 233/294] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 234/294] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 235/294] net/mlx5e: reduce the max log mpwrq sz for ECPF and reps Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 239/294] net: fec: Refactor MAC reset to function Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 242/294] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 243/294] r8152: add vendor/device ID pair for Dell Alienware AW1022z Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 255/294] net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 256/294] vxlan: Annotate FDB data races Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 257/294] ipv4: ip_gre: Fix set but not used warning in ipgre_err() if IPv4-only Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 258/294] r8169: don't scan PHY addresses > 0 Sasha Levin
2025-05-05 22:55 ` [PATCH AUTOSEL 6.6 259/294] bridge: mdb: Allow replace of a host-joined group Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 260/294] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 261/294] ice: treat dyn_allowed only as suggestion Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 266/294] ice: count combined queues using Rx/Tx count Sasha Levin
2025-05-05 22:56 ` [PATCH AUTOSEL 6.6 267/294] net/mana: fix warning in the writer of client oob Sasha Levin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).