* [PATCH 6.6 00/60] 6.6.22-rc1 review
@ 2024-03-13 16:36 Sasha Levin
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Sasha Levin, torvalds, akpm, linux, shuah, patches, lkft-triage,
pavel
This is the start of the stable review cycle for the 6.6.22 release.
There are 60 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Mar 15 04:36:58 PM UTC 2024.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.6.y&id2=v6.6.21
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
Thanks,
Sasha
-------------
Pseudo-Shortlog of commits:
Byungchul Park (1):
mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
Christian Borntraeger (1):
KVM: s390: vsie: fix race during shadow creation
Daniel Borkmann (2):
xdp, bonding: Fix feature flags when there are no slave devs anymore
selftests/bpf: Fix up xdp bonding test wrt feature flags
Eduard Zingerman (1):
bpf: check bpf_func_state->callback_depth when pruning states
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Emeel Hakim (1):
net/mlx5e: Fix MACsec state loss upon state update in offload path
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Florian Kauer (1):
igc: avoid returning frame twice in XDP_REDIRECT
Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family
Frank Li (3):
dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in
dts
dmaengine: fsl-edma: utilize common dt-binding header file
dmaengine: fsl-edma: correct max_segment_size setting
Gao Xiang (1):
erofs: apply proper VMA alignment for memory mapped files on THP
Gavin Li (1):
Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
Horatiu Vultur (1):
net: sparx5: Fix use after free inside sparx5_del_mact_entry
Jacob Keller (1):
ice: virtchnl: stop pretending to support RSS over AQ or registers
Jan Kara (1):
readahead: avoid multiple marked readahead pages
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around
sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around
sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around
sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around
sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Jianbo Liu (2):
net/mlx5: E-switch, Change flow rule destination checking
net/mlx5e: Change the warning when ignore_flow_level is not supported
Kefeng Wang (3):
mm: migrate: remove PageTransHuge check in numamigrate_isolate_page()
mm: migrate: remove THP mapcount check in numamigrate_isolate_page()
mm: migrate: convert numamigrate_isolate_page() to
numamigrate_isolate_folio()
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of
range
Leon Romanovsky (1):
xfrm: Pass UDP encapsulation in TX packet offload
Maciej Fijalkowski (3):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
i40e: disable NAPI right after disabling irqs when handling xsk_pool
ice: reorder disabling IRQ and NAPI in ice_qp_dis
Matthieu Baerts (NGI0) (1):
selftests: mptcp: decrease BW in simult flows
Moshe Shemesh (1):
net/mlx5: Check capability for fw_reset
Nico Boehr (1):
KVM: s390: add stat counter for shadow gmap events
Oleg Nesterov (1):
exit: wait_task_zombie: kill the no longer necessary
spin_lock_irq(siglock)
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Rahul Rameshbabu (2):
net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission
tracking occurs after populating the metadata_map
net/mlx5e: Switch to using _bh variant of of spinlock API in port
timestamping NAPI poll context
Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in
ice_bridge_setlink()
Saeed Mahameed (1):
Revert "net/mlx5e: Check the number of elements before walk TC
rhashtable"
Sasha Levin (1):
Linux 6.6.22-rc1
Steven Rostedt (Google) (1):
tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
Tobias Jakobi (Compleo) (1):
net: dsa: microchip: fix register write order in ksz8_ind_write8()
Toke Høiland-Jørgensen (1):
cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
Xiubo Li (1):
ceph: switch to corrected encoding of max_xattr_size in mdsmap
Yongzhi Liu (1):
net: pds_core: Fix possible double free in error handling path
.../ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++
.../admin-guide/kernel-parameters.txt | 21 ++++
Makefile | 4 +-
arch/s390/include/asm/kvm_host.h | 7 ++
arch/s390/kvm/gaccess.c | 7 ++
arch/s390/kvm/kvm-s390.c | 9 +-
arch/s390/kvm/vsie.c | 6 +-
arch/s390/mm/gmap.c | 1 +
arch/x86/Kconfig | 11 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 ++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
drivers/dma/fsl-edma-common.h | 5 +-
drivers/dma/fsl-edma-main.c | 21 ++--
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/dsa/microchip/ksz8795.c | 4 +-
drivers/net/ethernet/amd/pds_core/auxbus.c | 12 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 9 +-
.../intel/ice/ice_virtchnl_allowlist.c | 2 -
drivers/net/ethernet/intel/ice/ice_xsk.c | 9 +-
drivers/net/ethernet/intel/igc/igc_main.c | 13 +--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 ++++++++--
.../net/ethernet/mellanox/mlx5/core/devlink.c | 6 +
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 12 +-
.../mellanox/mlx5/core/en/tc/post_act.c | 2 +-
.../mellanox/mlx5/core/en_accel/macsec.c | 82 ++++++++------
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 2 +
.../mellanox/mlx5/core/esw/ipsec_fs.c | 2 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 46 +++-----
.../ethernet/mellanox/mlx5/core/fw_reset.c | 22 +++-
.../microchip/sparx5/sparx5_mactable.c | 4 +-
drivers/net/geneve.c | 18 ++-
drivers/net/usb/lan78xx.c | 3 +-
fs/ceph/mdsmap.c | 7 +-
fs/erofs/data.c | 1 +
include/dt-bindings/dma/fsl-edma.h | 21 ++++
include/linux/ceph/mdsmap.h | 6 +-
include/linux/cpu.h | 2 +
include/linux/mlx5/mlx5_ifc.h | 4 +-
include/trace/events/qdisc.h | 20 ++--
kernel/bpf/cpumap.c | 2 +-
kernel/bpf/verifier.c | 3 +
kernel/exit.c | 10 +-
mm/migrate.c | 34 +++---
mm/readahead.c | 4 +-
net/ipv6/route.c | 21 ++--
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netfilter/nft_ct.c | 11 +-
net/netrom/af_netrom.c | 14 +--
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
net/xfrm/xfrm_device.c | 2 +-
.../selftests/bpf/prog_tests/xdp_bonding.c | 4 +-
.../selftests/net/mptcp/simult_flows.sh | 8 +-
66 files changed, 628 insertions(+), 237 deletions(-)
create mode 100644 Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
create mode 100644 include/dt-bindings/dma/fsl-edma.h
--
2.43.0
* [PATCH 6.6 01/60] dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable; +Cc: Frank Li, Rob Herring, Vinod Koul, Sasha Levin
From: Frank Li <Frank.Li@nxp.com>
[ Upstream commit 1e9b05258271b76ccc04a4b535009d2cb596506a ]
Introduce a common dt-bindings header file, fsl-edma.h, shared between
the driver and dts files. This addition aims to eliminate hardcoded values
in dts files, promoting maintainability and consistency.
DTS header files do not support the BIT() macro yet, so use 2^n
numbers directly.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20231114154824.3617255-3-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Stable-dep-of: a79f949a5ce1 ("dmaengine: fsl-edma: correct max_segment_size setting")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/dt-bindings/dma/fsl-edma.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
create mode 100644 include/dt-bindings/dma/fsl-edma.h
diff --git a/include/dt-bindings/dma/fsl-edma.h b/include/dt-bindings/dma/fsl-edma.h
new file mode 100644
index 0000000000000..fd11478cfe9cc
--- /dev/null
+++ b/include/dt-bindings/dma/fsl-edma.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
+
+#ifndef _FSL_EDMA_DT_BINDING_H_
+#define _FSL_EDMA_DT_BINDING_H_
+
+/* Receive Channel */
+#define FSL_EDMA_RX 0x1
+
+/* iMX8 audio remote DMA */
+#define FSL_EDMA_REMOTE 0x2
+
+/* FIFO is continue memory region */
+#define FSL_EDMA_MULTI_FIFO 0x4
+
+/* Channel need stick to even channel */
+#define FSL_EDMA_EVEN_CH 0x8
+
+/* Channel need stick to odd channel */
+#define FSL_EDMA_ODD_CH 0x10
+
+#endif
--
2.43.0
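Because the header cannot use BIT(), each flag is spelled as an explicit power of two. The following minimal C sketch, assuming a local stand-in for the kernel's BIT() macro (defined here only for comparison), checks at compile time that each value equals the bit it is meant to represent, and shows how flags are OR-ed into a single cell. The variable example_flags is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from include/dt-bindings/dma/fsl-edma.h above. */
#define FSL_EDMA_RX         0x1
#define FSL_EDMA_REMOTE     0x2
#define FSL_EDMA_MULTI_FIFO 0x4
#define FSL_EDMA_EVEN_CH    0x8
#define FSL_EDMA_ODD_CH     0x10

/* Local stand-in for the kernel's BIT() macro (assumption: 1UL << n). */
#define BIT(n) (1UL << (n))

/* Each explicit 2^n constant must equal the BIT(n) value it replaces. */
static_assert(FSL_EDMA_RX == BIT(0), "RX is bit 0");
static_assert(FSL_EDMA_REMOTE == BIT(1), "REMOTE is bit 1");
static_assert(FSL_EDMA_MULTI_FIFO == BIT(2), "MULTI_FIFO is bit 2");
static_assert(FSL_EDMA_EVEN_CH == BIT(3), "EVEN_CH is bit 3");
static_assert(FSL_EDMA_ODD_CH == BIT(4), "ODD_CH is bit 4");

/* Hypothetical cell value: an RX channel restricted to even channels. */
static const uint32_t example_flags = FSL_EDMA_RX | FSL_EDMA_EVEN_CH;
```

Since the checks are static_assert()s, a mismatch would fail the build rather than misbehave at runtime.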
* [PATCH 6.6 02/60] dmaengine: fsl-edma: utilize common dt-binding header file
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable; +Cc: Frank Li, Vinod Koul, Sasha Levin
From: Frank Li <Frank.Li@nxp.com>
[ Upstream commit d0e217b72f9f5c5ef35e3423d393ea8093ce98ec ]
Refactor the code to use the common dt-binding header file, fsl-edma.h.
Rename ARGS_* to FSL_EDMA_*, with no functional changes.
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20231114154824.3617255-4-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Stable-dep-of: a79f949a5ce1 ("dmaengine: fsl-edma: correct max_segment_size setting")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/dma/fsl-edma-main.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c
index 30df55da4dbb9..a56c8a0f2663f 100644
--- a/drivers/dma/fsl-edma-main.c
+++ b/drivers/dma/fsl-edma-main.c
@@ -9,6 +9,7 @@
* Vybrid and Layerscape SoCs.
*/
+#include <dt-bindings/dma/fsl-edma.h>
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/clk.h>
@@ -23,12 +24,6 @@
#include "fsl-edma-common.h"
-#define ARGS_RX BIT(0)
-#define ARGS_REMOTE BIT(1)
-#define ARGS_MULTI_FIFO BIT(2)
-#define ARGS_EVEN_CH BIT(3)
-#define ARGS_ODD_CH BIT(4)
-
static void fsl_edma_synchronize(struct dma_chan *chan)
{
struct fsl_edma_chan *fsl_chan = to_fsl_edma_chan(chan);
@@ -157,14 +152,14 @@ static struct dma_chan *fsl_edma3_xlate(struct of_phandle_args *dma_spec,
i = fsl_chan - fsl_edma->chans;
fsl_chan->priority = dma_spec->args[1];
- fsl_chan->is_rxchan = dma_spec->args[2] & ARGS_RX;
- fsl_chan->is_remote = dma_spec->args[2] & ARGS_REMOTE;
- fsl_chan->is_multi_fifo = dma_spec->args[2] & ARGS_MULTI_FIFO;
+ fsl_chan->is_rxchan = dma_spec->args[2] & FSL_EDMA_RX;
+ fsl_chan->is_remote = dma_spec->args[2] & FSL_EDMA_REMOTE;
+ fsl_chan->is_multi_fifo = dma_spec->args[2] & FSL_EDMA_MULTI_FIFO;
- if ((dma_spec->args[2] & ARGS_EVEN_CH) && (i & 0x1))
+ if ((dma_spec->args[2] & FSL_EDMA_EVEN_CH) && (i & 0x1))
continue;
- if ((dma_spec->args[2] & ARGS_ODD_CH) && !(i & 0x1))
+ if ((dma_spec->args[2] & FSL_EDMA_ODD_CH) && !(i & 0x1))
continue;
if (!b_chmux && i == dma_spec->args[0]) {
--
2.43.0
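The renamed flags are consumed by the channel-matching loop in fsl_edma3_xlate(). Below is a simplified user-space sketch of that per-channel test. struct chan_props and fsl_edma_chan_matches() are hypothetical names, and the real loop also records the priority from args[1] and compares the channel index with args[0] when there is no channel mux, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FSL_EDMA_RX         0x1
#define FSL_EDMA_REMOTE     0x2
#define FSL_EDMA_MULTI_FIFO 0x4
#define FSL_EDMA_EVEN_CH    0x8
#define FSL_EDMA_ODD_CH     0x10

/* Hypothetical subset of the per-channel properties set by xlate. */
struct chan_props {
	bool is_rxchan;
	bool is_remote;
	bool is_multi_fifo;
};

/*
 * 'flags' plays the role of dma_spec->args[2] and 'i' the candidate
 * channel index. Returns false where the driver would "continue" to
 * the next channel.
 */
static bool fsl_edma_chan_matches(uint32_t flags, unsigned int i,
				  struct chan_props *props)
{
	props->is_rxchan = flags & FSL_EDMA_RX;
	props->is_remote = flags & FSL_EDMA_REMOTE;
	props->is_multi_fifo = flags & FSL_EDMA_MULTI_FIFO;

	/* EVEN_CH: skip odd-numbered channels. */
	if ((flags & FSL_EDMA_EVEN_CH) && (i & 0x1))
		return false;
	/* ODD_CH: skip even-numbered channels. */
	if ((flags & FSL_EDMA_ODD_CH) && !(i & 0x1))
		return false;
	return true;
}
```

Because only the macro names change, the sketch behaves the same whether the masks are written as BIT(n) or as the 2^n constants.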
* [PATCH 6.6 03/60] dmaengine: fsl-edma: correct max_segment_size setting
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable; +Cc: Frank Li, Vinod Koul, Sasha Levin
From: Frank Li <Frank.Li@nxp.com>
[ Upstream commit a79f949a5ce1d45329d63742c2a995f2b47f9852 ]
Correct the previous setting of 0x3fff to the actual value of 0x7fff.
Introduce a new macro, EDMA_TCD_ITER_MASK, for improved code clarity, and
use FIELD_GET() to obtain the accurate maximum value.
Cc: stable@vger.kernel.org
Fixes: e06748539432 ("dmaengine: fsl-edma: support edma memcpy")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20240207194733.2112870-1-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/dma/fsl-edma-common.h | 5 +++--
drivers/dma/fsl-edma-main.c | 4 +++-
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/dma/fsl-edma-common.h b/drivers/dma/fsl-edma-common.h
index 40d50cc3d75a3..92fe53faa53b1 100644
--- a/drivers/dma/fsl-edma-common.h
+++ b/drivers/dma/fsl-edma-common.h
@@ -30,8 +30,9 @@
#define EDMA_TCD_ATTR_SSIZE(x) (((x) & GENMASK(2, 0)) << 8)
#define EDMA_TCD_ATTR_SMOD(x) (((x) & GENMASK(4, 0)) << 11)
-#define EDMA_TCD_CITER_CITER(x) ((x) & GENMASK(14, 0))
-#define EDMA_TCD_BITER_BITER(x) ((x) & GENMASK(14, 0))
+#define EDMA_TCD_ITER_MASK GENMASK(14, 0)
+#define EDMA_TCD_CITER_CITER(x) ((x) & EDMA_TCD_ITER_MASK)
+#define EDMA_TCD_BITER_BITER(x) ((x) & EDMA_TCD_ITER_MASK)
#define EDMA_TCD_CSR_START BIT(0)
#define EDMA_TCD_CSR_INT_MAJOR BIT(1)
diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c
index a56c8a0f2663f..42a338cbe6143 100644
--- a/drivers/dma/fsl-edma-main.c
+++ b/drivers/dma/fsl-edma-main.c
@@ -10,6 +10,7 @@
*/
#include <dt-bindings/dma/fsl-edma.h>
+#include <linux/bitfield.h>
#include <linux/module.h>
#include <linux/interrupt.h>
#include <linux/clk.h>
@@ -589,7 +590,8 @@ static int fsl_edma_probe(struct platform_device *pdev)
DMAENGINE_ALIGN_32_BYTES;
/* Per worst case 'nbytes = 1' take CITER as the max_seg_size */
- dma_set_max_seg_size(fsl_edma->dma_dev.dev, 0x3fff);
+ dma_set_max_seg_size(fsl_edma->dma_dev.dev,
+ FIELD_GET(EDMA_TCD_ITER_MASK, EDMA_TCD_ITER_MASK));
fsl_edma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
--
2.43.0
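Since the iteration field spans bits 14..0, FIELD_GET(mask, mask) is just the field's all-ones value, 0x7fff. The old constant 0x3fff corresponds to GENMASK(13, 0), one bit short. The user-space sketch below illustrates the arithmetic. GENMASK32() and field_get32() are hypothetical, simplified 32-bit stand-ins, not the kernel's GENMASK() and FIELD_GET() macros.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 32-bit stand-in for GENMASK(h, l): bits h..l set. */
#define GENMASK32(h, l) ((~0U >> (31 - (h))) & (~0U << (l)))

/* Simplified stand-in for FIELD_GET(): extract 'mask' bits, right-aligned. */
static uint32_t field_get32(uint32_t mask, uint32_t reg)
{
	uint32_t shift = 0;

	/* Find the position of the mask's lowest set bit. */
	while (mask && !((mask >> shift) & 1U))
		shift++;
	return (reg & mask) >> shift;
}

#define EDMA_TCD_ITER_MASK GENMASK32(14, 0)

/* Worst case nbytes = 1, so the largest CITER value bounds the segment. */
static uint32_t edma_max_seg_size(void)
{
	return field_get32(EDMA_TCD_ITER_MASK, EDMA_TCD_ITER_MASK);
}
```

Deriving the limit from the mask, as the patch does, keeps the segment size and the CITER/BITER field width from drifting apart again.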
* [PATCH 6.6 04/60] ceph: switch to corrected encoding of max_xattr_size in mdsmap
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Xiubo Li, Patrick Donnelly, Venky Shankar, Ilya Dryomov,
Sasha Levin
From: Xiubo Li <xiubli@redhat.com>
[ Upstream commit 51d31149a88b5c5a8d2d33f06df93f6187a25b4c ]
The addition of bal_rank_mask with encoding version 17 was merged
into ceph.git in Oct 2022 and made it into the v18.2.0 release as usual.
A few months later, the much-delayed addition of max_xattr_size got
merged, also with encoding version 17 and placed before bal_rank_mask
in the encoding, but it didn't make the v18.2.0 release.
The way this ended up being resolved on the MDS side is that
bal_rank_mask will continue to be encoded in version 17 while
max_xattr_size is now encoded in version 18. This does mean that
older kernels will misdecode version 17, but this is also true for
v18.2.0 and v18.2.1 clients in userspace.
The best we can do is backport this adjustment; see ceph.git
commit 78abfeaff27fee343fb664db633de5b221699a73 for details.
[ idryomov: changelog ]
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/64440
Fixes: d93231a6bc8a ("ceph: prevent a client from exceeding the MDS maximum xattr size")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ceph/mdsmap.c | 7 ++++---
include/linux/ceph/mdsmap.h | 6 +++++-
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
index 7dac21ee6ce76..3bb3b610d403e 100644
--- a/fs/ceph/mdsmap.c
+++ b/fs/ceph/mdsmap.c
@@ -379,10 +379,11 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end, bool msgr2)
ceph_decode_skip_8(p, end, bad_ext);
/* required_client_features */
ceph_decode_skip_set(p, end, 64, bad_ext);
+ /* bal_rank_mask */
+ ceph_decode_skip_string(p, end, bad_ext);
+ }
+ if (mdsmap_ev >= 18) {
ceph_decode_64_safe(p, end, m->m_max_xattr_size, bad_ext);
- } else {
- /* This forces the usage of the (sync) SETXATTR Op */
- m->m_max_xattr_size = 0;
}
bad_ext:
dout("mdsmap_decode m_enabled: %d, m_damaged: %d, m_num_laggy: %d\n",
diff --git a/include/linux/ceph/mdsmap.h b/include/linux/ceph/mdsmap.h
index 4c3e0648dc277..fcc95bff72a57 100644
--- a/include/linux/ceph/mdsmap.h
+++ b/include/linux/ceph/mdsmap.h
@@ -25,7 +25,11 @@ struct ceph_mdsmap {
u32 m_session_timeout; /* seconds */
u32 m_session_autoclose; /* seconds */
u64 m_max_file_size;
- u64 m_max_xattr_size; /* maximum size for xattrs blob */
+ /*
+ * maximum size for xattrs blob.
+ * Zeroed by default to force the usage of the (sync) SETXATTR Op.
+ */
+ u64 m_max_xattr_size;
u32 m_max_mds; /* expected up:active mds number */
u32 m_num_active_mds; /* actual up:active mds number */
u32 possible_max_rank; /* possible max rank index */
--
2.43.0
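The essence of the fix is that each field is now gated on the encoding version that actually carries it. Here is a simplified user-space sketch, assuming a little-endian u32 length prefix for strings and a little-endian u64 for max_xattr_size. decode_mdsmap_tail() and struct mdsmap_tail are hypothetical and stand in for the ceph_decode_* helpers used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the one field this sketch decodes. */
struct mdsmap_tail {
	uint64_t max_xattr_size;
};

static uint32_t get_le32(const uint8_t *b)
{
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

static uint64_t get_le64(const uint8_t *b)
{
	return (uint64_t)get_le32(b) | (uint64_t)get_le32(b + 4) << 32;
}

/*
 * v17 carries bal_rank_mask (a length-prefixed string, skipped here);
 * only v18+ carries max_xattr_size. Returns 0 on success, -1 if short.
 */
static int decode_mdsmap_tail(const uint8_t **p, const uint8_t *end,
			      uint8_t ev, struct mdsmap_tail *m)
{
	/* Zero forces the (sync) SETXATTR op when the field is absent. */
	m->max_xattr_size = 0;

	if (ev >= 17) {
		uint32_t len;

		if (end - *p < 4)
			return -1;
		len = get_le32(*p);
		*p += 4;
		if ((size_t)(end - *p) < len)
			return -1;
		*p += len;	/* skip bal_rank_mask */
	}
	if (ev >= 18) {
		if (end - *p < 8)
			return -1;
		m->max_xattr_size = get_le64(*p);
		*p += 8;
	}
	return 0;
}
```

Before the fix, a v17 map from a v18.2.x MDS would have had its bal_rank_mask bytes misread as max_xattr_size. With the version gate, those bytes are skipped and the field keeps its default of zero.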
* [PATCH 6.6 05/60] mm: migrate: remove PageTransHuge check in numamigrate_isolate_page()
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Kefeng Wang, Matthew Wilcox, David Hildenbrand, Huang Ying,
Hugh Dickins, Mike Kravetz, Zi Yan, Andrew Morton, Sasha Levin
From: Kefeng Wang <wangkefeng.wang@huawei.com>
[ Upstream commit a8ac4a767dcd9d87d8229045904d9fe15ea5e0e8 ]
Patch series "mm: migrate: more folio conversion and unification", v3.
Convert more migrate functions to use a folio; this is also preparation
for large folio migration support in NUMA balancing.
This patch (of 8):
The assertion VM_BUG_ON_PAGE(order && !PageTransHuge(page), page) is not
very useful:
1) for a tail/base page, order = 0; for a head page, order > 0 and
PageTransHuge() is true
2) there is a PageCompound() check and only base pages are handled in
do_numa_page(), and do_huge_pmd_numa_page() only handles PMD-mapped
THPs
3) even if the page is a tail page, isolate_lru_page() will emit
a warning and fail to isolate the page
4) if large folio/pte-mapped THP migration is supported in the future,
the entire folio could be migrated on a NUMA fault on a tail page
So just remove the check.
Link: https://lkml.kernel.org/r/20230913095131.2426871-1-wangkefeng.wang@huawei.com
Link: https://lkml.kernel.org/r/20230913095131.2426871-2-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 2774f256e7c0 ("mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/migrate.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index b4d972d80b10c..6f8ad6b64c9bc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2506,8 +2506,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
int nr_pages = thp_nr_pages(page);
int order = compound_order(page);
- VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
-
/* Do not migrate THP mapped by multiple processes */
if (PageTransHuge(page) && total_mapcount(page) > 1)
return 0;
--
2.43.0
* [PATCH 6.6 06/60] mm: migrate: remove THP mapcount check in numamigrate_isolate_page()
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Kefeng Wang, Matthew Wilcox, Huang, Ying, David Hildenbrand,
Hugh Dickins, Mike Kravetz, Zi Yan, Andrew Morton, Sasha Levin
From: Kefeng Wang <wangkefeng.wang@huawei.com>
[ Upstream commit 728be28fae8c838d52c91dce4867133798146357 ]
The check for THPs mapped by multiple processes was introduced by commit
04fa5d6a6547 ("mm: migrate: check page_count of THP before migrating") and
refactored by commit 340ef3902cf2 ("mm: numa: cleanup flow of transhuge page
migration"). It is now out of date: migrate_misplaced_page() uses the
standard migrate_pages() for both small pages and THPs, and the reference
count check is done in folio_migrate_mapping(), so remove the special
check for THP.
Link: https://lkml.kernel.org/r/20230913095131.2426871-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 2774f256e7c0 ("mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/migrate.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 6f8ad6b64c9bc..c9fabb960996f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2506,10 +2506,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
int nr_pages = thp_nr_pages(page);
int order = compound_order(page);
- /* Do not migrate THP mapped by multiple processes */
- if (PageTransHuge(page) && total_mapcount(page) > 1)
- return 0;
-
/* Avoid migrating to a node that is nearly full */
if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
int z;
--
2.43.0
* [PATCH 6.6 07/60] mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Kefeng Wang, Zi Yan, David Hildenbrand, Huang, Ying, Hugh Dickins,
Matthew Wilcox, Mike Kravetz, Andrew Morton, Sasha Levin
From: Kefeng Wang <wangkefeng.wang@huawei.com>
[ Upstream commit 2ac9e99f3b21b2864305fbfba4bae5913274c409 ]
Rename numamigrate_isolate_page() to numamigrate_isolate_folio(), and
make it take a folio and use the folio API to save compound_head() calls.
Link: https://lkml.kernel.org/r/20230913095131.2426871-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 2774f256e7c0 ("mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/migrate.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index c9fabb960996f..e5f2f7243a659 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2501,10 +2501,9 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
return __folio_alloc_node(gfp, order, nid);
}
-static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
+static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
{
- int nr_pages = thp_nr_pages(page);
- int order = compound_order(page);
+ int nr_pages = folio_nr_pages(folio);
/* Avoid migrating to a node that is nearly full */
if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
@@ -2516,22 +2515,23 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
if (managed_zone(pgdat->node_zones + z))
break;
}
- wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
+ wakeup_kswapd(pgdat->node_zones + z, 0,
+ folio_order(folio), ZONE_MOVABLE);
return 0;
}
- if (!isolate_lru_page(page))
+ if (!folio_isolate_lru(folio))
return 0;
- mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
+ node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio),
nr_pages);
/*
- * Isolating the page has taken another reference, so the
- * caller's reference can be safely dropped without the page
+ * Isolating the folio has taken another reference, so the
+ * caller's reference can be safely dropped without the folio
* disappearing underneath us during migration.
*/
- put_page(page);
+ folio_put(folio);
return 1;
}
@@ -2565,7 +2565,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
if (page_is_file_lru(page) && PageDirty(page))
goto out;
- isolated = numamigrate_isolate_page(pgdat, page);
+ isolated = numamigrate_isolate_folio(pgdat, page_folio(page));
if (!isolated)
goto out;
--
2.43.0
* [PATCH 6.6 08/60] mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Byungchul Park, Hyeongtak Ji, Oscar Salvador, Baolin Wang,
Huang, Ying, Johannes Weiner, Andrew Morton, Sasha Levin
From: Byungchul Park <byungchul@sk.com>
[ Upstream commit 2774f256e7c0219e2b0a0894af1c76bdabc4f974 ]
With NUMA balancing on, the following oops has been observed on a NUMA
system where a node has no local memory and therefore no managed zones.
It occurs because wakeup_kswapd() is called with an invalid zone index,
-1. Fix it by checking the index before calling wakeup_kswapd().
> BUG: unable to handle page fault for address: 00000000000033f3
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812)
> Code: (omitted)
> RSP: 0000:ffffc90004257d58 EFLAGS: 00010286
> RAX: ffffffffffffffff RBX: ffff88883fff0480 RCX: 0000000000000003
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88883fff0480
> RBP: ffffffffffffffff R08: ff0003ffffffffff R09: ffffffffffffffff
> R10: ffff888106c95540 R11: 0000000055555554 R12: 0000000000000003
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88883fff0940
> FS: 00007fc4b8124740(0000) GS:ffff888827c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000033f3 CR3: 000000026cc08004 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? __die
> ? page_fault_oops
> ? __pte_offset_map_lock
> ? exc_page_fault
> ? asm_exc_page_fault
> ? wakeup_kswapd
> migrate_misplaced_page
> __handle_mm_fault
> handle_mm_fault
> do_user_addr_fault
> exc_page_fault
> asm_exc_page_fault
> RIP: 0033:0x55b897ba0808
> Code: (omitted)
> RSP: 002b:00007ffeefa821a0 EFLAGS: 00010287
> RAX: 000055b89983acd0 RBX: 00007ffeefa823f8 RCX: 000055b89983acd0
> RDX: 00007fc2f8122010 RSI: 0000000000020000 RDI: 000055b89983acd0
> RBP: 00007ffeefa821a0 R08: 0000000000000037 R09: 0000000000000075
> R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> R13: 00007ffeefa82410 R14: 000055b897ba5dd8 R15: 00007fc4b8340000
> </TASK>
Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com
Signed-off-by: Byungchul Park <byungchul@sk.com>
Reported-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
Fixes: c574bbe917036 ("NUMA balancing: optimize page placement for memory tiering system")
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/migrate.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/migrate.c b/mm/migrate.c
index e5f2f7243a659..d69b4556cc15f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2515,6 +2515,14 @@ static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
if (managed_zone(pgdat->node_zones + z))
break;
}
+
+ /*
+ * If there are no managed zones, it should not proceed
+ * further.
+ */
+ if (z < 0)
+ return 0;
+
wakeup_kswapd(pgdat->node_zones + z, 0,
folio_order(folio), ZONE_MOVABLE);
return 0;
--
2.43.0
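This is a minimal user-space sketch of why the new `z < 0` check matters. It assumes a toy node with four zones. The loop scans zones from highest to lowest. If no zone is managed, the loop exits with z == -1, so `pgdat->node_zones + z` would point one element before the start of the array. The `toy_*` names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a node's zone array (not struct pglist_data). */
#define TOY_NR_ZONES 4

struct toy_node {
	bool managed[TOY_NR_ZONES]; /* true if the zone has managed pages */
};

/*
 * Scan from the highest zone down, like the loop in the patched function.
 * Returns the highest managed zone index, or -1 if the loop ran off the
 * bottom without finding one.
 */
static int toy_highest_managed_zone(const struct toy_node *node)
{
	int z;

	for (z = TOY_NR_ZONES - 1; z >= 0; z--) {
		if (node->managed[z])
			break;
	}
	return z;
}

/*
 * Returns the zone index that would be handed to kswapd, or -1 when the
 * early return (the counterpart of the added "if (z < 0)" check) is taken
 * instead of using an out-of-range index.
 */
static int toy_zone_to_wake(const struct toy_node *node)
{
	int z = toy_highest_managed_zone(node);

	if (z < 0)
		return -1;
	assert(z < TOY_NR_ZONES); /* z is now a valid array index */
	return z;
}
```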
* [PATCH 6.6 09/60] xfrm: Pass UDP encapsulation in TX packet offload
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (7 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 08/60] mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 10/60] net: lan78xx: fix runtime PM count underflow on link stop Sasha Levin
` (57 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Leon Romanovsky, Steffen Klassert, Mike Yu, Saeed Mahameed,
Sasha Levin
From: Leon Romanovsky <leonro@nvidia.com>
[ Upstream commit 983a73da1f996faee9997149eb05b12fa7bd8cbf ]
In addition to the commit cited in the Fixes line, allow UDP encapsulation
in the TX path too.
Fixes: 89edf40220be ("xfrm: Support UDP encapsulation in packet offload mode")
CC: Steffen Klassert <steffen.klassert@secunet.com>
Reported-by: Mike Yu <yumike@google.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/xfrm/xfrm_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 3784534c91855..653e51ae39648 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -407,7 +407,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
struct net_device *dev = x->xso.dev;
- if (!x->type_offload || x->encap)
+ if (!x->type_offload)
return false;
if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET ||
--
2.43.0
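The behavioral change is confined to the first early return in xfrm_dev_offload_ok(). The following minimal sketch models only that check. It uses a hypothetical two-flag stand-in for struct xfrm_state. The real function goes on to perform further checks that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced view of an xfrm state. */
struct toy_xfrm_state {
	bool has_type_offload; /* stands in for x->type_offload != NULL */
	bool has_encap;        /* stands in for x->encap != NULL */
};

/* First check before the patch: any encapsulated SA was rejected. */
static bool toy_first_check_before(const struct toy_xfrm_state *x)
{
	if (!x->has_type_offload || x->has_encap)
		return false;
	return true;
}

/* First check after the patch: encapsulated SAs may continue on the TX path. */
static bool toy_first_check_after(const struct toy_xfrm_state *x)
{
	if (!x->has_type_offload)
		return false;
	return true;
}
```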
* [PATCH 6.6 10/60] net: lan78xx: fix runtime PM count underflow on link stop
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (8 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 09/60] xfrm: Pass UDP encapsulation in TX packet offload Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 11/60] ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able Sasha Levin
` (56 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Oleksij Rempel, Jiri Pirko, David S . Miller, Sasha Levin
From: Oleksij Rempel <o.rempel@pengutronix.de>
[ Upstream commit 1eecc7ab82c42133b748e1895275942a054a7f67 ]
The current driver has an asymmetry in its runtime PM calls.
lan78xx_open() calls usb_autopm_get() and then unconditionally calls
usb_autopm_put(), while lan78xx_stop() calls only usb_autopm_put(). This
has worked so far only because the driver does not enable autosuspend by
default, so the imbalance was visible only as the warning "Runtime PM
usage count underflow!".
Since the current driver cannot use runtime PM with an active link, call
usb_autopm_put() from lan78xx_open() only on the error path. Otherwise,
keep the reference count elevated for as long as the interface is open.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/usb/lan78xx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 8b1e1e1c8d5be..921ae046f8604 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3137,7 +3137,8 @@ static int lan78xx_open(struct net_device *net)
done:
mutex_unlock(&dev->dev_mutex);
- usb_autopm_put_interface(dev->intf);
+ if (ret < 0)
+ usb_autopm_put_interface(dev->intf);
return ret;
}
--
2.43.0
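This is a minimal sketch of the reference-count asymmetry. It assumes a plain integer in place of the interface's runtime PM usage count. toy_autopm_get() and toy_autopm_put() are hypothetical stand-ins for usb_autopm_get_interface() and usb_autopm_put_interface(). The underflow counter stands in for the PM core's "usage count underflow" warning.

```c
#include <assert.h>

/* Hypothetical stand-in for the interface's runtime PM usage count. */
struct toy_intf {
	int usage_count;
	int underflows; /* times a put would have gone below zero */
};

static void toy_autopm_get(struct toy_intf *intf)
{
	intf->usage_count++;
}

static void toy_autopm_put(struct toy_intf *intf)
{
	if (intf->usage_count == 0) {
		intf->underflows++; /* the real PM core warns here */
		return;
	}
	intf->usage_count--;
}

/* Old open(): always drops the reference it took. */
static int toy_open_old(struct toy_intf *intf, int ret)
{
	toy_autopm_get(intf);
	toy_autopm_put(intf);
	return ret;
}

/* New open(): drops the reference only on failure, holds it while open. */
static int toy_open_new(struct toy_intf *intf, int ret)
{
	toy_autopm_get(intf);
	if (ret < 0)
		toy_autopm_put(intf);
	return ret;
}

/* stop() drops one reference in both versions. */
static void toy_stop(struct toy_intf *intf)
{
	toy_autopm_put(intf);
}
```

With the old open(), an open/stop cycle underflows once. With the new open(), the cycle returns the count to zero.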
* [PATCH 6.6 11/60] ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (9 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 10/60] net: lan78xx: fix runtime PM count underflow on link stop Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 12/60] i40e: disable NAPI right after disabling irqs when handling xsk_pool Sasha Levin
` (55 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Maciej Fijalkowski, Pavel Vazharov, Magnus Karlsson,
Chandan Kumar Rout, Tony Nguyen, Sasha Levin
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
[ Upstream commit cbf996f52c4e658b3fb4349a869a62fd2d4c3c1c ]
Currently, the routines that toggle the state of a ring pair do not
handle the interrupt associated with the queue vector these rings belong
to. This causes problems such as a dead interface due to IRQ
misconfiguration, as reported by Pavel (see the Closes: tag).
Add a function that disables a single IRQ in the EIMC register, and call
it as the very first step when disabling a ring pair during xsk_pool
setup. For enabling, reuse ixgbe_irq_enable_queues(). In addition,
disable/enable NAPI as the first/last step when closing or opening the
ring pair that the xsk_pool is being configured on.
Reported-by: Pavel Vazharov <pavel@x3me.net>
Closes: https://lore.kernel.org/netdev/CAJEV1ijxNyPTwASJER1bcZzS9nMoZJqfR86nu_3jFFVXzZQ4NA@mail.gmail.com/
Fixes: 024aa5800f32 ("ixgbe: added Rx/Tx ring disable/enable functions")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 ++++++++++++++++---
1 file changed, 49 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9d4f808c4bfa3..cb23aad5953b0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2939,8 +2939,8 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
u64 qmask)
{
- u32 mask;
struct ixgbe_hw *hw = &adapter->hw;
+ u32 mask;
switch (hw->mac.type) {
case ixgbe_mac_82598EB:
@@ -10524,6 +10524,44 @@ static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
}
+/**
+ * ixgbe_irq_disable_single - Disable single IRQ vector
+ * @adapter: adapter structure
+ * @ring: ring index
+ **/
+static void ixgbe_irq_disable_single(struct ixgbe_adapter *adapter, u32 ring)
+{
+ struct ixgbe_hw *hw = &adapter->hw;
+ u64 qmask = BIT_ULL(ring);
+ u32 mask;
+
+ switch (adapter->hw.mac.type) {
+ case ixgbe_mac_82598EB:
+ mask = qmask & IXGBE_EIMC_RTX_QUEUE;
+ IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, mask);
+ break;
+ case ixgbe_mac_82599EB:
+ case ixgbe_mac_X540:
+ case ixgbe_mac_X550:
+ case ixgbe_mac_X550EM_x:
+ case ixgbe_mac_x550em_a:
+ mask = (qmask & 0xFFFFFFFF);
+ if (mask)
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+ mask = (qmask >> 32);
+ if (mask)
+ IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+ break;
+ default:
+ break;
+ }
+ IXGBE_WRITE_FLUSH(&adapter->hw);
+ if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
+ synchronize_irq(adapter->msix_entries[ring].vector);
+ else
+ synchronize_irq(adapter->pdev->irq);
+}
+
/**
* ixgbe_txrx_ring_disable - Disable Rx/Tx/XDP Tx rings
* @adapter: adapter structure
@@ -10540,6 +10578,11 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
tx_ring = adapter->tx_ring[ring];
xdp_ring = adapter->xdp_ring[ring];
+ ixgbe_irq_disable_single(adapter, ring);
+
+ /* Rx/Tx/XDP Tx share the same napi context. */
+ napi_disable(&rx_ring->q_vector->napi);
+
ixgbe_disable_txr(adapter, tx_ring);
if (xdp_ring)
ixgbe_disable_txr(adapter, xdp_ring);
@@ -10548,9 +10591,6 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
if (xdp_ring)
synchronize_rcu();
- /* Rx/Tx/XDP Tx share the same napi context. */
- napi_disable(&rx_ring->q_vector->napi);
-
ixgbe_clean_tx_ring(tx_ring);
if (xdp_ring)
ixgbe_clean_tx_ring(xdp_ring);
@@ -10578,9 +10618,6 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
tx_ring = adapter->tx_ring[ring];
xdp_ring = adapter->xdp_ring[ring];
- /* Rx/Tx/XDP Tx share the same napi context. */
- napi_enable(&rx_ring->q_vector->napi);
-
ixgbe_configure_tx_ring(adapter, tx_ring);
if (xdp_ring)
ixgbe_configure_tx_ring(adapter, xdp_ring);
@@ -10589,6 +10626,11 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
clear_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
if (xdp_ring)
clear_bit(__IXGBE_TX_DISABLED, &xdp_ring->state);
+
+ /* Rx/Tx/XDP Tx share the same napi context. */
+ napi_enable(&rx_ring->q_vector->napi);
+ ixgbe_irq_enable_queues(adapter, BIT_ULL(ring));
+ IXGBE_WRITE_FLUSH(&adapter->hw);
}
/**
--
2.43.0
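In the patch, ixgbe_irq_disable_single() builds a 64-bit mask with BIT_ULL(ring). On the 82599EB and later MACs listed in its switch, it writes the low and high 32-bit halves to two separate extended registers and skips any half that is zero. This user-space sketch shows just that split; the helper name is hypothetical. The split explains why a ring index of 32 or more lands in the second register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split BIT_ULL(ring) into the two 32-bit halves (index 0 = low,
 * index 1 = high), mirroring "qmask & 0xFFFFFFFF" and "qmask >> 32".
 */
static void toy_split_qmask(uint32_t ring, uint32_t out[2])
{
	uint64_t qmask;

	assert(ring < 64); /* shifting a 64-bit value by >= 64 is undefined */
	qmask = 1ULL << ring;
	out[0] = (uint32_t)(qmask & 0xFFFFFFFFu);
	out[1] = (uint32_t)(qmask >> 32);
}
```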
* [PATCH 6.6 12/60] i40e: disable NAPI right after disabling irqs when handling xsk_pool
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (10 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 11/60] ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 13/60] ice: reorder disabling IRQ and NAPI in ice_qp_dis Sasha Levin
` (54 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Maciej Fijalkowski, Chandan Kumar Rout, Magnus Karlsson,
Tony Nguyen, Sasha Levin
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
[ Upstream commit d562b11c1eac7d73f4c778b4cbe5468f86b1f20d ]
Disable NAPI before shutting down queues that this particular NAPI
contains so that the order of actions in i40e_queue_pair_disable()
mirrors what we do in i40e_queue_pair_enable().
Fixes: 123cecd427b6 ("i40e: added queue pair disable/enable functions")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 9d37c0374c75e..ae32e83a69902 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -13617,9 +13617,9 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair)
return err;
i40e_queue_pair_disable_irq(vsi, queue_pair);
+ i40e_queue_pair_toggle_napi(vsi, queue_pair, false /* off */);
err = i40e_queue_pair_toggle_rings(vsi, queue_pair, false /* off */);
i40e_clean_rx_ring(vsi->rx_rings[queue_pair]);
- i40e_queue_pair_toggle_napi(vsi, queue_pair, false /* off */);
i40e_queue_pair_clean_rings(vsi, queue_pair);
i40e_queue_pair_reset_stats(vsi, queue_pair);
--
2.43.0
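The "mirrors" claim can be checked mechanically. The sketch assumes, from the upstream driver rather than from this diff, that i40e_queue_pair_enable() turns on the rings, then NAPI, then the IRQ. The old and new disable orders are read off the diff. The step names are hypothetical labels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NSTEPS 3

/* Assumed enable order: rings, then NAPI, then IRQ. */
static const char *const toy_enable_order[TOY_NSTEPS] = { "rings", "napi", "irq" };

/* Disable order before the patch: NAPI was turned off after the rings. */
static const char *const toy_disable_old[TOY_NSTEPS] = { "irq", "rings", "napi" };

/* Disable order after the patch: IRQ, then NAPI, then the rings. */
static const char *const toy_disable_new[TOY_NSTEPS] = { "irq", "napi", "rings" };

/* Returns 1 if the disable sequence is the exact reverse of the enable one. */
static int toy_is_mirror(const char *const *enable, const char *const *disable,
			 size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(enable[i], disable[n - 1 - i]) != 0)
			return 0;
	return 1;
}
```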
* [PATCH 6.6 13/60] ice: reorder disabling IRQ and NAPI in ice_qp_dis
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (11 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 12/60] i40e: disable NAPI right after disabling irqs when handling xsk_pool Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 14/60] Revert "net/mlx5: Block entering switchdev mode with ns inconsistency" Sasha Levin
` (53 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Maciej Fijalkowski, Chandan Kumar Rout, Magnus Karlsson,
Tony Nguyen, Sasha Levin
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
[ Upstream commit 99099c6bc75a30b76bb5d6774a0509ab6f06af05 ]
ice_qp_dis() currently performs its steps in a mixed-up order: Tx is
stopped before the IRQ on the related queue vector is disabled, then Rx
is disabled, and NAPI is disabled last.
Start by disabling IRQs, followed by turning off NAPI. Only then is it
safe to handle the queues.
One subtle change on top of that: even though ice_qp_ena() already looks
saner, clear ICE_CFG_BUSY as the very last step there.
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_xsk.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 307c609137bdf..7bd71660011e4 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -179,6 +179,10 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
return -EBUSY;
usleep_range(1000, 2000);
}
+
+ ice_qvec_dis_irq(vsi, rx_ring, q_vector);
+ ice_qvec_toggle_napi(vsi, q_vector, false);
+
netif_tx_stop_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
ice_fill_txq_meta(vsi, tx_ring, &txq_meta);
@@ -195,13 +199,10 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
if (err)
return err;
}
- ice_qvec_dis_irq(vsi, rx_ring, q_vector);
-
err = ice_vsi_ctrl_one_rx_ring(vsi, false, q_idx, true);
if (err)
return err;
- ice_qvec_toggle_napi(vsi, q_vector, false);
ice_qp_clean_rings(vsi, q_idx);
ice_qp_reset_stats(vsi, q_idx);
@@ -264,11 +265,11 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
if (err)
goto free_buf;
- clear_bit(ICE_CFG_BUSY, vsi->state);
ice_qvec_toggle_napi(vsi, q_vector, true);
ice_qvec_ena_irq(vsi, q_vector);
netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
+ clear_bit(ICE_CFG_BUSY, vsi->state);
free_buf:
kfree(qg_buf);
return err;
--
2.43.0
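The commit message does not say why ICE_CFG_BUSY should be cleared last. One reading, offered as an illustrative assumption rather than the author's stated rationale, goes as follows. If other code treats a clear busy bit as meaning the queue pair is fully set up, the bit must not be cleared before NAPI, the IRQ, and the Tx queue are all up. This toy model uses hypothetical names and only counts completed steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a busy flag plus the number of bring-up steps done
 * (NAPI on, IRQ on, Tx queue started). */
#define TOY_ENA_STEPS 3

struct toy_vsi {
	bool cfg_busy;
	int steps_done;
};

/* Assumed invariant: once busy is clear, every step has completed. */
static bool toy_invariant_ok(const struct toy_vsi *vsi)
{
	return vsi->cfg_busy || vsi->steps_done == TOY_ENA_STEPS;
}

/*
 * Run the bring-up, clearing busy right after step `clear_after`
 * (0 = before any step, as in the old code; TOY_ENA_STEPS = last, as in
 * the new code). Returns true if the invariant held at every point.
 */
static bool toy_qp_ena(struct toy_vsi *vsi, int clear_after)
{
	bool ok;
	int step;

	vsi->cfg_busy = true;
	vsi->steps_done = 0;
	if (clear_after == 0)
		vsi->cfg_busy = false;
	ok = toy_invariant_ok(vsi);
	for (step = 1; step <= TOY_ENA_STEPS; step++) {
		vsi->steps_done = step;
		if (step == clear_after)
			vsi->cfg_busy = false;
		ok = ok && toy_invariant_ok(vsi);
	}
	return ok;
}
```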
* [PATCH 6.6 14/60] Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (12 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 13/60] ice: reorder disabling IRQ and NAPI in ice_qp_dis Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 15/60] Revert "net/mlx5e: Check the number of elements before walk TC rhashtable" Sasha Levin
` (52 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Gavin Li, Jiri Pirko, Saeed Mahameed, Sasha Levin
From: Gavin Li <gavinl@nvidia.com>
[ Upstream commit 8deeefb24786ea7950b37bde4516b286c877db00 ]
This reverts commit 662404b24a4c4d839839ed25e3097571f5938b9b.
The revert is required because the change is suspected to provide no
benefit and to cause a crash.
Fixes: 662404b24a4c ("net/mlx5e: Block entering switchdev mode with ns inconsistency")
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../mellanox/mlx5/core/eswitch_offloads.c | 23 -------------------
1 file changed, 23 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index b0455134c98ef..14b3bd3c5e2f7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3658,22 +3658,6 @@ static int esw_inline_mode_to_devlink(u8 mlx5_mode, u8 *mode)
return 0;
}
-static bool esw_offloads_devlink_ns_eq_netdev_ns(struct devlink *devlink)
-{
- struct mlx5_core_dev *dev = devlink_priv(devlink);
- struct net *devl_net, *netdev_net;
- bool ret = false;
-
- mutex_lock(&dev->mlx5e_res.uplink_netdev_lock);
- if (dev->mlx5e_res.uplink_netdev) {
- netdev_net = dev_net(dev->mlx5e_res.uplink_netdev);
- devl_net = devlink_net(devlink);
- ret = net_eq(devl_net, netdev_net);
- }
- mutex_unlock(&dev->mlx5e_res.uplink_netdev_lock);
- return ret;
-}
-
int mlx5_eswitch_block_mode(struct mlx5_core_dev *dev)
{
struct mlx5_eswitch *esw = dev->priv.eswitch;
@@ -3718,13 +3702,6 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
if (esw_mode_from_devlink(mode, &mlx5_mode))
return -EINVAL;
- if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
- !esw_offloads_devlink_ns_eq_netdev_ns(devlink)) {
- NL_SET_ERR_MSG_MOD(extack,
- "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's.");
- return -EPERM;
- }
-
mlx5_lag_disable_change(esw->dev);
err = mlx5_esw_try_lock(esw);
if (err < 0) {
--
2.43.0
* [PATCH 6.6 15/60] Revert "net/mlx5e: Check the number of elements before walk TC rhashtable"
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (13 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 14/60] Revert "net/mlx5: Block entering switchdev mode with ns inconsistency" Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 16/60] net/mlx5: E-switch, Change flow rule destination checking Sasha Levin
` (51 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Saeed Mahameed, Sasha Levin
From: Saeed Mahameed <saeedm@nvidia.com>
[ Upstream commit b7bbd698c90591546d22093181e266785f08c18b ]
This reverts commit 4e25b661f484df54b6751b65f9ea2434a3b67539.
This commit was mistakenly applied by pulling the wrong tag; remove it.
Fixes: 4e25b661f484 ("net/mlx5e: Check the number of elements before walk TC rhashtable")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
index d5d33c3b3aa2a..13b5916b64e22 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
@@ -152,7 +152,7 @@ void mlx5_esw_ipsec_restore_dest_uplink(struct mlx5_core_dev *mdev)
xa_for_each(&esw->offloads.vport_reps, i, rep) {
rpriv = rep->rep_data[REP_ETH].priv;
- if (!rpriv || !rpriv->netdev || !atomic_read(&rpriv->tc_ht.nelems))
+ if (!rpriv || !rpriv->netdev)
continue;
rhashtable_walk_enter(&rpriv->tc_ht, &iter);
--
2.43.0
* [PATCH 6.6 16/60] net/mlx5: E-switch, Change flow rule destination checking
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (14 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 15/60] Revert "net/mlx5e: Check the number of elements before walk TC rhashtable" Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 17/60] net/mlx5: Check capability for fw_reset Sasha Levin
` (50 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jianbo Liu, Rahul Rameshbabu, Saeed Mahameed, Sasha Levin
From: Jianbo Liu <jianbol@nvidia.com>
[ Upstream commit 85ea2c5c5ef5f24fe6e6e7028ddd90be1cb5d27e ]
The check in the cited commit is not accurate. In the common case, a VF
destination is internal and an uplink destination is external. However,
an uplink destination with packet reformat is considered internal
because the firmware uses LB+hairpin to support it. Update the check so
that header rewrite rules with both internal and external destinations
are not allowed.
Fixes: e0e22d59b47a ("net/mlx5: E-switch, Add checking for flow rule destinations")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../mellanox/mlx5/core/eswitch_offloads.c | 23 +++++++++++--------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 14b3bd3c5e2f7..baaae628b0a0f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -535,21 +535,26 @@ esw_src_port_rewrite_supported(struct mlx5_eswitch *esw)
}
static bool
-esw_dests_to_vf_pf_vports(struct mlx5_flow_destination *dests, int max_dest)
+esw_dests_to_int_external(struct mlx5_flow_destination *dests, int max_dest)
{
- bool vf_dest = false, pf_dest = false;
+ bool internal_dest = false, external_dest = false;
int i;
for (i = 0; i < max_dest; i++) {
- if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT)
+ if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT &&
+ dests[i].type != MLX5_FLOW_DESTINATION_TYPE_UPLINK)
continue;
- if (dests[i].vport.num == MLX5_VPORT_UPLINK)
- pf_dest = true;
+ /* Uplink dest is external, but considered as internal
+ * if there is reformat because firmware uses LB+hairpin to support it.
+ */
+ if (dests[i].vport.num == MLX5_VPORT_UPLINK &&
+ !(dests[i].vport.flags & MLX5_FLOW_DEST_VPORT_REFORMAT_ID))
+ external_dest = true;
else
- vf_dest = true;
+ internal_dest = true;
- if (vf_dest && pf_dest)
+ if (internal_dest && external_dest)
return true;
}
@@ -695,9 +700,9 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
/* Header rewrite with combined wire+loopback in FDB is not allowed */
if ((flow_act.action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) &&
- esw_dests_to_vf_pf_vports(dest, i)) {
+ esw_dests_to_int_external(dest, i)) {
esw_warn(esw->dev,
- "FDB: Header rewrite with forwarding to both PF and VF is not allowed\n");
+ "FDB: Header rewrite with forwarding to both internal and external dests is not allowed\n");
rule = ERR_PTR(-EINVAL);
goto err_esw_get;
}
--
2.43.0
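The new classification can be written out as a small predicate. This user-space sketch replaces struct mlx5_flow_destination with a hypothetical, reduced destination type. It follows the rule stated in the commit message. An uplink destination without packet reformat counts as external. Any other VPORT or UPLINK destination, including an uplink with reformat, counts as internal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced destination descriptor. */
enum toy_dest_type { TOY_DEST_VPORT, TOY_DEST_UPLINK, TOY_DEST_OTHER };

struct toy_dest {
	enum toy_dest_type type;
	bool is_uplink_vport; /* stands in for vport.num == MLX5_VPORT_UPLINK */
	bool has_reformat;    /* stands in for MLX5_FLOW_DEST_VPORT_REFORMAT_ID */
};

/* Returns true if the list mixes internal and external destinations. */
static bool toy_dests_int_external(const struct toy_dest *dests, int n)
{
	bool internal = false, external = false;
	int i;

	for (i = 0; i < n; i++) {
		if (dests[i].type != TOY_DEST_VPORT &&
		    dests[i].type != TOY_DEST_UPLINK)
			continue; /* other destination types are ignored */
		/* Plain uplink goes to the wire; uplink with reformat is
		 * served via LB+hairpin and therefore treated as internal. */
		if (dests[i].is_uplink_vport && !dests[i].has_reformat)
			external = true;
		else
			internal = true;
		if (internal && external)
			return true;
	}
	return false;
}
```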
* [PATCH 6.6 17/60] net/mlx5: Check capability for fw_reset
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (15 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 16/60] net/mlx5: E-switch, Change flow rule destination checking Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 18/60] net/mlx5e: Change the warning when ignore_flow_level is not supported Sasha Levin
` (49 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Moshe Shemesh, Aya Levin, Saeed Mahameed, Sasha Levin
From: Moshe Shemesh <moshe@nvidia.com>
[ Upstream commit 5e6107b499f3fc4748109e1d87fd9603b34f1e0d ]
Functions that can't access the MFRL (Management Firmware Reset Level)
register have no use for fw_reset structures or events. Remove the
fw_reset structure allocation and the registration for fw reset event
notifications for these functions.
Having the devlink param enable_remote_dev_reset on functions that don't
have this capability is misleading, as these functions are not allowed
to influence the reset flow. Hence, this patch removes this parameter
for such functions.
In addition, return -EOPNOTSUPP for the devlink reload action
fw_activate on these functions.
Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../net/ethernet/mellanox/mlx5/core/devlink.c | 6 +++++
.../ethernet/mellanox/mlx5/core/fw_reset.c | 22 +++++++++++++++++--
include/linux/mlx5/mlx5_ifc.h | 4 +++-
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index af8460bb257b9..1bccb5633ab4b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -168,6 +168,12 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
return -EOPNOTSUPP;
}
+ if (action == DEVLINK_RELOAD_ACTION_FW_ACTIVATE &&
+ !dev->priv.fw_reset) {
+ NL_SET_ERR_MSG_MOD(extack, "FW activate is unsupported for this function");
+ return -EOPNOTSUPP;
+ }
+
if (mlx5_core_is_pf(dev) && pci_num_vf(pdev))
NL_SET_ERR_MSG_MOD(extack, "reload while VFs are present is unfavorable");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index c4e19d627da21..3a9cdf79403ae 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -679,19 +679,30 @@ void mlx5_fw_reset_events_start(struct mlx5_core_dev *dev)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (!fw_reset)
+ return;
+
MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT);
mlx5_eq_notifier_register(dev, &fw_reset->nb);
}
void mlx5_fw_reset_events_stop(struct mlx5_core_dev *dev)
{
- mlx5_eq_notifier_unregister(dev, &dev->priv.fw_reset->nb);
+ struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+ if (!fw_reset)
+ return;
+
+ mlx5_eq_notifier_unregister(dev, &fw_reset->nb);
}
void mlx5_drain_fw_reset(struct mlx5_core_dev *dev)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (!fw_reset)
+ return;
+
set_bit(MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS, &fw_reset->reset_flags);
cancel_work_sync(&fw_reset->fw_live_patch_work);
cancel_work_sync(&fw_reset->reset_request_work);
@@ -709,9 +720,13 @@ static const struct devlink_param mlx5_fw_reset_devlink_params[] = {
int mlx5_fw_reset_init(struct mlx5_core_dev *dev)
{
- struct mlx5_fw_reset *fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
+ struct mlx5_fw_reset *fw_reset;
int err;
+ if (!MLX5_CAP_MCAM_REG(dev, mfrl))
+ return 0;
+
+ fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
if (!fw_reset)
return -ENOMEM;
fw_reset->wq = create_singlethread_workqueue("mlx5_fw_reset_events");
@@ -747,6 +762,9 @@ void mlx5_fw_reset_cleanup(struct mlx5_core_dev *dev)
{
struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+ if (!fw_reset)
+ return;
+
devl_params_unregister(priv_to_devlink(dev),
mlx5_fw_reset_devlink_params,
ARRAY_SIZE(mlx5_fw_reset_devlink_params));
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 643e9ba4e64bd..58128de5dbdda 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -10154,7 +10154,9 @@ struct mlx5_ifc_mcam_access_reg_bits {
u8 regs_63_to_46[0x12];
u8 mrtc[0x1];
- u8 regs_44_to_32[0xd];
+ u8 regs_44_to_41[0x4];
+ u8 mfrl[0x1];
+ u8 regs_39_to_32[0x8];
u8 regs_31_to_10[0x16];
u8 mtmp[0x1];
--
2.43.0
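The patch follows a common optional-feature pattern. Init skips allocation when the capability bit is absent and still returns success. Every later entry point then checks for a NULL context. The sketch below illustrates that pattern in user space; the toy_* names and the cap_mfrl flag are hypothetical stand-ins, not mlx5 APIs. It also includes a compile-time check that the new mcam_access_reg fields still add up to the 0xd bits of the regs_44_to_32 field they replace.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* regs_44_to_41 (4) + mfrl (1) + regs_39_to_32 (8) == old 13-bit field. */
_Static_assert(0x4 + 0x1 + 0x8 == 0xd, "field split must keep 13 bits");

struct toy_fw_reset { int events_registered; };

struct toy_dev {
	bool cap_mfrl;                 /* simulated capability bit */
	struct toy_fw_reset *fw_reset; /* NULL when the feature is absent */
};

static int toy_fw_reset_init(struct toy_dev *dev)
{
	if (!dev->cap_mfrl)
		return 0; /* not an error: the feature is simply absent */
	dev->fw_reset = calloc(1, sizeof(*dev->fw_reset));
	if (!dev->fw_reset)
		return -ENOMEM;
	return 0;
}

static void toy_fw_reset_events_start(struct toy_dev *dev)
{
	if (!dev->fw_reset)
		return;
	dev->fw_reset->events_registered = 1;
}

static int toy_reload_fw_activate(const struct toy_dev *dev)
{
	if (!dev->fw_reset)
		return -EOPNOTSUPP;
	return 0;
}

static void toy_fw_reset_cleanup(struct toy_dev *dev)
{
	if (!dev->fw_reset)
		return;
	free(dev->fw_reset);
	dev->fw_reset = NULL;
}
```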
* [PATCH 6.6 18/60] net/mlx5e: Change the warning when ignore_flow_level is not supported
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (16 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 17/60] net/mlx5: Check capability for fw_reset Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 19/60] net/mlx5e: Fix MACsec state loss upon state update in offload path Sasha Levin
` (48 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Jianbo Liu, Elliott, Robert, Roi Dayan, Saeed Mahameed,
Sasha Levin
From: Jianbo Liu <jianbol@nvidia.com>
[ Upstream commit dd238b702064b21d25b4fc39a19699319746d655 ]
Downgrade the print from mlx5_core_warn() to mlx5_core_dbg(), as it is
just a statement of fact that the firmware doesn't support ignore flow
level.
Also change the wording to "firmware flow level support is missing" to
make it more accurate.
Fixes: ae2ee3be99a8 ("net/mlx5: CT: Remove warning of ignore_flow_level support for VFs")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Suggested-by: Elliott, Robert (Servers) <elliott@hpe.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
index 86bf007fd05b7..b500cc2c9689d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
@@ -37,7 +37,7 @@ mlx5e_tc_post_act_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
if (!MLX5_CAP_FLOWTABLE_TYPE(priv->mdev, ignore_flow_level, table_type)) {
if (priv->mdev->coredev_type == MLX5_COREDEV_PF)
- mlx5_core_warn(priv->mdev, "firmware level support is missing\n");
+ mlx5_core_dbg(priv->mdev, "firmware flow level support is missing\n");
err = -EOPNOTSUPP;
goto err_check;
}
--
2.43.0
* [PATCH 6.6 19/60] net/mlx5e: Fix MACsec state loss upon state update in offload path
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (17 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 18/60] net/mlx5e: Change the warning when ignore_flow_level is not supported Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 20/60] net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map Sasha Levin
` (47 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Emeel Hakim, Rahul Rameshbabu, Gal Pressman, Tariq Toukan,
Saeed Mahameed, Sasha Levin
From: Emeel Hakim <ehakim@nvidia.com>
[ Upstream commit a71f2147b64941efee156bfda54fd6461d0f95df ]
The packet number attribute of the SA is incremented by the device rather
than the software stack when enabling hardware offload. Because the packet
number attribute is managed by the hardware, the software has no insight
into the value of the packet number attribute actually written by the
device.
Previously when MACsec offload was enabled, the hardware object for
handling the offload was destroyed when the SA was disabled. Re-enabling
the SA would lead to a new hardware object being instantiated. This new
hardware object would not have any recollection of the correct packet
number for the SA. Instead, destroy the flow steering rule when
deactivating the SA and recreate it upon reactivation, preserving the
original hardware object.
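The lifetime split described above can be modelled in userspace. This is a minimal sketch under stated assumptions: struct hw_obj, struct sa and the sa_* helpers are hypothetical names, not the mlx5 API. The object holding the packet number (PN) lives as long as the SA, while only the steering rule is added and removed on (de)activation, so the PN survives a disable/enable cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model, not the mlx5 API: the HW object owns the packet
 * number (PN) that the device increments; the steering rule only
 * decides whether traffic reaches that object. */
struct hw_obj {
	uint64_t next_pn;
};

struct sa {
	struct hw_obj *obj;	/* lives as long as the SA exists */
	bool rule_installed;	/* added/removed on (de)activation */
};

/* Device side: only traffic steered to the object advances the PN. */
static void hw_transmit(struct sa *sa, unsigned int pkts)
{
	if (sa->rule_installed)
		sa->obj->next_pn += pkts;
}

/* Create the object; install the rule only if the SA is active. */
static int sa_create(struct sa *sa, uint64_t initial_pn, bool active)
{
	sa->obj = malloc(sizeof(*sa->obj));
	if (!sa->obj)
		return -1;
	sa->obj->next_pn = initial_pn;
	sa->rule_installed = active;
	return 0;
}

/* (De)activation toggles only the rule; the object and its PN stay. */
static void sa_set_active(struct sa *sa, bool active)
{
	sa->rule_installed = active;
}

/* Full teardown: remove the rule, then destroy the object. */
static void sa_destroy(struct sa *sa)
{
	sa->rule_installed = false;
	free(sa->obj);
	sa->obj = NULL;
}
```

With this split, creating an SA at PN 1, sending 10 packets, disabling and re-enabling it leaves next_pn at 11 instead of resetting it along with a freshly created object.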
Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support")
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
.../mellanox/mlx5/core/en_accel/macsec.c | 82 ++++++++++++-------
1 file changed, 51 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
index d4ebd87431145..b2cabd6ab86cb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
@@ -310,9 +310,9 @@ static void mlx5e_macsec_destroy_object(struct mlx5_core_dev *mdev, u32 macsec_o
mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
}
-static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
- struct mlx5e_macsec_sa *sa,
- bool is_tx, struct net_device *netdev, u32 fs_id)
+static void mlx5e_macsec_cleanup_sa_fs(struct mlx5e_macsec *macsec,
+ struct mlx5e_macsec_sa *sa, bool is_tx,
+ struct net_device *netdev, u32 fs_id)
{
int action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
@@ -322,20 +322,49 @@ static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
mlx5_macsec_fs_del_rule(macsec->mdev->macsec_fs, sa->macsec_rule, action, netdev,
fs_id);
- mlx5e_macsec_destroy_object(macsec->mdev, sa->macsec_obj_id);
sa->macsec_rule = NULL;
}
+static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
+ struct mlx5e_macsec_sa *sa, bool is_tx,
+ struct net_device *netdev, u32 fs_id)
+{
+ mlx5e_macsec_cleanup_sa_fs(macsec, sa, is_tx, netdev, fs_id);
+ mlx5e_macsec_destroy_object(macsec->mdev, sa->macsec_obj_id);
+}
+
+static int mlx5e_macsec_init_sa_fs(struct macsec_context *ctx,
+ struct mlx5e_macsec_sa *sa, bool encrypt,
+ bool is_tx, u32 *fs_id)
+{
+ struct mlx5e_priv *priv = macsec_netdev_priv(ctx->netdev);
+ struct mlx5_macsec_fs *macsec_fs = priv->mdev->macsec_fs;
+ struct mlx5_macsec_rule_attrs rule_attrs;
+ union mlx5_macsec_rule *macsec_rule;
+
+ rule_attrs.macsec_obj_id = sa->macsec_obj_id;
+ rule_attrs.sci = sa->sci;
+ rule_attrs.assoc_num = sa->assoc_num;
+ rule_attrs.action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
+ MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
+
+ macsec_rule = mlx5_macsec_fs_add_rule(macsec_fs, ctx, &rule_attrs, fs_id);
+ if (!macsec_rule)
+ return -ENOMEM;
+
+ sa->macsec_rule = macsec_rule;
+
+ return 0;
+}
+
static int mlx5e_macsec_init_sa(struct macsec_context *ctx,
struct mlx5e_macsec_sa *sa,
bool encrypt, bool is_tx, u32 *fs_id)
{
struct mlx5e_priv *priv = macsec_netdev_priv(ctx->netdev);
struct mlx5e_macsec *macsec = priv->macsec;
- struct mlx5_macsec_rule_attrs rule_attrs;
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5_macsec_obj_attrs obj_attrs;
- union mlx5_macsec_rule *macsec_rule;
int err;
obj_attrs.next_pn = sa->next_pn;
@@ -357,20 +386,12 @@ static int mlx5e_macsec_init_sa(struct macsec_context *ctx,
if (err)
return err;
- rule_attrs.macsec_obj_id = sa->macsec_obj_id;
- rule_attrs.sci = sa->sci;
- rule_attrs.assoc_num = sa->assoc_num;
- rule_attrs.action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
- MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
-
- macsec_rule = mlx5_macsec_fs_add_rule(mdev->macsec_fs, ctx, &rule_attrs, fs_id);
- if (!macsec_rule) {
- err = -ENOMEM;
- goto destroy_macsec_object;
+ if (sa->active) {
+ err = mlx5e_macsec_init_sa_fs(ctx, sa, encrypt, is_tx, fs_id);
+ if (err)
+ goto destroy_macsec_object;
}
- sa->macsec_rule = macsec_rule;
-
return 0;
destroy_macsec_object:
@@ -526,9 +547,7 @@ static int mlx5e_macsec_add_txsa(struct macsec_context *ctx)
goto destroy_sa;
macsec_device->tx_sa[assoc_num] = tx_sa;
- if (!secy->operational ||
- assoc_num != tx_sc->encoding_sa ||
- !tx_sa->active)
+ if (!secy->operational)
goto out;
err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
@@ -595,7 +614,7 @@ static int mlx5e_macsec_upd_txsa(struct macsec_context *ctx)
goto out;
if (ctx_tx_sa->active) {
- err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
+ err = mlx5e_macsec_init_sa_fs(ctx, tx_sa, tx_sc->encrypt, true, NULL);
if (err)
goto out;
} else {
@@ -604,7 +623,7 @@ static int mlx5e_macsec_upd_txsa(struct macsec_context *ctx)
goto out;
}
- mlx5e_macsec_cleanup_sa(macsec, tx_sa, true, ctx->secy->netdev, 0);
+ mlx5e_macsec_cleanup_sa_fs(macsec, tx_sa, true, ctx->secy->netdev, 0);
}
out:
mutex_unlock(&macsec->lock);
@@ -1030,8 +1049,9 @@ static int mlx5e_macsec_del_rxsa(struct macsec_context *ctx)
goto out;
}
- mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
- rx_sc->sc_xarray_element->fs_id);
+ if (rx_sa->active)
+ mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
+ rx_sc->sc_xarray_element->fs_id);
mlx5_destroy_encryption_key(macsec->mdev, rx_sa->enc_key_id);
kfree(rx_sa);
rx_sc->rx_sa[assoc_num] = NULL;
@@ -1112,8 +1132,8 @@ static int macsec_upd_secy_hw_address(struct macsec_context *ctx,
if (!rx_sa || !rx_sa->macsec_rule)
continue;
- mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
- rx_sc->sc_xarray_element->fs_id);
+ mlx5e_macsec_cleanup_sa_fs(macsec, rx_sa, false, ctx->secy->netdev,
+ rx_sc->sc_xarray_element->fs_id);
}
}
@@ -1124,8 +1144,8 @@ static int macsec_upd_secy_hw_address(struct macsec_context *ctx,
continue;
if (rx_sa->active) {
- err = mlx5e_macsec_init_sa(ctx, rx_sa, true, false,
- &rx_sc->sc_xarray_element->fs_id);
+ err = mlx5e_macsec_init_sa_fs(ctx, rx_sa, true, false,
+ &rx_sc->sc_xarray_element->fs_id);
if (err)
goto out;
}
@@ -1178,7 +1198,7 @@ static int mlx5e_macsec_upd_secy(struct macsec_context *ctx)
if (!tx_sa)
continue;
- mlx5e_macsec_cleanup_sa(macsec, tx_sa, true, ctx->secy->netdev, 0);
+ mlx5e_macsec_cleanup_sa_fs(macsec, tx_sa, true, ctx->secy->netdev, 0);
}
for (i = 0; i < MACSEC_NUM_AN; ++i) {
@@ -1187,7 +1207,7 @@ static int mlx5e_macsec_upd_secy(struct macsec_context *ctx)
continue;
if (tx_sa->assoc_num == tx_sc->encoding_sa && tx_sa->active) {
- err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
+ err = mlx5e_macsec_init_sa_fs(ctx, tx_sa, tx_sc->encrypt, true, NULL);
if (err)
goto out;
}
--
2.43.0
* [PATCH 6.6 20/60] net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Rahul Rameshbabu, Saeed Mahameed, Vadim Fedorenko, Sasha Levin
From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
[ Upstream commit b7cf07586c40f926063d4d09f7de28ff82f62b2a ]
Simply reordering the calls to mlx5e_ptp_metadata_map_put and
mlx5e_ptpsq_track_metadata in mlx5e_txwqe_complete is not enough, since
both the compiler and the CPU are free to reorder the memory accesses of
these two functions. If reordering does occur, the issue that was supposedly
fixed by 7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after
populating metadata map") will be seen: NULL pointer dereferences in
mlx5e_ptpsq_mark_ts_cqes_undelivered in the NAPI polling context, because the
tracking list is populated before the metadata map.
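The ordering requirement can be illustrated in userspace with C11 atomics. This is a sketch under assumptions: metadata_map and tracked_index are hypothetical stand-ins for the driver's structures, and a release store paired with an acquire load is used as the closest portable analogue of the kernel's wmb(); it is not what mlx5e itself does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAP_SIZE 8

/* Hypothetical stand-ins for the metadata map and the tracking list. */
static void *metadata_map[MAP_SIZE];
static atomic_int tracked_index = -1;	/* -1: nothing tracked yet */

static void producer(int index, void *skb)
{
	assert(index >= 0 && index < MAP_SIZE);
	metadata_map[index] = skb;	/* 1. populate the map */
	/* 2. publish the index; release ordering keeps the store in step 1
	 *    visible before this one, the role wmb() plays in the patch */
	atomic_store_explicit(&tracked_index, index, memory_order_release);
}

static void *consumer(void)
{
	int index = atomic_load_explicit(&tracked_index, memory_order_acquire);

	if (index < 0)
		return NULL;	/* nothing published yet */
	/* the acquire load pairs with the release store above, so the
	 * map entry for a published index is guaranteed to be set */
	return metadata_map[index];
}
```

Without the release/acquire pair, another thread could observe the new index while still seeing the old (NULL) map entry, which is the userspace equivalent of the NULL dereference described above.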
Fixes: 7e3f3ba97e6c ("net/mlx5e: Track xmit submission to PTP WQ after populating metadata map")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CC: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index f0b506e562df3..1ead69c5f5fa3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -401,6 +401,8 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
mlx5e_skb_cb_hwtstamp_init(skb);
mlx5e_ptp_metadata_map_put(&sq->ptpsq->metadata_map, skb,
metadata_index);
+ /* ensure skb is put on metadata_map before tracking the index */
+ wmb();
mlx5e_ptpsq_track_metadata(sq->ptpsq, metadata_index);
if (!netif_tx_queue_stopped(sq->txq) &&
mlx5e_ptpsq_metadata_freelist_empty(sq->ptpsq)) {
--
2.43.0
* [PATCH 6.6 21/60] net/mlx5e: Switch to using _bh variant of spinlock API in port timestamping NAPI poll context
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Rahul Rameshbabu, Saeed Mahameed, Vadim Fedorenko, Sasha Levin
From: Rahul Rameshbabu <rrameshbabu@nvidia.com>
[ Upstream commit 90502d433c0e7e5483745a574cb719dd5d05b10c ]
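tracker_list_lock is taken from the NAPI poll context, which is a softirq
context. Use the _bh variants of the spinlock API for it, so that a holder
of the lock cannot be interrupted on the same CPU by a softirq that then
spins on the same lock.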
Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
CC: Vadim Fedorenko <vadfed@meta.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
index 803035d4e5976..15d97c685ad33 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
@@ -42,9 +42,9 @@ mlx5e_ptp_port_ts_cqe_list_add(struct mlx5e_ptp_port_ts_cqe_list *list, u8 metad
WARN_ON_ONCE(tracker->inuse);
tracker->inuse = true;
- spin_lock(&list->tracker_list_lock);
+ spin_lock_bh(&list->tracker_list_lock);
list_add_tail(&tracker->entry, &list->tracker_list_head);
- spin_unlock(&list->tracker_list_lock);
+ spin_unlock_bh(&list->tracker_list_lock);
}
static void
@@ -54,9 +54,9 @@ mlx5e_ptp_port_ts_cqe_list_remove(struct mlx5e_ptp_port_ts_cqe_list *list, u8 me
WARN_ON_ONCE(!tracker->inuse);
tracker->inuse = false;
- spin_lock(&list->tracker_list_lock);
+ spin_lock_bh(&list->tracker_list_lock);
list_del(&tracker->entry);
- spin_unlock(&list->tracker_list_lock);
+ spin_unlock_bh(&list->tracker_list_lock);
}
void mlx5e_ptpsq_track_metadata(struct mlx5e_ptpsq *ptpsq, u8 metadata)
@@ -155,7 +155,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
struct mlx5e_ptp_metadata_map *metadata_map = &ptpsq->metadata_map;
struct mlx5e_ptp_port_ts_cqe_tracker *pos, *n;
- spin_lock(&cqe_list->tracker_list_lock);
+ spin_lock_bh(&cqe_list->tracker_list_lock);
list_for_each_entry_safe(pos, n, &cqe_list->tracker_list_head, entry) {
struct sk_buff *skb =
mlx5e_ptp_metadata_map_lookup(metadata_map, pos->metadata_id);
@@ -170,7 +170,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
pos->inuse = false;
list_del(&pos->entry);
}
- spin_unlock(&cqe_list->tracker_list_lock);
+ spin_unlock_bh(&cqe_list->tracker_list_lock);
}
#define PTP_WQE_CTR2IDX(val) ((val) & ptpsq->ts_cqe_ctr_mask)
--
2.43.0
* [PATCH 6.6 22/60] tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Steven Rostedt (Google), Jamal Hadi Salim, David S . Miller,
Sasha Levin
From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
[ Upstream commit 51270d573a8d9dd5afdc7934de97d66c0e14b5fd ]
I'm updating __assign_str() and will be removing its second parameter. To
make sure that this does not break anything, I added a WARN_ON() that checks
that the source used in __string() is the same one passed to __assign_str(),
as the __string() field is where the string is actually saved.
While making this change, the build failed because __assign_str() now
expects the value passed in to be a char *. Instead, I got the following
error:
include/trace/events/qdisc.h: In function ‘trace_event_raw_event_qdisc_reset’:
include/trace/events/qdisc.h:91:35: error: passing argument 1 of 'strcmp' from incompatible pointer type [-Werror=incompatible-pointer-types]
91 | __assign_str(dev, qdisc_dev(q));
That's because the qdisc_reset and qdisc_destroy events pass qdisc_dev(q)
to both __assign_str() and __string(). But that function returns a pointer
to struct net_device, not a string.
It appears that these events are just saving the pointer as a string and
then reading it as a string as well.
Use qdisc_dev(q)->name to save the device instead.
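The difference can be mimicked with a hypothetical userspace model; mock_net_device, mock_qdisc, mock_qdisc_dev() and record_qdisc_reset() are made-up names, not kernel types or the tracing macros. The point is only that the trace entry must copy the device's name string, not treat the structure returned by qdisc_dev() as a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MOCK_IFNAMSIZ 16

/* Hypothetical mock types; not the kernel's definitions. */
struct mock_net_device {
	char name[MOCK_IFNAMSIZ];
	int ifindex;
};

struct mock_qdisc {
	struct mock_net_device *dev;
	unsigned int handle;
};

struct mock_trace_entry {
	char dev[MOCK_IFNAMSIZ];
	unsigned int handle;
};

static struct mock_net_device *mock_qdisc_dev(const struct mock_qdisc *q)
{
	return q->dev;
}

static void record_qdisc_reset(struct mock_trace_entry *e,
			       const struct mock_qdisc *q)
{
	/* Copy the device *name*. Passing mock_qdisc_dev(q) itself here
	 * would read the bytes of a struct as if they were a string,
	 * which is the bug the patch fixes. */
	snprintf(e->dev, sizeof(e->dev), "%s", mock_qdisc_dev(q)->name);
	e->handle = q->handle;
}
```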
Fixes: a34dac0b90552 ("net_sched: add tracepoints for qdisc_reset() and qdisc_destroy()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/trace/events/qdisc.h | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/include/trace/events/qdisc.h b/include/trace/events/qdisc.h
index a3995925cb057..1f4258308b967 100644
--- a/include/trace/events/qdisc.h
+++ b/include/trace/events/qdisc.h
@@ -81,14 +81,14 @@ TRACE_EVENT(qdisc_reset,
TP_ARGS(q),
TP_STRUCT__entry(
- __string( dev, qdisc_dev(q) )
- __string( kind, q->ops->id )
- __field( u32, parent )
- __field( u32, handle )
+ __string( dev, qdisc_dev(q)->name )
+ __string( kind, q->ops->id )
+ __field( u32, parent )
+ __field( u32, handle )
),
TP_fast_assign(
- __assign_str(dev, qdisc_dev(q));
+ __assign_str(dev, qdisc_dev(q)->name);
__assign_str(kind, q->ops->id);
__entry->parent = q->parent;
__entry->handle = q->handle;
@@ -106,14 +106,14 @@ TRACE_EVENT(qdisc_destroy,
TP_ARGS(q),
TP_STRUCT__entry(
- __string( dev, qdisc_dev(q) )
- __string( kind, q->ops->id )
- __field( u32, parent )
- __field( u32, handle )
+ __string( dev, qdisc_dev(q)->name )
+ __string( kind, q->ops->id )
+ __field( u32, parent )
+ __field( u32, handle )
),
TP_fast_assign(
- __assign_str(dev, qdisc_dev(q));
+ __assign_str(dev, qdisc_dev(q)->name);
__assign_str(kind, q->ops->id);
__entry->parent = q->parent;
__entry->handle = q->handle;
--
2.43.0
* [PATCH 6.6 23/60] geneve: make sure to pull inner header in geneve_rx()
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Eric Dumazet, syzbot+6a1423ff3f97159aae64, Jiri Pirko,
David S . Miller, Sasha Levin
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 1ca1ba465e55b9460e4e75dec9fff31e708fec74 ]
syzbot triggered a bug in geneve_rx() [1].
The issue is similar to the one I fixed in commit 8d975c15c0cd
("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()")
We have to save skb->network_header in a temporary variable
in order to be able to recompute the network_header pointer
after a pskb_inet_may_pull() call.
pskb_inet_may_pull() makes sure the needed headers are in skb->head.
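Because pskb_inet_may_pull() may reallocate skb->head, a raw pointer to the outer header taken before the call can dangle afterwards. The pattern of saving an offset and rebuilding the pointer is sketched below in userspace; struct buf, buf_may_pull() and outer_header_after_pull() are hypothetical names, with realloc() standing in for the head reallocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal buffer standing in for an skb: a header
 * pointer computed before a pull is only valid until head moves. */
struct buf {
	unsigned char *head;
	size_t len;
};

/* Stand-in for pskb_inet_may_pull(): may move the data to a new head. */
static int buf_may_pull(struct buf *b, size_t new_len)
{
	unsigned char *p;

	assert(new_len >= b->len);
	p = realloc(b->head, new_len);
	if (!p)
		return -1;
	memset(p + b->len, 0, new_len - b->len);
	b->head = p;
	b->len = new_len;
	return 0;
}

/* Mirror of the fix: save the outer header as an offset from head,
 * pull, then rebuild the pointer from the (possibly new) head. */
static unsigned char *outer_header_after_pull(struct buf *b,
					      const unsigned char *outer,
					      size_t new_len)
{
	size_t nh = (size_t)(outer - b->head);

	if (buf_may_pull(b, new_len))
		return NULL;	/* caller drops the packet */
	return b->head + nh;
}
```

The caller must use only the returned pointer afterwards; the pointer it passed in may refer to freed memory once the buffer has moved.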
[1]
BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
BUG: KMSAN: uninit-value in geneve_rx drivers/net/geneve.c:279 [inline]
BUG: KMSAN: uninit-value in geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline]
geneve_rx drivers/net/geneve.c:279 [inline]
geneve_udp_encap_recv+0x36f9/0x3c10 drivers/net/geneve.c:391
udp_queue_rcv_one_skb+0x1d39/0x1f20 net/ipv4/udp.c:2108
udp_queue_rcv_skb+0x6ae/0x6e0 net/ipv4/udp.c:2186
udp_unicast_rcv_skb+0x184/0x4b0 net/ipv4/udp.c:2346
__udp4_lib_rcv+0x1c6b/0x3010 net/ipv4/udp.c:2422
udp_rcv+0x7d/0xa0 net/ipv4/udp.c:2604
ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:461 [inline]
ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
NF_HOOK include/linux/netfilter.h:314 [inline]
ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569
__netif_receive_skb_one_core net/core/dev.c:5534 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648
process_backlog+0x480/0x8b0 net/core/dev.c:5976
__napi_poll+0xe3/0x980 net/core/dev.c:6576
napi_poll net/core/dev.c:6645 [inline]
net_rx_action+0x8b8/0x1870 net/core/dev.c:6778
__do_softirq+0x1b7/0x7c5 kernel/softirq.c:553
do_softirq+0x9a/0xf0 kernel/softirq.c:454
__local_bh_enable_ip+0x9b/0xa0 kernel/softirq.c:381
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline]
__dev_queue_xmit+0x2768/0x51c0 net/core/dev.c:4378
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3819 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_node+0x5cb/0xbc0 mm/slub.c:3903
kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
__alloc_skb+0x352/0x790 net/core/skbuff.c:651
alloc_skb include/linux/skbuff.h:1296 [inline]
alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6394
sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2783
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x70c2/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels")
Reported-and-tested-by: syzbot+6a1423ff3f97159aae64@syzkaller.appspotmail.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/geneve.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 78f9d588f7129..9566fda8b2e2e 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -221,7 +221,7 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
struct genevehdr *gnvh = geneve_hdr(skb);
struct metadata_dst *tun_dst = NULL;
unsigned int len;
- int err = 0;
+ int nh, err = 0;
void *oiph;
if (ip_tunnel_collect_metadata() || gs->collect_md) {
@@ -272,9 +272,23 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
skb->pkt_type = PACKET_HOST;
}
- oiph = skb_network_header(skb);
+ /* Save offset of outer header relative to skb->head,
+ * because we are going to reset the network header to the inner header
+ * and might change skb->head.
+ */
+ nh = skb_network_header(skb) - skb->head;
+
skb_reset_network_header(skb);
+ if (!pskb_inet_may_pull(skb)) {
+ DEV_STATS_INC(geneve->dev, rx_length_errors);
+ DEV_STATS_INC(geneve->dev, rx_errors);
+ goto drop;
+ }
+
+ /* Get the outer header. */
+ oiph = skb->head + nh;
+
if (geneve_get_sk_family(gs) == AF_INET)
err = IP_ECN_decapsulate(oiph, skb);
#if IS_ENABLED(CONFIG_IPV6)
--
2.43.0
* [PATCH 6.6 24/60] net: sparx5: Fix use after free inside sparx5_del_mact_entry
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Horatiu Vultur, Simon Horman, Jakub Kicinski, Sasha Levin
From: Horatiu Vultur <horatiu.vultur@microchip.com>
[ Upstream commit 89d72d4125e94aa3c2140fedd97ce07ba9e37674 ]
Static analysis of the code suggests that when an entry was removed
from the MAC table, the entry was still used after being freed.
More precisely, the vid of the mac_entry was used after calling
devm_kfree() on the mac_entry.
The fix consists of first using the vid of the mac_entry to delete the
entry from the HW and only then freeing it.
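The corrected ordering (read the fields you need, then unlink, then free) can be shown with a small userspace sketch. It assumes a plain singly linked list instead of the kernel's list_head, and mact_forget(), push_entry() and del_mact_entry() here are hypothetical stand-ins, not the sparx5 functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mact_entry {
	unsigned char mac[6];
	unsigned short vid;
	struct mact_entry *next;
};

/* Hypothetical stand-in for the HW "forget" call: records what it saw. */
static unsigned short last_forgotten_vid;
static int forget_calls;

static void mact_forget(const unsigned char *addr, unsigned short vid)
{
	(void)addr;
	last_forgotten_vid = vid;
	forget_calls++;
}

static struct mact_entry *push_entry(struct mact_entry **list,
				     const unsigned char *mac,
				     unsigned short vid)
{
	struct mact_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	memcpy(e->mac, mac, 6);
	e->vid = vid;
	e->next = *list;
	*list = e;
	return e;
}

/* vid == 0 matches any VLAN, as in the driver. */
static int del_mact_entry(struct mact_entry **list,
			  const unsigned char *addr, unsigned short vid)
{
	struct mact_entry **pp = list;
	int removed = 0;

	while (*pp) {
		struct mact_entry *e = *pp;

		if ((vid == 0 || e->vid == vid) && memcmp(addr, e->mac, 6) == 0) {
			mact_forget(addr, e->vid);	/* e is still valid here */
			*pp = e->next;			/* unlink */
			free(e);			/* only now release it */
			removed++;
		} else {
			pp = &e->next;
		}
	}
	return removed;
}
```

In the buggy order, the mact_forget() call would read e->vid after free(e), which is exactly the use-after-free the patch removes.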
Fixes: b37a1bae742f ("net: sparx5: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240301080608.3053468-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
index 4af285918ea2a..75868b3f548ec 100644
--- a/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
+++ b/drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
@@ -347,10 +347,10 @@ int sparx5_del_mact_entry(struct sparx5 *sparx5,
list) {
if ((vid == 0 || mact_entry->vid == vid) &&
ether_addr_equal(addr, mact_entry->mac)) {
+ sparx5_mact_forget(sparx5, addr, mact_entry->vid);
+
list_del(&mact_entry->list);
devm_kfree(sparx5->dev, mact_entry);
-
- sparx5_mact_forget(sparx5, addr, mact_entry->vid);
}
}
mutex_unlock(&sparx5->mact_lock);
--
2.43.0
* [PATCH 6.6 25/60] ice: virtchnl: stop pretending to support RSS over AQ or registers
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Jacob Keller, Alan Brady, Rafal Romanowski, Tony Nguyen,
Sasha Levin
From: Jacob Keller <jacob.e.keller@intel.com>
[ Upstream commit 2652b99e43403dc464f3648483ffb38e48872fe4 ]
The E800 series hardware uses the same iAVF driver as older devices,
including the virtchnl negotiation scheme.
This negotiation scheme includes a mechanism to determine what type of RSS
should be supported, including RSS over PF virtchnl messages, RSS over
firmware AdminQ messages, and RSS via direct register access.
The PF driver will always prefer VIRTCHNL_VF_OFFLOAD_RSS_PF if it's
supported by the VF driver. However, if an older VF driver is loaded, it
may request only VIRTCHNL_VF_OFFLOAD_RSS_REG or VIRTCHNL_VF_OFFLOAD_RSS_AQ.
The ice driver happily agrees to support these methods. Unfortunately, the
underlying hardware does not support these mechanisms. The E800 series VFs
don't have the appropriate registers for RSS_REG. The mailbox queue used by
VFs for VF to PF communication blocks messages which do not have the
VF-to-PF opcode.
Stop lying to the VF that it could support RSS over AdminQ or registers, as
these interfaces do not work when the hardware is operating on an E800
series device.
In practice this is unlikely to be hit by any normal user. The iAVF driver
has supported RSS over PF virtchnl commands since 2016, and always defaults
to using RSS_PF if possible.
In principle, nothing actually stops the existing VF from attempting to
access the registers or send an AQ command. However a properly coded VF
will check the capability flags and will report a more useful error if it
detects a case where the driver does not support the RSS offloads that it
does.
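The RSS part of the negotiation after this patch reduces to a simple bitmask rule: grant RSS_PF if the VF asks for it and never grant RSS_AQ or RSS_REG. The sketch below is illustrative only; the CAP_* bit values are arbitrary (the real VIRTCHNL_VF_OFFLOAD_* definitions live in the kernel's virtchnl header), grant_caps() is a hypothetical helper, and the non-RSS capabilities that ice_vc_get_vf_res_msg() also handles are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary bit values for illustration only. */
#define CAP_L2      (1u << 0)
#define CAP_RSS_AQ  (1u << 3)
#define CAP_RSS_REG (1u << 4)
#define CAP_RSS_PF  (1u << 19)

/* Hypothetical helper mirroring the RSS handling after the fix. */
static uint32_t grant_caps(uint32_t driver_caps)
{
	uint32_t caps = CAP_L2;

	if (driver_caps & CAP_RSS_PF)
		caps |= CAP_RSS_PF;
	/* RSS_AQ and RSS_REG requests are ignored: E800 VFs cannot use them */
	return caps;
}
```

An old VF that only requests RSS_AQ or RSS_REG therefore gets no RSS capability at all, which a well-behaved VF can detect and report, rather than being told a mechanism works when the hardware cannot provide it.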
Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 9 +--------
drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c | 2 --
2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index e7ab78bb0f861..3a28210be3c23 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -440,7 +440,6 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
vf->driver_caps = *(u32 *)msg;
else
vf->driver_caps = VIRTCHNL_VF_OFFLOAD_L2 |
- VIRTCHNL_VF_OFFLOAD_RSS_REG |
VIRTCHNL_VF_OFFLOAD_VLAN;
vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
@@ -453,14 +452,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
vfres->vf_cap_flags |= ice_vc_get_vlan_caps(hw, vf, vsi,
vf->driver_caps);
- if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
+ if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF)
vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;
- } else {
- if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_AQ)
- vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_AQ;
- else
- vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_REG;
- }
if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC)
vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC;
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index 7d547fa616fa6..588b77f1a4bf6 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -13,8 +13,6 @@
* - opcodes needed by VF when caps are activated
*
* Caps that don't use new opcodes (no opcodes should be allowed):
- * - VIRTCHNL_VF_OFFLOAD_RSS_AQ
- * - VIRTCHNL_VF_OFFLOAD_RSS_REG
* - VIRTCHNL_VF_OFFLOAD_WB_ON_ITR
* - VIRTCHNL_VF_OFFLOAD_CRC
* - VIRTCHNL_VF_OFFLOAD_RX_POLLING
--
2.43.0
* [PATCH 6.6 26/60] net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable; +Cc: Rand Deeb, Simon Horman, Tony Nguyen, Sasha Levin
From: Rand Deeb <rand.sec96@gmail.com>
[ Upstream commit 06e456a05d669ca30b224b8ed962421770c1496c ]
The function ice_bridge_setlink() may encounter a NULL pointer dereference
if nlmsg_find_attr() returns NULL and br_spec is dereferenced subsequently
in nla_for_each_nested(). To address this issue, add a check to ensure that
br_spec is not NULL before proceeding with the nested attribute iteration.
Fixes: b1edc14a3fbf ("ice: Implement ice_bridge_getlink and ice_bridge_setlink")
Signed-off-by: Rand Deeb <rand.sec96@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/ice/ice_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index d8d2aa4c0216a..d23f2ebddeb45 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -7786,6 +7786,8 @@ ice_bridge_setlink(struct net_device *dev, struct nlmsghdr *nlh,
pf_sw = pf->first_sw;
/* find the attribute in the netlink message */
br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
+ if (!br_spec)
+ return -EINVAL;
nla_for_each_nested(attr, br_spec, rem) {
__u16 mode;
--
2.43.0
* [PATCH 6.6 27/60] igc: avoid returning frame twice in XDP_REDIRECT
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Florian Kauer, Maciej Fijalkowski, Naama Meir, Tony Nguyen,
Sasha Levin
From: Florian Kauer <florian.kauer@linutronix.de>
[ Upstream commit ef27f655b438bed4c83680e4f01e1cde2739854b ]
When a frame cannot be transmitted in XDP_REDIRECT
(e.g. due to a full queue), it is necessary to free
it by calling xdp_return_frame_rx_napi.
However, this is the responsibility of the caller of
ndo_xdp_xmit (see for example bq_xmit_all in
kernel/bpf/devmap.c), so also calling it inside
igc_xdp_xmit (the ndo_xdp_xmit of the igc driver)
leads to memory corruption.
In fact, bq_xmit_all expects that it can return all
frames after the last successfully transmitted one.
Therefore, stop at the first frame that cannot be
transmitted, but do not call xdp_return_frame_rx_napi
in igc_xdp_xmit. Other Intel drivers, such as igb,
implement it the same way.
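The ownership contract described above can be modelled in userspace C. This is a sketch, not driver code: `fake_xmit()` stands in for igc_xdp_xmit() and `flush_batch()` for the caller in kernel/bpf/devmap.c; both names, the ring capacity `RING_SLOTS` and the `freed` bookkeeping are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SLOTS 3	/* hypothetical number of free TX descriptors */

struct frame {
	int id;
	bool freed;	/* set when the frame is returned to its pool */
};

/* Driver side: queue frames in order, stop at the first one that does not
 * fit, and return how many were queued. It never frees frames itself.
 * (Descriptor setup for frames[i] is elided in this model.) */
static int fake_xmit(struct frame **frames, int n, int *ring_used)
{
	int nxmit = 0;

	for (int i = 0; i < n; i++) {
		if (*ring_used >= RING_SLOTS)
			break;		/* ring full: the rest belong to the caller */
		(*ring_used)++;
		nxmit++;
	}
	return nxmit;
}

/* Caller side: every frame after the last transmitted one is freed here,
 * exactly once. */
static int flush_batch(struct frame **frames, int n, int *ring_used)
{
	int sent = fake_xmit(frames, n, ring_used);

	for (int i = sent; i < n; i++) {
		assert(!frames[i]->freed);	/* a double free would trip this */
		frames[i]->freed = true;
	}
	return sent;
}
```

If fake_xmit() also marked the frames it failed to queue as freed (the pre-patch igc behaviour), the assertion in flush_batch() would fire for every dropped frame, which is the double free the patch removes.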
There are two alternatives to this approach that were rejected:
1. Return num_frames as if all the frames had been
transmitted and release them inside igc_xdp_xmit.
While it might work technically, it is not what
the return value is meant to represent (i.e. the
number of SUCCESSFULLY transmitted packets).
2. Rework kernel/bpf/devmap.c and all drivers to
support non-consecutively dropped packets.
Besides being complex, it likely has a negative
performance impact without a significant gain,
since it is unlikely anyway that the next frame
can be transmitted if the previous one was dropped.
The memory corruption can be reproduced with
the following script which leads to a kernel panic
after a few seconds. It basically generates more
traffic than an i225 NIC can transmit and pushes it
via XDP_REDIRECT from a virtual interface to the
physical interface where frames get dropped.
#!/bin/bash
INTERFACE=enp4s0
INTERFACE_IDX=`cat /sys/class/net/$INTERFACE/ifindex`
sudo ip link add dev veth1 type veth peer name veth2
sudo ip link set up $INTERFACE
sudo ip link set up veth1
sudo ip link set up veth2
cat << EOF > redirect.bpf.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("prog")
int redirect(struct xdp_md *ctx)
{
return bpf_redirect($INTERFACE_IDX, 0);
}
char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c redirect.bpf.c -o redirect.bpf.o
sudo ip link set veth2 xdp obj redirect.bpf.o
cat << EOF > pass.bpf.c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("prog")
int pass(struct xdp_md *ctx)
{
return XDP_PASS;
}
char _license[] SEC("license") = "GPL";
EOF
clang -O2 -g -Wall -target bpf -c pass.bpf.c -o pass.bpf.o
sudo ip link set $INTERFACE xdp obj pass.bpf.o
cat << EOF > trafgen.cfg
{
/* Ethernet Header */
0xe8, 0x6a, 0x64, 0x41, 0xbf, 0x46,
0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
const16(ETH_P_IP),
/* IPv4 Header */
0b01000101, 0, # IPv4 version, IHL, TOS
const16(1028), # IPv4 total length (UDP length + 20 bytes (IP header))
const16(2), # IPv4 ident
0b01000000, 0, # IPv4 flags, fragmentation off
64, # IPv4 TTL
17, # Protocol UDP
csumip(14, 33), # IPv4 checksum
10, 0, 1, 1, # IP Src - adapt as needed
10, 0, 1, 2, # IP Dest - adapt as needed
/* UDP Header */
const16(6666), # UDP Src Port
const16(6666), # UDP Dest Port
const16(1008), # UDP length (UDP header 8 bytes + payload length)
csumudp(14, 34), # UDP checksum
/* Payload */
fill('W', 1000),
}
EOF
sudo trafgen -i trafgen.cfg -b3000MB -o veth1 --cpp
Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action")
Signed-off-by: Florian Kauer <florian.kauer@linutronix.de>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/intel/igc/igc_main.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 98de34d0ce07e..e549ffca88e39 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -6489,7 +6489,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
int cpu = smp_processor_id();
struct netdev_queue *nq;
struct igc_ring *ring;
- int i, drops;
+ int i, nxmit;
if (unlikely(!netif_carrier_ok(dev)))
return -ENETDOWN;
@@ -6505,16 +6505,15 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
/* Avoid transmit queue timeout since we share it with the slow path */
txq_trans_cond_update(nq);
- drops = 0;
+ nxmit = 0;
for (i = 0; i < num_frames; i++) {
int err;
struct xdp_frame *xdpf = frames[i];
err = igc_xdp_init_tx_descriptor(ring, xdpf);
- if (err) {
- xdp_return_frame_rx_napi(xdpf);
- drops++;
- }
+ if (err)
+ break;
+ nxmit++;
}
if (flags & XDP_XMIT_FLUSH)
@@ -6522,7 +6521,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
__netif_tx_unlock(nq);
- return num_frames - drops;
+ return nxmit;
}
static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,
--
2.43.0
* [PATCH 6.6 28/60] net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (26 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 27/60] igc: avoid returning frame twice in XDP_REDIRECT Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 29/60] bpf: check bpf_func_state->callback_depth when pruning states Sasha Levin
` (38 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eric Dumazet, syzbot, David Ahern, Jakub Kicinski, Sasha Levin
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 685f7d531264599b3f167f1e94bbd22f120e5fab ]
syzbot found another use-after-free in ip6_route_mpath_notify() [1]
Commit f7225172f25a ("net/ipv6: prevent use after free in
ip6_route_mpath_notify") was not able to fix the root cause.
We need to defer the fib6_info_release() calls after
ip6_route_mpath_notify(), in the cleanup phase.
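A userspace sketch of the ordering the fix enforces, assuming a plain reference count: the routine keeps its own reference on each route until after the notification has read it, and drops all of those references only in the cleanup phase. `struct route`, `route_get()`, `route_put()`, `notify()` and `add_and_notify()` are hypothetical; the kernel uses fib6_info_hold()/fib6_info_release(), with the final free deferred through RCU.

```c
#include <assert.h>
#include <stdbool.h>

struct route {
	int refcnt;	/* starts at 1: the caller's own reference */
	bool freed;	/* set when the last reference is dropped */
	int metric;
};

static void route_get(struct route *rt) { rt->refcnt++; }

static void route_put(struct route *rt)
{
	assert(rt->refcnt > 0);
	if (--rt->refcnt == 0)
		rt->freed = true;
}

/* Reads a field of the route; reading a freed route is the use-after-free. */
static int notify(const struct route *rt)
{
	assert(!rt->freed);
	return rt->metric;
}

/* Insert n routes (the table takes its own reference), notify on the first
 * one, and only then drop the caller's references. Even if the table has
 * already released its reference (a simulated concurrent delete), the
 * caller's reference keeps the first route alive for notify(). */
static int add_and_notify(struct route *rts, int n, bool table_drops_early)
{
	for (int i = 0; i < n; i++)
		route_get(&rts[i]);		/* reference held by the table */

	if (table_drops_early)
		route_put(&rts[0]);		/* simulated concurrent removal */

	int metric = notify(&rts[0]);

	for (int i = 0; i < n; i++)		/* cleanup phase */
		route_put(&rts[i]);
	return metric;
}
```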
[1]
BUG: KASAN: slab-use-after-free in rt6_fill_node+0x1460/0x1ac0
Read of size 4 at addr ffff88809a07fc64 by task syz-executor.2/23037
CPU: 0 PID: 23037 Comm: syz-executor.2 Not tainted 6.8.0-rc4-syzkaller-01035-gea7f3cfaa588 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x167/0x540 mm/kasan/report.c:488
kasan_report+0x142/0x180 mm/kasan/report.c:601
rt6_fill_node+0x1460/0x1ac0
inet6_rt_notify+0x13b/0x290 net/ipv6/route.c:6184
ip6_route_mpath_notify net/ipv6/route.c:5198 [inline]
ip6_route_multipath_add net/ipv6/route.c:5404 [inline]
inet6_rtm_newroute+0x1d0f/0x2300 net/ipv6/route.c:5517
rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f73dd87dda9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f73de6550c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f73dd9ac050 RCX: 00007f73dd87dda9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 00007f73dd8ca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f73dd9ac050 R15: 00007ffdbdeb7858
</TASK>
Allocated by task 23037:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:372 [inline]
__kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:389
kasan_kmalloc include/linux/kasan.h:211 [inline]
__do_kmalloc_node mm/slub.c:3981 [inline]
__kmalloc+0x22e/0x490 mm/slub.c:3994
kmalloc include/linux/slab.h:594 [inline]
kzalloc include/linux/slab.h:711 [inline]
fib6_info_alloc+0x2e/0xf0 net/ipv6/ip6_fib.c:155
ip6_route_info_create+0x445/0x12b0 net/ipv6/route.c:3758
ip6_route_multipath_add net/ipv6/route.c:5298 [inline]
inet6_rtm_newroute+0x744/0x2300 net/ipv6/route.c:5517
rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
Freed by task 16:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x4e/0x60 mm/kasan/generic.c:640
poison_slab_object+0xa6/0xe0 mm/kasan/common.c:241
__kasan_slab_free+0x34/0x70 mm/kasan/common.c:257
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kfree+0x14a/0x380 mm/slub.c:4409
rcu_do_batch kernel/rcu/tree.c:2190 [inline]
rcu_core+0xd76/0x1810 kernel/rcu/tree.c:2465
__do_softirq+0x2bb/0x942 kernel/softirq.c:553
Last potentially related work creation:
kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47
__kasan_record_aux_stack+0xae/0x100 mm/kasan/generic.c:586
__call_rcu_common kernel/rcu/tree.c:2715 [inline]
call_rcu+0x167/0xa80 kernel/rcu/tree.c:2829
fib6_info_release include/net/ip6_fib.h:341 [inline]
ip6_route_multipath_add net/ipv6/route.c:5344 [inline]
inet6_rtm_newroute+0x114d/0x2300 net/ipv6/route.c:5517
rtnetlink_rcv_msg+0x885/0x1040 net/core/rtnetlink.c:6597
netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2543
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1367
netlink_sendmsg+0xa3b/0xd70 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
___sys_sendmsg net/socket.c:2638 [inline]
__sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
The buggy address belongs to the object at ffff88809a07fc00
which belongs to the cache kmalloc-512 of size 512
The buggy address is located 100 bytes inside of
freed 512-byte region [ffff88809a07fc00, ffff88809a07fe00)
The buggy address belongs to the physical page:
page:ffffea0002681f00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9a07c
head:ffffea0002681f00 order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888014c41c80 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 23028, tgid 23027 (syz-executor.4), ts 2340253595219, free_ts 2339107097036
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
__alloc_pages+0x255/0x680 mm/page_alloc.c:4567
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page+0x5f/0x160 mm/slub.c:2190
allocate_slab mm/slub.c:2354 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2407
___slab_alloc+0xd17/0x13e0 mm/slub.c:3540
__slab_alloc mm/slub.c:3625 [inline]
__slab_alloc_node mm/slub.c:3678 [inline]
slab_alloc_node mm/slub.c:3850 [inline]
__do_kmalloc_node mm/slub.c:3980 [inline]
__kmalloc+0x2e0/0x490 mm/slub.c:3994
kmalloc include/linux/slab.h:594 [inline]
kzalloc include/linux/slab.h:711 [inline]
new_dir fs/proc/proc_sysctl.c:956 [inline]
get_subdir fs/proc/proc_sysctl.c:1000 [inline]
sysctl_mkdir_p fs/proc/proc_sysctl.c:1295 [inline]
__register_sysctl_table+0xb30/0x1440 fs/proc/proc_sysctl.c:1376
neigh_sysctl_register+0x416/0x500 net/core/neighbour.c:3859
devinet_sysctl_register+0xaf/0x1f0 net/ipv4/devinet.c:2644
inetdev_init+0x296/0x4d0 net/ipv4/devinet.c:286
inetdev_event+0x338/0x15c0 net/ipv4/devinet.c:1555
notifier_call_chain+0x18f/0x3b0 kernel/notifier.c:93
call_netdevice_notifiers_extack net/core/dev.c:1987 [inline]
call_netdevice_notifiers net/core/dev.c:2001 [inline]
register_netdevice+0x15b2/0x1a20 net/core/dev.c:10340
br_dev_newlink+0x27/0x100 net/bridge/br_netlink.c:1563
rtnl_newlink_create net/core/rtnetlink.c:3497 [inline]
__rtnl_newlink net/core/rtnetlink.c:3717 [inline]
rtnl_newlink+0x158f/0x20a0 net/core/rtnetlink.c:3730
page last free pid 11583 tgid 11583 stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1140 [inline]
free_unref_page_prepare+0x968/0xa90 mm/page_alloc.c:2346
free_unref_page+0x37/0x3f0 mm/page_alloc.c:2486
kasan_depopulate_vmalloc_pte+0x74/0x90 mm/kasan/shadow.c:415
apply_to_pte_range mm/memory.c:2619 [inline]
apply_to_pmd_range mm/memory.c:2663 [inline]
apply_to_pud_range mm/memory.c:2699 [inline]
apply_to_p4d_range mm/memory.c:2735 [inline]
__apply_to_page_range+0x8ec/0xe40 mm/memory.c:2769
kasan_release_vmalloc+0x9a/0xb0 mm/kasan/shadow.c:532
__purge_vmap_area_lazy+0x163f/0x1a10 mm/vmalloc.c:1770
drain_vmap_area_work+0x40/0xd0 mm/vmalloc.c:1804
process_one_work kernel/workqueue.c:2633 [inline]
process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
kthread+0x2ef/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
Memory state around the buggy address:
ffff88809a07fb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88809a07fb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88809a07fc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88809a07fc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88809a07fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 3b1137fe7482 ("net: ipv6: Change notifications for multipath add to RTA_MULTIPATH")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20240303144801.702646-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/ipv6/route.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 56525b5b95a2b..236a45557ba18 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5332,19 +5332,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
err_nh = NULL;
list_for_each_entry(nh, &rt6_nh_list, next) {
err = __ip6_ins_rt(nh->fib6_info, info, extack);
- fib6_info_release(nh->fib6_info);
-
- if (!err) {
- /* save reference to last route successfully inserted */
- rt_last = nh->fib6_info;
-
- /* save reference to first route for notification */
- if (!rt_notif)
- rt_notif = nh->fib6_info;
- }
- /* nh->fib6_info is used or freed at this point, reset to NULL*/
- nh->fib6_info = NULL;
if (err) {
if (replace && nhn)
NL_SET_ERR_MSG_MOD(extack,
@@ -5352,6 +5340,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
err_nh = nh;
goto add_errout;
}
+ /* save reference to last route successfully inserted */
+ rt_last = nh->fib6_info;
+
+ /* save reference to first route for notification */
+ if (!rt_notif)
+ rt_notif = nh->fib6_info;
/* Because each route is added like a single route we remove
* these flags after the first nexthop: if there is a collision,
@@ -5412,8 +5406,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
cleanup:
list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
- if (nh->fib6_info)
- fib6_info_release(nh->fib6_info);
+ fib6_info_release(nh->fib6_info);
list_del(&nh->next);
kfree(nh);
}
--
2.43.0
* [PATCH 6.6 29/60] bpf: check bpf_func_state->callback_depth when pruning states
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (27 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 28/60] net/ipv6: avoid possible UAF in ip6_route_mpath_notify() Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 30/60] xdp, bonding: Fix feature flags when there are no slave devs anymore Sasha Levin
` (37 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Eduard Zingerman, Yonghong Song, Alexei Starovoitov, Sasha Levin
From: Eduard Zingerman <eddyz87@gmail.com>
[ Upstream commit e9a8e5a587ca55fec6c58e4881742705d45bee54 ]
When comparing current and cached states, the verifier should consider
bpf_func_state->callback_depth. The current state cannot be pruned against
a cached state when the current state has more iterations left than the
cached state. The current state has more iterations left when its
callback_depth is smaller.
Below is an example illustrating this bug, minimized from mailing list
discussion [0] (assume that BPF_F_TEST_STATE_FREQ is set).
The example is not a safe program: if loop_cb point (1) is followed by
loop_cb point (2), then division by zero is possible at point (4).
struct ctx {
__u64 a;
__u64 b;
__u64 c;
};
static int loop_cb(int i, struct ctx *ctx)
{
/* assume that generated code is "fallthrough-first":
* if ... == 1 goto
* if ... == 2 goto
* <default>
*/
switch (bpf_get_prandom_u32()) {
case 1: /* 1 */ ctx->a = 42; return 0; break;
case 2: /* 2 */ ctx->b = 42; return 0; break;
default: /* 3 */ ctx->c = 42; return 0; break;
}
}
SEC("tc")
__failure
__flag(BPF_F_TEST_STATE_FREQ)
int test(struct __sk_buff *skb)
{
struct ctx ctx = { 7, 7, 7 };
bpf_loop(2, loop_cb, &ctx, 0); /* 0 */
/* assume generated checks are in-order: .a first */
if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
asm volatile("r0 /= 0;":::"r0"); /* 4 */
return 0;
}
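The claim that this program is unsafe can be checked by brute force outside the verifier. The sketch below is plain C, not BPF: it enumerates every sequence of switch cases for the two bpf_loop() iterations and counts the sequences that reach the division at point (4). `step()` and `count_unsafe_paths()` are illustrative names.

```c
#include <assert.h>

struct ctx { unsigned long long a, b, c; };

/* Apply one loop_cb() iteration for the given switch case (1, 2 or 3). */
static void step(struct ctx *ctx, int which)
{
	switch (which) {
	case 1: ctx->a = 42; break;
	case 2: ctx->b = 42; break;
	default: ctx->c = 42; break;
	}
}

/* Count case sequences of length 2 (bpf_loop(2, ...)) that satisfy the
 * condition guarding the division at point (4). */
static int count_unsafe_paths(void)
{
	int unsafe = 0;

	for (int first = 1; first <= 3; first++) {
		for (int second = 1; second <= 3; second++) {
			struct ctx ctx = { 7, 7, 7 };

			step(&ctx, first);
			step(&ctx, second);
			if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
				unsafe++;
		}
	}
	return unsafe;
}
```

The two unsafe sequences are (1, 2) and (2, 1), so a correct verifier must reject the program, which is what the `__failure` annotation expects.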
Prior to this commit, the verifier built the following checkpoint tree for
this example:
.------------------------------------- Checkpoint / State name
| .-------------------------------- Code point number
| | .---------------------------- Stack state {ctx.a,ctx.b,ctx.c}
| | | .------------------- Callback depth in frame #0
v v v v
- (0) {7P,7P,7},depth=0
- (3) {7P,7P,7},depth=1
- (0) {7P,7P,42},depth=1
- (3) {7P,7,42},depth=2
- (0) {7P,7,42},depth=2 loop terminates because of depth limit
- (4) {7P,7,42},depth=0 predicted false, ctx.a marked precise
- (6) exit
(a) - (2) {7P,7,42},depth=2
- (0) {7P,42,42},depth=2 loop terminates because of depth limit
- (4) {7P,42,42},depth=0 predicted false, ctx.a marked precise
- (6) exit
(b) - (1) {7P,7P,42},depth=2
- (0) {42P,7P,42},depth=2 loop terminates because of depth limit
- (4) {42P,7P,42},depth=0 predicted false, ctx.{a,b} marked precise
- (6) exit
- (2) {7P,7,7},depth=1 considered safe, pruned using checkpoint (a)
(c) - (1) {7P,7P,7},depth=1 considered safe, pruned using checkpoint (b)
Here checkpoint (b) has a callback_depth of 2, meaning that it would
never reach state {42,42,7}, while checkpoint (c) has a callback_depth
of 1 and thus could still explore state {42,42,7} if not pruned
prematurely.
This commit forbids such premature pruning, allowing the verifier to
explore the sub-tree of states starting at (c):
(c) - (1) {7,7,7P},depth=1
- (0) {42P,7,7P},depth=1
...
- (2) {42,7,7},depth=2
- (0) {42,42,7},depth=2 loop terminates because of depth limit
- (4) {42,42,7},depth=0 predicted true, ctx.{a,b,c} marked precise
- (5) division by zero
[0] https://lore.kernel.org/bpf/9b251840-7cb8-4d17-bd23-1fc8071d8eef@linux.dev/
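The rule added to func_states_equal() can be isolated as a small predicate. This sketch covers only the callback-depth check; `struct frame_state` and `depth_allows_pruning()` are hypothetical, and the real function goes on to compare registers and stack slots.

```c
#include <assert.h>
#include <stdbool.h>

struct frame_state {
	int callback_depth;	/* bpf_loop callback iterations already taken */
};

/* A cached (old) state may subsume the current one only if the current
 * state has no more iterations left, i.e. its depth is not smaller than the
 * old depth. This mirrors "old->callback_depth > cur->callback_depth means
 * not equal". */
static bool depth_allows_pruning(const struct frame_state *old,
				 const struct frame_state *cur)
{
	return old->callback_depth <= cur->callback_depth;
}
```

Applied to the tree above, checkpoint (b) (depth 2) can no longer prune checkpoint (c) (depth 1).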
Fixes: bb124da69c47 ("bpf: keep track of max number of bpf_loop callback iterations")
Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240222154121.6991-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/verifier.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a7901ed358a0f..396c4c66932f2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -16238,6 +16238,9 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
{
int i;
+ if (old->callback_depth > cur->callback_depth)
+ return false;
+
for (i = 0; i < MAX_BPF_REG; i++)
if (!regsafe(env, &old->regs[i], &cur->regs[i],
&env->idmap_scratch, exact))
--
2.43.0
* [PATCH 6.6 30/60] xdp, bonding: Fix feature flags when there are no slave devs anymore
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (28 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 29/60] bpf: check bpf_func_state->callback_depth when pruning states Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 31/60] selftests/bpf: Fix up xdp bonding test wrt feature flags Sasha Levin
` (36 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Daniel Borkmann, Magnus Karlsson, Prashant Batra,
Toke Høiland-Jørgensen, Jakub Kicinski,
Alexei Starovoitov, Sasha Levin
From: Daniel Borkmann <daniel@iogearbox.net>
[ Upstream commit f267f262815033452195f46c43b572159262f533 ]
Commit 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
changed the driver from reporting everything as supported before a device
was bonded to reporting that no XDP feature is supported until a real
device is bonded. This seems more truthful, given that the real underlying
devices eventually decide which XDP features are supported.
The change, however, did not account for the case where all slave devices
get removed from the bond device. In this case, after 9b0ed890ac2a, the driver
keeps reporting a feature mask of 0x77, that is, NETDEV_XDP_ACT_MASK &
~NETDEV_XDP_ACT_XSK_ZEROCOPY, whereas it should report a feature
mask of 0.
Fix it by resetting the XDP feature flags in the same way as if no XDP program
is attached to the bond device. This was uncovered by the XDP bond selftest,
which made BPF CI fail. After adjusting the starting masks in the latter
to 0 instead of NETDEV_XDP_ACT_MASK, the test passes again together with
this fix.
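The arithmetic behind the 0x77 value can be reproduced in userspace. The sketch assumes the NETDEV_XDP_ACT_* bit values from include/uapi/linux/netdev.h (BASIC = 1 through NDO_XMIT_SG = 64, so the mask is 0x7f) and approximates bond_xdp_set_features() as "intersect the slave feature masks, then clear XSK zero-copy". `bond_features()` and its parameters are hypothetical; `xdp_ok` stands in for bond_xdp_check().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copies of the NETDEV_XDP_ACT_* bits (assumed uapi values). */
enum {
	ACT_BASIC        = 1 << 0,
	ACT_REDIRECT     = 1 << 1,
	ACT_NDO_XMIT     = 1 << 2,
	ACT_XSK_ZEROCOPY = 1 << 3,
	ACT_HW_OFFLOAD   = 1 << 4,
	ACT_RX_SG        = 1 << 5,
	ACT_NDO_XMIT_SG  = 1 << 6,
	ACT_MASK         = (1 << 7) - 1,	/* 0x7f */
};

/* fixed == false models the behaviour before the patch; fixed == true adds
 * the "no slaves means no features" check. */
static unsigned int bond_features(const unsigned int *slave_feats, int nslaves,
				  bool xdp_ok, bool fixed)
{
	unsigned int val = ACT_MASK;

	if (!xdp_ok || (fixed && nslaves == 0))
		return 0;

	for (int i = 0; i < nslaves; i++)
		val &= slave_feats[i];	/* keep only features every slave has */

	return val & ~(unsigned int)ACT_XSK_ZEROCOPY;
}
```

With no slaves, the pre-patch path yields 0x7f & ~0x08 = 0x77, the value quoted above; the patched path yields 0.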
Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: Prashant Batra <prbatra.mail@gmail.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-1-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/bonding/bond_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6cf7f364704e8..b094c48bebc30 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1811,7 +1811,7 @@ void bond_xdp_set_features(struct net_device *bond_dev)
ASSERT_RTNL();
- if (!bond_xdp_check(bond)) {
+ if (!bond_xdp_check(bond) || !bond_has_slaves(bond)) {
xdp_clear_features_flag(bond_dev);
return;
}
--
2.43.0
* [PATCH 6.6 31/60] selftests/bpf: Fix up xdp bonding test wrt feature flags
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (29 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 30/60] xdp, bonding: Fix feature flags when there are no slave devs anymore Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 32/60] cpumap: Zero-initialise xdp_rxq_info struct before running XDP program Sasha Levin
` (35 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
Alexei Starovoitov, Sasha Levin
From: Daniel Borkmann <daniel@iogearbox.net>
[ Upstream commit 0bfc0336e1348883fdab4689f0c8c56458f36dd8 ]
Adjust the XDP feature flags for the bond device when no bond slave
devices are attached. After 9b0ed890ac2a ("bonding: do not report
NETDEV_XDP_ACT_XSK_ZEROCOPY"), the empty bond device must report 0
as flags instead of NETDEV_XDP_ACT_MASK.
# ./vmtest.sh -- ./test_progs -t xdp_bond
[...]
[ 3.983311] bond1 (unregistering): (slave veth1_1): Releasing backup interface
[ 3.995434] bond1 (unregistering): Released all slaves
[ 4.022311] bond2: (slave veth2_1): Releasing backup interface
#507/1 xdp_bonding/xdp_bonding_attach:OK
#507/2 xdp_bonding/xdp_bonding_nested:OK
#507/3 xdp_bonding/xdp_bonding_features:OK
#507/4 xdp_bonding/xdp_bonding_roundrobin:OK
#507/5 xdp_bonding/xdp_bonding_activebackup:OK
#507/6 xdp_bonding/xdp_bonding_xor_layer2:OK
#507/7 xdp_bonding/xdp_bonding_xor_layer23:OK
#507/8 xdp_bonding/xdp_bonding_xor_layer34:OK
#507/9 xdp_bonding/xdp_bonding_redirect_multi:OK
#507 xdp_bonding:OK
Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED
[ 4.185255] bond2 (unregistering): Released all slaves
[...]
Fixes: 9b0ed890ac2a ("bonding: do not report NETDEV_XDP_ACT_XSK_ZEROCOPY")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Message-ID: <20240305090829.17131-2-daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/testing/selftests/bpf/prog_tests/xdp_bonding.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
index c3b45745cbccd..6d8b54124cb35 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -511,7 +511,7 @@ static void test_xdp_bonding_features(struct skeletons *skeletons)
if (!ASSERT_OK(err, "bond bpf_xdp_query"))
goto out;
- if (!ASSERT_EQ(query_opts.feature_flags, NETDEV_XDP_ACT_MASK,
+ if (!ASSERT_EQ(query_opts.feature_flags, 0,
"bond query_opts.feature_flags"))
goto out;
@@ -601,7 +601,7 @@ static void test_xdp_bonding_features(struct skeletons *skeletons)
if (!ASSERT_OK(err, "bond bpf_xdp_query"))
goto out;
- ASSERT_EQ(query_opts.feature_flags, NETDEV_XDP_ACT_MASK,
+ ASSERT_EQ(query_opts.feature_flags, 0,
"bond query_opts.feature_flags");
out:
bpf_link__destroy(link);
--
2.43.0
* [PATCH 6.6 32/60] cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (30 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 31/60] selftests/bpf: Fix up xdp bonding test wrt feature flags Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 33/60] net: dsa: microchip: fix register write order in ksz8_ind_write8() Sasha Levin
` (34 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Toke Høiland-Jørgensen, Tobias Böhm,
Martin KaFai Lau, Sasha Levin
From: Toke Høiland-Jørgensen <toke@redhat.com>
[ Upstream commit 2487007aa3b9fafbd2cb14068f49791ce1d7ede5 ]
When running an XDP program that is attached to a cpumap entry, we don't
initialise the xdp_rxq_info data structure being used in the xdp_buff
that backs the XDP program invocation. Tobias noticed that this leads to
random values being returned as the xdp_md->rx_queue_index value for XDP
programs running in a cpumap.
This means we're basically returning the contents of the uninitialised
memory, which is bad. Fix this by zero-initialising the rxq data
structure before running the XDP program.
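A userspace illustration of the one-line fix: an automatic struct without an initialiser has indeterminate contents, while an initialiser zeroes every member it does not name. `struct rxq_info` and `make_rxq()` are hypothetical stand-ins for struct xdp_rxq_info and the cpumap setup. Note that `= {}` (as used in the patch) is a GNU C extension standardised in C23; `= {0}` is the portable spelling in older C.

```c
#include <assert.h>

/* Hypothetical stand-in for struct xdp_rxq_info. */
struct rxq_info {
	void *dev;
	unsigned int queue_index;	/* what xdp_md->rx_queue_index reports */
	unsigned int reg_state;
};

/* Start from all-zero and fill in only what is known, so unset fields read
 * as 0 instead of leftover stack contents. */
static struct rxq_info make_rxq(void *dev)
{
	struct rxq_info rxq = {0};

	rxq.dev = dev;
	return rxq;
}
```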
Fixes: 9216477449f3 ("bpf: cpumap: Add the possibility to attach an eBPF program to cpumap")
Reported-by: Tobias Böhm <tobias@aibor.de>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20240305213132.11955-1-toke@redhat.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/bpf/cpumap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index e42a1bdb7f536..9140b8bff9c04 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -178,7 +178,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
void **frames, int n,
struct xdp_cpumap_stats *stats)
{
- struct xdp_rxq_info rxq;
+ struct xdp_rxq_info rxq = {};
struct xdp_buff xdp;
int i, nframes = 0;
--
2.43.0
* [PATCH 6.6 33/60] net: dsa: microchip: fix register write order in ksz8_ind_write8()
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (31 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 32/60] cpumap: Zero-initialise xdp_rxq_info struct before running XDP program Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 34/60] net/rds: fix WARNING in rds_conn_connect_if_down Sasha Levin
` (33 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Tobias Jakobi (Compleo), Oleksij Rempel, Jakub Kicinski,
Sasha Levin
From: "Tobias Jakobi (Compleo)" <tobias.jakobi.compleo@gmail.com>
[ Upstream commit b7fb7729c94fb2d23c79ff44f7a2da089c92d81c ]
This bug was noticed while re-implementing parts of the kernel
driver in userspace using spidev. The goal was to enable some
of the errata workarounds that Microchip describes in their
errata sheet [1].
Both the errata sheet and the regular datasheet of e.g. the KSZ8795
imply that you need to do this for indirect register accesses:
- write a 16-bit value to a control register pair (this value
consists of the indirect register table, and the offset inside
the table)
- either read or write an 8-bit value from the data storage
register (indicated by REG_IND_BYTE in the kernel)
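The access sequence above can be illustrated with a toy register model. It is hypothetical and does not model the real KSZ8795: it only encodes the rule stated above, namely that a data-register access applies to whatever table location the control pair currently selects. `struct toy_switch`, `write_ctrl()`, `write_data()` and `TOY_TABLE_SIZE` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 16

struct toy_switch {
	uint16_t ctrl;			/* currently selected table location */
	uint8_t table[TOY_TABLE_SIZE];
};

static void write_ctrl(struct toy_switch *sw, uint16_t ctrl)
{
	sw->ctrl = ctrl;
}

/* The data byte is stored at the location selected by the last control write. */
static void write_data(struct toy_switch *sw, uint8_t data)
{
	sw->table[sw->ctrl % TOY_TABLE_SIZE] = data;
}

/* Order used after the patch: select the location, then write the data. */
static void ind_write8_fixed(struct toy_switch *sw, uint16_t addr, uint8_t data)
{
	write_ctrl(sw, addr);
	write_data(sw, data);
}

/* Order used before the patch: the data lands at the previously selected
 * location. */
static void ind_write8_swapped(struct toy_switch *sw, uint16_t addr, uint8_t data)
{
	write_data(sw, data);
	write_ctrl(sw, addr);
}
```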
The current implementation has the order swapped. Reading back an
indirect register with known content (the EEE register modified in
ksz8_handle_global_errata() is one of these) proves that this
implementation does not work.
Private discussion with Oleksij Rempel of Pengutronix has revealed
that the workaround was apparently never tested on actual hardware.
[1] https://ww1.microchip.com/downloads/aemDocuments/documents/OTH/ProductDocuments/Errata/KSZ87xx-Errata-DS80000687C.pdf
Signed-off-by: Tobias Jakobi (Compleo) <tobias.jakobi.compleo@gmail.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Fixes: 7b6e6235b664 ("net: dsa: microchip: ksz8795: handle eee specif erratum")
Link: https://lore.kernel.org/r/20240304154135.161332-1-tobias.jakobi.compleo@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/dsa/microchip/ksz8795.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/microchip/ksz8795.c b/drivers/net/dsa/microchip/ksz8795.c
index 91aba470fb2fa..28d7ada3ec067 100644
--- a/drivers/net/dsa/microchip/ksz8795.c
+++ b/drivers/net/dsa/microchip/ksz8795.c
@@ -49,9 +49,9 @@ static int ksz8_ind_write8(struct ksz_device *dev, u8 table, u16 addr, u8 data)
mutex_lock(&dev->alu_mutex);
ctrl_addr = IND_ACC_TABLE(table) | addr;
- ret = ksz_write8(dev, regs[REG_IND_BYTE], data);
+ ret = ksz_write16(dev, regs[REG_IND_CTRL_0], ctrl_addr);
if (!ret)
- ret = ksz_write16(dev, regs[REG_IND_CTRL_0], ctrl_addr);
+ ret = ksz_write8(dev, regs[REG_IND_BYTE], data);
mutex_unlock(&dev->alu_mutex);
--
2.43.0
* [PATCH 6.6 34/60] net/rds: fix WARNING in rds_conn_connect_if_down
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (32 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 33/60] net: dsa: microchip: fix register write order in ksz8_ind_write8() Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 35/60] netfilter: nft_ct: fix l3num expectations with inet pseudo family Sasha Levin
` (32 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Edward Adam Davis, syzbot+d4faee732755bba9838e, David S . Miller,
Sasha Levin
From: Edward Adam Davis <eadavis@qq.com>
[ Upstream commit c055fc00c07be1f0df7375ab0036cebd1106ed38 ]
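If the connection isn't established yet, get_mr() will fail. Trigger the
connection from __rds_rdma_map() when get_mr() fails with -ENODEV, instead
of from rds_sendmsg() after rds_cmsg_send() returns -EAGAIN.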
Fixes: 584a8279a44a ("RDS: RDMA: return appropriate error on rdma map failures")
Reported-and-tested-by: syzbot+d4faee732755bba9838e@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/rds/rdma.c | 3 +++
net/rds/send.c | 6 +-----
2 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index fba82d36593ad..a4e3c5de998be 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -301,6 +301,9 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
kfree(sg);
}
ret = PTR_ERR(trans_private);
+ /* Trigger connection so that its ready for the next retry */
+ if (ret == -ENODEV)
+ rds_conn_connect_if_down(cp->cp_conn);
goto out;
}
diff --git a/net/rds/send.c b/net/rds/send.c
index 5e57a1581dc60..2899def23865f 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1313,12 +1313,8 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
/* Parse any control messages the user may have included. */
ret = rds_cmsg_send(rs, rm, msg, &allocated_mr, &vct);
- if (ret) {
- /* Trigger connection so that its ready for the next retry */
- if (ret == -EAGAIN)
- rds_conn_connect_if_down(conn);
+ if (ret)
goto out;
- }
if (rm->rdma.op_active && !conn->c_trans->xmit_rdma) {
printk_ratelimited(KERN_NOTICE "rdma_op %p conn xmit_rdma %p\n",
--
2.43.0
* [PATCH 6.6 35/60] netfilter: nft_ct: fix l3num expectations with inet pseudo family
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (33 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 34/60] net/rds: fix WARNING in rds_conn_connect_if_down Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 36/60] netfilter: nf_conntrack_h323: Add protection for bmp length out of range Sasha Levin
` (31 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Florian Westphal, Pablo Neira Ayuso, Sasha Levin
From: Florian Westphal <fw@strlen.de>
[ Upstream commit 99993789966a6eb4f1295193dc543686899892d3 ]
The following is rejected but should be allowed:
  table inet t {
    ct expectation exp1 {
      [..]
      l3proto ip
Valid combos are:
  table ip t, l3proto ip
  table ip6 t, l3proto ip6
  table inet t, l3proto ip OR l3proto ip6
Disallow the inet pseudo family: the l3num must be an on-wire protocol
known to conntrack.
Retain the NFPROTO_INET case to make it clear it is rejected
intentionally rather than as an oversight.
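The combination rules above can be expressed as a standalone check. This is a sketch, not the kernel function: `check_expect_l3num()` is a hypothetical name, and the NFPROTO_* constants are redefined locally (with the values used by `<linux/netfilter.h>`) so the example builds in userspace. It mirrors the patched switch in nft_ct_expect_obj_init().

```c
#include <errno.h>
#include <stdint.h>

/* Local copies of the NFPROTO_* values from <linux/netfilter.h>. */
enum {
	NFPROTO_INET = 1,
	NFPROTO_IPV4 = 2,
	NFPROTO_IPV6 = 10,
};

/*
 * Return 0 if an expectation with layer-3 protocol l3num may be created in a
 * table of family table_family, or a negative errno otherwise.
 */
static int check_expect_l3num(uint8_t table_family, uint8_t l3num)
{
	switch (l3num) {
	case NFPROTO_IPV4:
	case NFPROTO_IPV6:
		/* Same family, or an inet table that carries both. */
		if (l3num == table_family || table_family == NFPROTO_INET)
			return 0;
		return -EINVAL;
	case NFPROTO_INET: /* pseudo family, not an on-wire protocol */
	default:
		return -EAFNOSUPPORT;
	}
}
```

Keeping `case NFPROTO_INET:` as an empty label that falls into `default:` documents the deliberate rejection without changing behaviour.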
Fixes: 8059918a1377 ("netfilter: nft_ct: sanitize layer 3 and 4 protocol number in custom expectations")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nft_ct.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index bfd3e5a14dab6..255640013ab84 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -1256,14 +1256,13 @@ static int nft_ct_expect_obj_init(const struct nft_ctx *ctx,
switch (priv->l3num) {
case NFPROTO_IPV4:
case NFPROTO_IPV6:
- if (priv->l3num != ctx->family)
- return -EINVAL;
+ if (priv->l3num == ctx->family || ctx->family == NFPROTO_INET)
+ break;
- fallthrough;
- case NFPROTO_INET:
- break;
+ return -EINVAL;
+ case NFPROTO_INET: /* tuple.src.l3num supports NFPROTO_IPV4/6 only */
default:
- return -EOPNOTSUPP;
+ return -EAFNOSUPPORT;
}
priv->l4proto = nla_get_u8(tb[NFTA_CT_EXPECT_L4PROTO]);
--
2.43.0
* [PATCH 6.6 36/60] netfilter: nf_conntrack_h323: Add protection for bmp length out of range
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (34 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 35/60] netfilter: nft_ct: fix l3num expectations with inet pseudo family Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 37/60] erofs: apply proper VMA alignment for memory mapped files on THP Sasha Levin
` (30 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Lena Wang, Pablo Neira Ayuso, Sasha Levin
From: Lena Wang <lena.wang@mediatek.com>
[ Upstream commit 767146637efc528b5e3d31297df115e85a2fd362 ]
UBSAN reports a BRK#5515 SHIFT_ISSUE exception: bitwise shifts
that are out of bounds for their data type.
vmlinux get_bitmap(b=75) + 712
<net/netfilter/nf_conntrack_h323_asn1.c:0>
vmlinux decode_seq(bs=0xFFFFFFD008037000, f=0xFFFFFFD008037018, level=134443100) + 1956
<net/netfilter/nf_conntrack_h323_asn1.c:592>
vmlinux decode_choice(base=0xFFFFFFD0080370F0, level=23843636) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux decode_seq(f=0xFFFFFFD0080371A8, level=134443500) + 812
<net/netfilter/nf_conntrack_h323_asn1.c:576>
vmlinux decode_choice(base=0xFFFFFFD008037280, level=0) + 1216
<net/netfilter/nf_conntrack_h323_asn1.c:814>
vmlinux DecodeRasMessage() + 304
<net/netfilter/nf_conntrack_h323_asn1.c:833>
vmlinux ras_help() + 684
<net/netfilter/nf_conntrack_h323_main.c:1728>
vmlinux nf_confirm() + 188
<net/netfilter/nf_conntrack_proto.c:137>
Due to abnormal data in skb->data, the extension bitmap length
exceeds 32 when decoding a RAS message, and that length is then used
in a shift operation. After several loop iterations the shift amount
becomes negative. UBSAN detects the negative shift as undefined
behaviour and reports an exception.
Add a check so that the length cannot exceed 32: if it does,
decode_seq() returns H323_ERROR_RANGE and stops decoding.
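The failure mode can be sketched in userspace C. This is a simplified, hypothetical reader (`read_bitmap()`, `SKETCH_ERROR_RANGE` and `SKETCH_ERROR_BOUND` are illustrative names, not the kernel's get_bitmap() or H323_ERROR_* codes): it packs up to 32 bits MSB-first into a 32-bit word, shifting successive bytes left by 24, 16, 8 and 0. A fifth byte would need a shift below zero, so any length above 32 must be rejected before shifting. In decode_seq() the extension length comes from `get_bits(bs, 7) + 1`, so a malformed packet can request up to 128 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OK           0
#define SKETCH_ERROR_RANGE (-1) /* stands in for H323_ERROR_RANGE */
#define SKETCH_ERROR_BOUND (-2) /* stands in for H323_ERROR_BOUND */

/*
 * Read `len` bits MSB-first from buf and left-align them in a 32-bit
 * bitmap. Returns SKETCH_OK and stores the bitmap in *out on success.
 */
static int read_bitmap(const uint8_t *buf, size_t buflen, unsigned int len,
		       uint32_t *out)
{
	uint32_t v = 0;
	unsigned int nbytes, i;

	/* The guard the fix adds: more than 32 bits cannot fit in the word. */
	if (len > 32)
		return SKETCH_ERROR_RANGE;

	nbytes = (len + 7) / 8;
	if (nbytes > buflen)
		return SKETCH_ERROR_BOUND;

	/* nbytes <= 4 here, so the shift count is always 24, 16, 8 or 0. */
	for (i = 0; i < nbytes; i++)
		v |= (uint32_t)buf[i] << (24 - 8 * i);

	/* Keep only the top `len` bits (len == 32 keeps everything). */
	if (len < 32)
		v &= ~(UINT32_MAX >> len);

	*out = v;
	return SKETCH_OK;
}
```

For example, reading 12 bits from the bytes `0xAB 0xCD` yields `0xABC00000`, while a request for 33 bits is refused before any shift is performed.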
Fixes: 5e35941d9901 ("[NETFILTER]: Add H.323 conntrack/NAT helper")
Signed-off-by: Lena Wang <lena.wang@mediatek.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netfilter/nf_conntrack_h323_asn1.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index e697a824b0018..540d97715bd23 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -533,6 +533,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
/* Get fields bitmap */
if (nf_h323_error_boundary(bs, 0, f->sz))
return H323_ERROR_BOUND;
+ if (f->sz > 32)
+ return H323_ERROR_RANGE;
bmp = get_bitmap(bs, f->sz);
if (base)
*(unsigned int *)base = bmp;
@@ -589,6 +591,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
bmp2_len = get_bits(bs, 7) + 1;
if (nf_h323_error_boundary(bs, 0, bmp2_len))
return H323_ERROR_BOUND;
+ if (bmp2_len > 32)
+ return H323_ERROR_RANGE;
bmp2 = get_bitmap(bs, bmp2_len);
bmp |= bmp2 >> f->sz;
if (base)
--
2.43.0
* [PATCH 6.6 37/60] erofs: apply proper VMA alignment for memory mapped files on THP
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (35 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 36/60] netfilter: nf_conntrack_h323: Add protection for bmp length out of range Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 38/60] netrom: Fix a data-race around sysctl_netrom_default_path_quality Sasha Levin
` (29 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Gao Xiang, Jingbo Xu, Chao Yu, Sasha Levin
From: Gao Xiang <hsiangkao@linux.alibaba.com>
[ Upstream commit 4127caee89612a84adedd78c9453089138cd5afe ]
There are two main reasons why thp_get_unmapped_area() should be
used for EROFS, as it is for other filesystems:
- It is needed to enable PMD mappings for an FSDAX filesystem; see
commit 74d2fad1334d ("thp, dax: add thp_get_unmapped_area for pmd
mappings");
- It's useful together with large folios and
CONFIG_READ_ONLY_THP_FOR_FS which enable THPs for mmapped files
(e.g. shared libraries) even without FSDAX. See commit 1854bc6e2420
("mm/readahead: Align file mappings for non-DAX").
Fixes: 06252e9ce05b ("erofs: dax support for non-tailpacking regular file")
Fixes: ce529cc25b18 ("erofs: enable large folios for iomap mode")
Fixes: e6687b89225e ("erofs: enable large folios for fscache mode")
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240306053138.2240206-1-hsiangkao@linux.alibaba.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/erofs/data.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 0c2c99c58b5e3..977bc23f96e47 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -448,5 +448,6 @@ const struct file_operations erofs_file_fops = {
.llseek = generic_file_llseek,
.read_iter = erofs_file_read_iter,
.mmap = erofs_file_mmap,
+ .get_unmapped_area = thp_get_unmapped_area,
.splice_read = filemap_splice_read,
};
--
2.43.0
* [PATCH 6.6 38/60] netrom: Fix a data-race around sysctl_netrom_default_path_quality
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (36 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 37/60] erofs: apply proper VMA alignment for memory mapped files on THP Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 39/60] netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser Sasha Levin
` (28 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit 958d6145a6d9ba9e075c921aead8753fb91c9101 ]
We need to protect the reader reading sysctl_netrom_default_path_quality
because the value can be changed concurrently.
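A rough userspace analogue of the pattern, assuming C11 atomics and POSIX threads: one thread updates a tunable (as a sysctl write would) while another takes a single snapshot of it. In the kernel the reader uses READ_ONCE(), a volatile access on a plain int, rather than a C11 atomic, so this illustrates the idea and is not the kernel mechanism. All names below (`sketch_default_path_quality`, `tuner()`, `neigh_init()`, `race_once()`) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a tunable such as sysctl_netrom_default_path_quality. */
static _Atomic int sketch_default_path_quality = 10;

struct sketch_neigh {
	int quality;
};

/* Writer side: analogous to a sysctl handler storing a new value. */
static void *tuner(void *arg)
{
	int v = *(const int *)arg;

	atomic_store_explicit(&sketch_default_path_quality, v, memory_order_relaxed);
	return NULL;
}

/* Reader side: one marked load, like READ_ONCE() in nr_add_node(). */
static void neigh_init(struct sketch_neigh *n)
{
	n->quality = atomic_load_explicit(&sketch_default_path_quality,
					  memory_order_relaxed);
}

/*
 * Run one concurrent update and one read. Returns the value the reader
 * observed (either the old or the new value), or -1 if the thread could
 * not be created.
 */
static int race_once(int new_value)
{
	pthread_t t;
	struct sketch_neigh n;

	if (pthread_create(&t, NULL, tuner, &new_value) != 0)
		return -1;
	neigh_init(&n);
	pthread_join(t, NULL); /* new_value stays alive until the writer is done */
	return n.quality;
}
```

With a plain, unmarked `int` the same program would contain a data race under the C11 memory model; the marked load guarantees the reader sees either the old or the new value, read exactly once.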
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index baea3cbd76ca5..6f709fdffc11f 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -153,7 +153,7 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
nr_neigh->digipeat = NULL;
nr_neigh->ax25 = NULL;
nr_neigh->dev = dev;
- nr_neigh->quality = sysctl_netrom_default_path_quality;
+ nr_neigh->quality = READ_ONCE(sysctl_netrom_default_path_quality);
nr_neigh->locked = 0;
nr_neigh->count = 0;
nr_neigh->number = nr_neigh_no++;
--
2.43.0
* [PATCH 6.6 39/60] netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (37 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 38/60] netrom: Fix a data-race around sysctl_netrom_default_path_quality Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 40/60] netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser Sasha Levin
` (27 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit cfd9f4a740f772298308b2e6070d2c744fb5cf79 ]
We need to protect the reader reading the sysctl value
because the value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index 6f709fdffc11f..b8ddd8048f352 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -766,7 +766,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
if (ax25 != NULL) {
ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
ax25->ax25_dev->dev, 0,
- sysctl_netrom_obsolescence_count_initialiser);
+ READ_ONCE(sysctl_netrom_obsolescence_count_initialiser));
if (ret)
return ret;
}
--
2.43.0
* [PATCH 6.6 40/60] netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (38 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 39/60] netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 41/60] netrom: Fix a data-race around sysctl_netrom_transport_timeout Sasha Levin
` (26 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit 119cae5ea3f9e35cdada8e572cc067f072fa825a ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_subr.c | 5 +++--
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/netrom/nr_dev.c b/net/netrom/nr_dev.c
index 3aaac4a22b387..2c34389c3ce6f 100644
--- a/net/netrom/nr_dev.c
+++ b/net/netrom/nr_dev.c
@@ -81,7 +81,7 @@ static int nr_header(struct sk_buff *skb, struct net_device *dev,
buff[6] |= AX25_SSSID_SPARE;
buff += AX25_ADDR_LEN;
- *buff++ = sysctl_netrom_network_ttl_initialiser;
+ *buff++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
*buff++ = NR_PROTO_IP;
*buff++ = NR_PROTO_IP;
diff --git a/net/netrom/nr_out.c b/net/netrom/nr_out.c
index 44929657f5b71..5e531394a724b 100644
--- a/net/netrom/nr_out.c
+++ b/net/netrom/nr_out.c
@@ -204,7 +204,7 @@ void nr_transmit_buffer(struct sock *sk, struct sk_buff *skb)
dptr[6] |= AX25_SSSID_SPARE;
dptr += AX25_ADDR_LEN;
- *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
if (!nr_route_frame(skb, NULL)) {
kfree_skb(skb);
diff --git a/net/netrom/nr_subr.c b/net/netrom/nr_subr.c
index e2d2af924cff4..c3bbd5880850b 100644
--- a/net/netrom/nr_subr.c
+++ b/net/netrom/nr_subr.c
@@ -182,7 +182,8 @@ void nr_write_internal(struct sock *sk, int frametype)
*dptr++ = nr->my_id;
*dptr++ = frametype;
*dptr++ = nr->window;
- if (nr->bpqext) *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ if (nr->bpqext)
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
break;
case NR_DISCREQ:
@@ -236,7 +237,7 @@ void __nr_transmit_reply(struct sk_buff *skb, int mine, unsigned char cmdflags)
dptr[6] |= AX25_SSSID_SPARE;
dptr += AX25_ADDR_LEN;
- *dptr++ = sysctl_netrom_network_ttl_initialiser;
+ *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
if (mine) {
*dptr++ = 0;
--
2.43.0
* [PATCH 6.6 41/60] netrom: Fix a data-race around sysctl_netrom_transport_timeout
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (39 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 40/60] netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 42/60] netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries Sasha Levin
` (25 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit 60a7a152abd494ed4f69098cf0f322e6bb140612 ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 96e91ab71573c..d8a25f614c500 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -453,7 +453,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr_init_timers(sk);
nr->t1 =
- msecs_to_jiffies(sysctl_netrom_transport_timeout);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_timeout));
nr->t2 =
msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
nr->n2 =
--
2.43.0
* [PATCH 6.6 42/60] netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (40 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 41/60] netrom: Fix a data-race around sysctl_netrom_transport_timeout Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 43/60] netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay Sasha Levin
` (24 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit e799299aafed417cc1f32adccb2a0e5268b3f6d5 ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index d8a25f614c500..a3b02f090bba2 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -457,7 +457,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->t2 =
msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
nr->n2 =
- msecs_to_jiffies(sysctl_netrom_transport_maximum_tries);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
nr->t4 =
msecs_to_jiffies(sysctl_netrom_transport_busy_delay);
nr->idle =
--
2.43.0
* [PATCH 6.6 43/60] netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (41 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 42/60] netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 44/60] netrom: Fix a data-race around sysctl_netrom_transport_busy_delay Sasha Levin
` (23 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit 806f462ba9029d41aadf8ec93f2f99c5305deada ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index a3b02f090bba2..474044072baf9 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -455,7 +455,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->t1 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_timeout));
nr->t2 =
- msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_acknowledge_delay));
nr->n2 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
nr->t4 =
--
2.43.0
* [PATCH 6.6 44/60] netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (42 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 43/60] netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 45/60] netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size Sasha Levin
` (22 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit 43547d8699439a67b78d6bb39015113f7aa360fd ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 474044072baf9..3aa52187cf982 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -459,7 +459,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->n2 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
nr->t4 =
- msecs_to_jiffies(sysctl_netrom_transport_busy_delay);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
nr->idle =
msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
nr->window = sysctl_netrom_transport_requested_window_size;
--
2.43.0
* [PATCH 6.6 45/60] netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (43 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 44/60] netrom: Fix a data-race around sysctl_netrom_transport_busy_delay Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 46/60] netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout Sasha Levin
` (21 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit a2e706841488f474c06e9b33f71afc947fb3bf56 ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 3aa52187cf982..851c3a625768d 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -462,7 +462,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
nr->idle =
msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
- nr->window = sysctl_netrom_transport_requested_window_size;
+ nr->window = READ_ONCE(sysctl_netrom_transport_requested_window_size);
nr->bpqext = 1;
nr->state = NR_STATE_0;
--
2.43.0
* [PATCH 6.6 46/60] netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (44 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 45/60] netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 47/60] netrom: Fix a data-race around sysctl_netrom_routing_control Sasha Levin
` (20 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit f99b494b40431f0ca416859f2345746199398e2b ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/af_netrom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 851c3a625768d..8c69f07651258 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -461,7 +461,7 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
nr->t4 =
msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
nr->idle =
- msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
+ msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_no_activity_timeout));
nr->window = READ_ONCE(sysctl_netrom_transport_requested_window_size);
nr->bpqext = 1;
--
2.43.0
* [PATCH 6.6 47/60] netrom: Fix a data-race around sysctl_netrom_routing_control
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (45 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 46/60] netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 48/60] netrom: Fix a data-race around sysctl_netrom_link_fails_count Sasha Levin
` (19 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit b5dffcb8f71bdd02a4e5799985b51b12f4eeaf76 ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b8ddd8048f352..89e12e6eea2ef 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -780,7 +780,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
return ret;
}
- if (!sysctl_netrom_routing_control && ax25 != NULL)
+ if (!READ_ONCE(sysctl_netrom_routing_control) && ax25 != NULL)
return 0;
/* Its Time-To-Live has expired */
--
2.43.0
* [PATCH 6.6 48/60] netrom: Fix a data-race around sysctl_netrom_link_fails_count
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (46 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 47/60] netrom: Fix a data-race around sysctl_netrom_routing_control Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 49/60] netrom: Fix data-races around sysctl_net_busy_read Sasha Levin
` (18 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit bc76645ebdd01be9b9994dac39685a3d0f6f7985 ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/nr_route.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index 89e12e6eea2ef..70480869ad1c5 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -728,7 +728,7 @@ void nr_link_failed(ax25_cb *ax25, int reason)
nr_neigh->ax25 = NULL;
ax25_cb_put(ax25);
- if (++nr_neigh->failed < sysctl_netrom_link_fails_count) {
+ if (++nr_neigh->failed < READ_ONCE(sysctl_netrom_link_fails_count)) {
nr_neigh_put(nr_neigh);
return;
}
--
2.43.0
* [PATCH 6.6 49/60] netrom: Fix data-races around sysctl_net_busy_read
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (47 preceding siblings ...)
2024-03-13 16:36 ` [PATCH 6.6 48/60] netrom: Fix a data-race around sysctl_netrom_link_fails_count Sasha Levin
@ 2024-03-13 16:36 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 50/60] net: pds_core: Fix possible double free in error handling path Sasha Levin
` (17 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:36 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jason Xing, Paolo Abeni, Sasha Levin
From: Jason Xing <kernelxing@tencent.com>
[ Upstream commit d380ce70058a4ccddc3e5f5c2063165dc07672c6 ]
We need to protect the reader reading the sysctl value because the
value can be changed concurrently.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/netrom/af_netrom.c | 2 +-
net/netrom/nr_in.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 8c69f07651258..f26dee48e03af 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -954,7 +954,7 @@ int nr_rx_frame(struct sk_buff *skb, struct net_device *dev)
* G8PZT's Xrouter which is sending packets with command type 7
* as an extension of the protocol.
*/
- if (sysctl_netrom_reset_circuit &&
+ if (READ_ONCE(sysctl_netrom_reset_circuit) &&
(frametype != NR_RESET || flags != 0))
nr_transmit_reset(skb, 1);
diff --git a/net/netrom/nr_in.c b/net/netrom/nr_in.c
index 2f084b6f69d7e..97944db6b5ac6 100644
--- a/net/netrom/nr_in.c
+++ b/net/netrom/nr_in.c
@@ -97,7 +97,7 @@ static int nr_state1_machine(struct sock *sk, struct sk_buff *skb,
break;
case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;
@@ -128,7 +128,7 @@ static int nr_state2_machine(struct sock *sk, struct sk_buff *skb,
break;
case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;
@@ -262,7 +262,7 @@ static int nr_state3_machine(struct sock *sk, struct sk_buff *skb, int frametype
break;
case NR_RESET:
- if (sysctl_netrom_reset_circuit)
+ if (READ_ONCE(sysctl_netrom_reset_circuit))
nr_disconnect(sk, ECONNRESET);
break;
--
2.43.0
* [PATCH 6.6 50/60] net: pds_core: Fix possible double free in error handling path
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Yongzhi Liu, Wojciech Drewek, Shannon Nelson, Paolo Abeni,
Sasha Levin
From: Yongzhi Liu <hyperlyzcs@gmail.com>
[ Upstream commit ba18deddd6d502da71fd6b6143c53042271b82bd ]
When auxiliary_device_add() returns an error, the error path calls
auxiliary_device_uninit(), whose release callback pdsc_auxbus_dev_release()
already calls kfree(padev) to free the memory. We must not call
kfree(padev) again in the error handling path.
Fix this by removing the redundant kfree() and moving the error handling
back to where each error occurs.
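The ownership rule behind the fix can be modelled in plain C. This is a hypothetical sketch, not the driver code: `struct fake_auxdev`, `fake_uninit()` and `fake_register()` stand in for the auxiliary device, `auxiliary_device_uninit()` and `pdsc_auxbus_dev_register()`. Before init succeeds the caller frees the memory itself; afterwards only the release callback may free it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted device whose release callback frees it. */
struct fake_auxdev {
	int refcount;
	void (*release)(struct fake_auxdev *dev);
};

static int nr_frees; /* counts how often a device has been freed */

static void fake_release(struct fake_auxdev *dev)
{
	nr_frees++;
	free(dev);
}

/* Like auxiliary_device_uninit(): drop the reference taken at init time. */
static void fake_uninit(struct fake_auxdev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

/*
 * Register a device, optionally simulating failure of the init or add step.
 * Every error path frees the memory exactly once: directly if init failed,
 * otherwise only through the release callback.
 */
static struct fake_auxdev *fake_register(int fail_init, int fail_add)
{
	struct fake_auxdev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->release = fake_release;

	if (fail_init) {
		/* No reference exists yet, so the caller still owns the memory. */
		nr_frees++;
		free(dev);
		return NULL;
	}
	dev->refcount = 1; /* from here on, release() owns the memory */

	if (fail_add) {
		fake_uninit(dev); /* frees dev; freeing it again here would be a double free */
		return NULL;
	}
	return dev;
}
```

The old code funnelled both failures into one label that did uninit followed by kfree(), which in this model would push `nr_frees` to 2 on the add-failure path.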
Fixes: 4569cce43bc6 ("pds_core: add auxiliary_bus devices")
Signed-off-by: Yongzhi Liu <hyperlyzcs@gmail.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20240306105714.20597-1-hyperlyzcs@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/net/ethernet/amd/pds_core/auxbus.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/amd/pds_core/auxbus.c b/drivers/net/ethernet/amd/pds_core/auxbus.c
index 11c23a7f3172d..fd1a5149c0031 100644
--- a/drivers/net/ethernet/amd/pds_core/auxbus.c
+++ b/drivers/net/ethernet/amd/pds_core/auxbus.c
@@ -160,23 +160,19 @@ static struct pds_auxiliary_dev *pdsc_auxbus_dev_register(struct pdsc *cf,
if (err < 0) {
dev_warn(cf->dev, "auxiliary_device_init of %s failed: %pe\n",
name, ERR_PTR(err));
- goto err_out;
+ kfree(padev);
+ return ERR_PTR(err);
}
err = auxiliary_device_add(aux_dev);
if (err) {
dev_warn(cf->dev, "auxiliary_device_add of %s failed: %pe\n",
name, ERR_PTR(err));
- goto err_out_uninit;
+ auxiliary_device_uninit(aux_dev);
+ return ERR_PTR(err);
}
return padev;
-
-err_out_uninit:
- auxiliary_device_uninit(aux_dev);
-err_out:
- kfree(padev);
- return ERR_PTR(err);
}
int pdsc_auxbus_dev_del(struct pdsc *cf, struct pdsc *pf)
--
2.43.0
* [PATCH 6.6 51/60] KVM: s390: add stat counter for shadow gmap events
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Nico Boehr, David Hildenbrand, Claudio Imbrenda, Janosch Frank,
Sasha Levin
From: Nico Boehr <nrb@linux.ibm.com>
[ Upstream commit c3235e2dd6956448a562d6b1112205eeebc8ab43 ]
The shadow gmap tracks memory of nested guests (guest-3). In certain
scenarios, the shadow gmap needs to be rebuilt, which is a costly operation
since it involves a SIE exit into guest-1 for every entry in the respective
shadow level.
Add kvm stat counters when new shadow structures are created at various
levels. Also add a counter gmap_shadow_create when a completely fresh
shadow gmap is created as well as a counter gmap_shadow_reuse when an
existing gmap is being reused.
Note that when several levels are shadowed at once, counters on all
affected levels will be increased.
Also note that not all page table levels need to be present and an ASCE
can directly point to e.g. a segment table. In this case, a new segment
table will always be equivalent to a new shadow gmap and hence will be
counted as gmap_shadow_create and not as gmap_shadow_segment.
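The "all affected levels" rule follows from the fallthrough chain in `kvm_s390_shadow_tables()`. Below is a simplified model with hypothetical names; it ignores error handling, entries that are already shadowed, and the separate `gmap_shadow_create`/`gmap_shadow_pg_entry` counters.

```c
#include <assert.h>

/* Table types, ordered from the top of the hierarchy (names are hypothetical). */
enum table_type { REGION1, REGION2, REGION3, SEGMENT };

/* Hypothetical subset of the new struct kvm_vm_stat counters. */
struct shadow_stats {
	unsigned long r1_entry, r2_entry, r3_entry, sg_entry;
};

/*
 * The walk starts at the level named by the ASCE and falls through every
 * lower level, so shadowing several levels at once bumps the counter of each
 * level passed. An ASCE pointing straight at a segment table only touches
 * sg_entry; the segment table itself comes with the new shadow gmap.
 */
static void count_shadowed_levels(enum table_type top, struct shadow_stats *st)
{
	assert(st != 0);
	switch (top) {
	case REGION1:
		st->r1_entry++;
		/* fallthrough */
	case REGION2:
		st->r2_entry++;
		/* fallthrough */
	case REGION3:
		st->r3_entry++;
		/* fallthrough */
	case SEGMENT:
		st->sg_entry++;
	}
}
```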
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20231009093304.2555344-2-nrb@linux.ibm.com
Message-Id: <20231009093304.2555344-2-nrb@linux.ibm.com>
Stable-dep-of: fe752331d4b3 ("KVM: s390: vsie: fix race during shadow creation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/s390/include/asm/kvm_host.h | 7 +++++++
arch/s390/kvm/gaccess.c | 7 +++++++
arch/s390/kvm/kvm-s390.c | 9 ++++++++-
arch/s390/kvm/vsie.c | 5 ++++-
4 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 427f9528a7b69..67a298b6cf6e9 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -777,6 +777,13 @@ struct kvm_vm_stat {
u64 inject_service_signal;
u64 inject_virtio;
u64 aen_forward;
+ u64 gmap_shadow_create;
+ u64 gmap_shadow_reuse;
+ u64 gmap_shadow_r1_entry;
+ u64 gmap_shadow_r2_entry;
+ u64 gmap_shadow_r3_entry;
+ u64 gmap_shadow_sg_entry;
+ u64 gmap_shadow_pg_entry;
};
struct kvm_arch_memory_slot {
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 6d6bc19b37dcb..ff8349d17b331 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -1382,6 +1382,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
unsigned long *pgt, int *dat_protection,
int *fake)
{
+ struct kvm *kvm;
struct gmap *parent;
union asce asce;
union vaddress vaddr;
@@ -1390,6 +1391,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
*fake = 0;
*dat_protection = 0;
+ kvm = sg->private;
parent = sg->parent;
vaddr.addr = saddr;
asce.val = sg->orig_asce;
@@ -1450,6 +1452,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_r2t(sg, saddr, rfte.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_r1_entry++;
}
fallthrough;
case ASCE_TYPE_REGION2: {
@@ -1478,6 +1481,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_r3t(sg, saddr, rste.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_r2_entry++;
}
fallthrough;
case ASCE_TYPE_REGION3: {
@@ -1515,6 +1519,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_sgt(sg, saddr, rtte.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_r3_entry++;
}
fallthrough;
case ASCE_TYPE_SEGMENT: {
@@ -1548,6 +1553,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rc = gmap_shadow_pgt(sg, saddr, ste.val, *fake);
if (rc)
return rc;
+ kvm->stat.gmap_shadow_sg_entry++;
}
}
/* Return the parent address of the page table */
@@ -1618,6 +1624,7 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
pte.p |= dat_protection;
if (!rc)
rc = gmap_shadow_page(sg, saddr, __pte(pte.val));
+ vcpu->kvm->stat.gmap_shadow_pg_entry++;
ipte_unlock(vcpu->kvm);
mmap_read_unlock(sg->mm);
return rc;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 49cce436444e0..1af55343a606b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -66,7 +66,14 @@ const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
STATS_DESC_COUNTER(VM, inject_pfault_done),
STATS_DESC_COUNTER(VM, inject_service_signal),
STATS_DESC_COUNTER(VM, inject_virtio),
- STATS_DESC_COUNTER(VM, aen_forward)
+ STATS_DESC_COUNTER(VM, aen_forward),
+ STATS_DESC_COUNTER(VM, gmap_shadow_reuse),
+ STATS_DESC_COUNTER(VM, gmap_shadow_create),
+ STATS_DESC_COUNTER(VM, gmap_shadow_r1_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_r2_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_r3_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_sg_entry),
+ STATS_DESC_COUNTER(VM, gmap_shadow_pg_entry),
};
const struct kvm_stats_header kvm_vm_stats_header = {
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index e55f489e1fb79..8207a892bbe22 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1210,8 +1210,10 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
* we're holding has been unshadowed. If the gmap is still valid,
* we can safely reuse it.
*/
- if (vsie_page->gmap && gmap_shadow_valid(vsie_page->gmap, asce, edat))
+ if (vsie_page->gmap && gmap_shadow_valid(vsie_page->gmap, asce, edat)) {
+ vcpu->kvm->stat.gmap_shadow_reuse++;
return 0;
+ }
/* release the old shadow - if any, and mark the prefix as unmapped */
release_gmap_shadow(vsie_page);
@@ -1219,6 +1221,7 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
if (IS_ERR(gmap))
return PTR_ERR(gmap);
gmap->private = vcpu->kvm;
+ vcpu->kvm->stat.gmap_shadow_create++;
WRITE_ONCE(vsie_page->gmap, gmap);
return 0;
}
--
2.43.0
* [PATCH 6.6 52/60] KVM: s390: vsie: fix race during shadow creation
From: Sasha Levin @ 2024-03-13 16:36 UTC
To: linux-kernel, stable
Cc: Christian Borntraeger, Marc Hartmayer, David Hildenbrand,
Janosch Frank, Claudio Imbrenda, Sasha Levin
From: Christian Borntraeger <borntraeger@linux.ibm.com>
[ Upstream commit fe752331d4b361d43cfd0b89534b4b2176057c32 ]
Right now it is possible for kvm_s390_vsie_gmap_notifier() to see
gmap->private being NULL, resulting in a crash. This is because
gmap->private = kvm is only set after gmap_shadow() has returned:
    static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
                                   struct vsie_page *vsie_page)
    {
    [...]
            gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
            if (IS_ERR(gmap))
                    return PTR_ERR(gmap);
            gmap->private = vcpu->kvm;
Let children inherit the private field of the parent.
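A single-threaded model of the window being closed, with hypothetical names throughout. The notifier running on another CPU is simulated by calling it at the point where the new shadow gmap becomes reachable from its parent. This sketch assumes that ordering and does not reproduce the actual locking in arch/s390/mm/gmap.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified gmap: only what the race needs. */
struct fake_gmap {
	void *private;            /* owning struct kvm in the real code */
	struct fake_gmap *parent;
};

/* Counts notifier runs that found no owner (a NULL dereference in the kernel). */
static int nr_null_private;

/* Stand-in for kvm_s390_vsie_gmap_notifier(), which uses ->private. */
static void fake_notifier(struct fake_gmap *sg)
{
	if (sg->private == NULL)
		nr_null_private++;
}

/*
 * Model of gmap_shadow(). With inherit != 0 the child copies the parent's
 * private field before it becomes visible, as the fix does; otherwise the
 * caller is expected to fill it in afterwards, as the old code did.
 */
static void fake_gmap_shadow(struct fake_gmap *parent, struct fake_gmap *child,
			     int inherit)
{
	assert(parent != NULL && child != NULL);
	child->parent = parent;
	child->private = inherit ? parent->private : NULL;
	/* The child is now reachable from the parent: the notifier may run. */
	fake_notifier(child);
}
```

With `inherit == 0` the notifier observes NULL before the caller gets a chance to set the field; with `inherit == 1` it never does.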
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization")
Cc: <stable@vger.kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20231220125317.4258-1-borntraeger@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/s390/kvm/vsie.c | 1 -
arch/s390/mm/gmap.c | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index 8207a892bbe22..db9a180de65f1 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -1220,7 +1220,6 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
if (IS_ERR(gmap))
return PTR_ERR(gmap);
- gmap->private = vcpu->kvm;
vcpu->kvm->stat.gmap_shadow_create++;
WRITE_ONCE(vsie_page->gmap, gmap);
return 0;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 20786f6883b29..157e0a8d5157d 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -1691,6 +1691,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
return ERR_PTR(-ENOMEM);
new->mm = parent->mm;
new->parent = gmap_get(parent);
+ new->private = parent->private;
new->orig_asce = asce;
new->edat_level = edat_level;
new->initialized = false;
--
2.43.0
* [PATCH 6.6 53/60] readahead: avoid multiple marked readahead pages
From: Sasha Levin @ 2024-03-13 16:37 UTC
To: linux-kernel, stable
Cc: Jan Kara, Matthew Wilcox, Guo Xuenan, Andrew Morton, Sasha Levin
From: Jan Kara <jack@suse.cz>
[ Upstream commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ]
ra_alloc_folio() marks the page that should trigger the next round of
async readahead. However, it rounds the computed index up to the order of
the page being allocated, which can lead to multiple consecutive pages
being marked with the readahead flag. Consider the situation with
index == 1, mark == 1, order == 0. We insert an order-0 page at index 1
and mark it. Then we bump order to 1 and index to 2; mark (still == 1) is
rounded up to 2, so the page at index 2 is marked as well. Then we bump
order to 2, index is incremented to 4, and mark gets rounded up to 4, so
the page at index 4 is marked as well. Having multiple pages marked within
a single readahead window confuses the readahead logic and results in the
readahead window being trimmed back to 1. This is triggered in particular
when the maximum readahead window size is not a power of two (in the
observed case it was 768 KB), and as a result sequential read throughput
suffers.
Fix the problem by rounding 'mark' down instead of up. Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
iff 'mark' is within the page we are allocating at 'index' and thus
exactly one page is marked with readahead flag as required by the
readahead code and sequential read performance is restored.
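The arithmetic in the example above can be checked with a small sketch. `ROUND_UP`/`ROUND_DOWN` reproduce the kernel's `round_up()`/`round_down()` for power-of-two sizes; `count_marked()` is hypothetical and simply replays the folio sequence from the example (index 1, 2, 4, ... with the order growing by one per folio), a simplification of the real allocation loop.

```c
#include <assert.h>

/* Same results as the kernel's round_up()/round_down() for power-of-two y. */
#define ROUND_UP(x, y)   ((((x) - 1) | ((y) - 1)) + 1)
#define ROUND_DOWN(x, y) ((x) & ~((y) - 1))

/*
 * Allocate nr_folios folios starting at index with order 0, 1, 2, ... and
 * count how many of them would get the readahead flag for the given mark.
 */
static int count_marked(unsigned long index, unsigned long mark,
			unsigned int nr_folios, int use_round_down)
{
	int marked = 0;

	assert(nr_folios < 8 * sizeof(unsigned long)); /* keep the shift defined */
	for (unsigned int order = 0; order < nr_folios; order++) {
		unsigned long size = 1UL << order;
		unsigned long m = use_round_down ? ROUND_DOWN(mark, size)
						 : ROUND_UP(mark, size);

		if (index == m)
			marked++;
		index += size; /* next folio starts right after this one */
	}
	return marked;
}
```

For `count_marked(1, 1, 3, 0)` all three folios (indexes 1, 2 and 4) are marked; with rounding down only the folio at index 1 is, because `ROUND_DOWN(1, 2)` and `ROUND_DOWN(1, 4)` are both 0.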
This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
readahead with large folios"). The commit changed the rounding with the
rationale:
"... we were setting the readahead flag on the folio which contains the
last byte read from the block. This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."
Although this is true, the fact is this was always the case with read
sizes not aligned to folio boundaries and large folios in the page cache
just make the situation more obvious (and frequent). Also for sequential
read workloads it is better to trigger the readahead earlier rather than
later. It is true that the difference in the rounding and thus earlier
triggering of the readahead can result in reading more for semi-random
workloads. However workloads really suffering from this seem to be rare.
In particular I have verified that the workload described in commit
b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading
random 100k blocks from a file like:
[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s
is not impacted by the rounding change and achieves ~70MB/s in both cases.
[jack@suse.cz: fix one more place where mark rounding was done as well]
Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz
Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/readahead.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index 6925e6959fd3f..1d1a84deb5bc5 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -469,7 +469,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
if (!folio)
return -ENOMEM;
- mark = round_up(mark, 1UL << order);
+ mark = round_down(mark, 1UL << order);
if (index == mark)
folio_set_readahead(folio);
err = filemap_add_folio(ractl->mapping, folio, index, gfp);
@@ -577,7 +577,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
* It's the expected callback index, assume sequential access.
* Ramp up sizes, and push forward the readahead window.
*/
- expected = round_up(ra->start + ra->size - ra->async_size,
+ expected = round_down(ra->start + ra->size - ra->async_size,
1UL << order);
if (index == expected || index == (ra->start + ra->size)) {
ra->start += ra->size;
--
2.43.0
* [PATCH 6.6 54/60] selftests: mptcp: decrease BW in simult flows
From: Sasha Levin @ 2024-03-13 16:37 UTC
To: linux-kernel, stable
Cc: Matthieu Baerts (NGI0), Paolo Abeni, Jakub Kicinski, Sasha Levin
From: "Matthieu Baerts (NGI0)" <matttbe@kernel.org>
[ Upstream commit 5e2f3c65af47e527ccac54060cf909e3306652ff ]
When running the simult_flows selftest in slow environments (e.g. QEMU
without KVM support), the results can be unstable. This selftest checks
whether the aggregated bandwidth is (almost) fully used as expected.
To help improve stability while keeping the same validation in place,
the bandwidth and the delay are reduced to lower the pressure on the
CPU.
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1-6-4c1c11e571ff@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 9096bf5794888..25693b37f820d 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -302,12 +302,12 @@ done
setup
run_test 10 10 0 0 "balanced bwidth"
-run_test 10 10 1 50 "balanced bwidth with unbalanced delay"
+run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
# we still need some additional infrastructure to pass the following test-cases
-run_test 30 10 0 0 "unbalanced bwidth"
-run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay"
-run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 0 0 "unbalanced bwidth"
+run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
mptcp_lib_result_print_all_tap
exit $ret
--
2.43.0
* [PATCH 6.6 55/60] exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
From: Sasha Levin @ 2024-03-13 16:37 UTC
To: linux-kernel, stable
Cc: Oleg Nesterov, Dylan Hatch, Eric W . Biederman, Andrew Morton,
Sasha Levin
From: Oleg Nesterov <oleg@redhat.com>
[ Upstream commit c1be35a16b2f1fe21f4f26f9de030ad6eaaf6a25 ]
After the recent changes nobody uses siglock to read the values protected
by stats_lock, so we can kill spin_lock_irq(&current->sighand->siglock)
and update the comment.
With this patch only __exit_signal() and thread_group_start_cputime() take
stats_lock under siglock.
Link: https://lkml.kernel.org/r/20240123153359.GA21866@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
kernel/exit.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 21a59a6e1f2e8..1867d420c36c4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1148,17 +1148,14 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
* and nobody can change them.
*
* psig->stats_lock also protects us from our sub-threads
- * which can reap other children at the same time. Until
- * we change k_getrusage()-like users to rely on this lock
- * we have to take ->siglock as well.
+ * which can reap other children at the same time.
*
* We use thread_group_cputime_adjusted() to get times for
* the thread group, which consolidates times for all threads
* in the group including the group leader.
*/
thread_group_cputime_adjusted(p, &tgutime, &tgstime);
- spin_lock_irq(&current->sighand->siglock);
- write_seqlock(&psig->stats_lock);
+ write_seqlock_irq(&psig->stats_lock);
psig->cutime += tgutime + sig->cutime;
psig->cstime += tgstime + sig->cstime;
psig->cgtime += task_gtime(p) + sig->gtime + sig->cgtime;
@@ -1181,8 +1178,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
psig->cmaxrss = maxrss;
task_io_accounting_add(&psig->ioac, &p->ioac);
task_io_accounting_add(&psig->ioac, &sig->ioac);
- write_sequnlock(&psig->stats_lock);
- spin_unlock_irq(&current->sighand->siglock);
+ write_sequnlock_irq(&psig->stats_lock);
}
if (wo->wo_rusage)
--
2.43.0
* [PATCH 6.6 56/60] x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
From: Sasha Levin @ 2024-03-13 16:37 UTC
To: linux-kernel, stable; +Cc: Pawan Gupta, Dave Hansen, Greg Kroah-Hartman
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit e95df4ec0c0c9791941f112db699fae794b9862a upstream.
Currently, the MMIO Stale Data mitigation for CPUs not affected by MDS/TAA
only deploys VERW at VMentry, by enabling the mmio_stale_data_clear static
branch. No mitigation is needed for kernel->user transitions. If such
CPUs are also affected by RFDS, the RFDS mitigation may set
X86_FEATURE_CLEAR_CPU_BUF to deploy VERW at kernel->user transitions and
at VMentry. This could result in a duplicate VERW at VMentry.
Fix this by disabling the mmio_stale_data_clear static branch when
X86_FEATURE_CLEAR_CPU_BUF is enabled.
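The intended end state can be expressed as a small decision function. This is a hypothetical model of the combined effect of mmio_select_mitigation() and md_clear_update_mitigation() after this patch; it ignores command-line overrides and the other mitigations, and the struct and field names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs: bugs the boot CPU has. */
struct cpu_bugs {
	bool mds;             /* X86_BUG_MDS */
	bool taa_with_rtm;    /* X86_BUG_TAA together with X86_FEATURE_RTM */
	bool mmio_stale_data; /* X86_BUG_MMIO_STALE_DATA */
	bool rfds_with_ucode; /* RFDS-affected and RFDS_CLEAR microcode present */
};

/* Hypothetical outputs: where VERW ends up being executed. */
struct verw_sites {
	bool clear_cpu_buf;         /* X86_FEATURE_CLEAR_CPU_BUF: exit to user and VMentry */
	bool mmio_stale_data_clear; /* KVM-only static branch: VMentry only */
};

static struct verw_sites select_verw(struct cpu_bugs b)
{
	struct verw_sites s = { false, false };

	/* The RFDS mitigation forces CLEAR_CPU_BUF when the microcode supports it. */
	if (b.rfds_with_ucode)
		s.clear_cpu_buf = true;

	if (b.mmio_stale_data) {
		/* MDS/TAA-affected parts also need VERW on exit to user space. */
		if (b.mds || b.taa_with_rtm)
			s.clear_cpu_buf = true;
		/* The KVM-only branch is redundant once CLEAR_CPU_BUF is set. */
		s.mmio_stale_data_clear = !s.clear_cpu_buf;
	}

	/* Never both: that would execute VERW twice at VMentry. */
	assert(!(s.clear_cpu_buf && s.mmio_stale_data_clear));
	return s;
}
```

An MMIO-only CPU keeps the KVM-only branch; the same CPU with RFDS microcode switches to CLEAR_CPU_BUF and drops the branch.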
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/kernel/cpu/bugs.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 17eb4d76e3a53..19256accc0784 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -422,6 +422,13 @@ static void __init mmio_select_mitigation(void)
if (boot_cpu_has_bug(X86_BUG_MDS) || (boot_cpu_has_bug(X86_BUG_TAA) &&
boot_cpu_has(X86_FEATURE_RTM)))
setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
+
+ /*
+ * X86_FEATURE_CLEAR_CPU_BUF could be enabled by other VERW based
+ * mitigations, disable KVM-only mitigation in that case.
+ */
+ if (boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
+ static_branch_disable(&mmio_stale_data_clear);
else
static_branch_enable(&mmio_stale_data_clear);
@@ -498,8 +505,11 @@ static void __init md_clear_update_mitigation(void)
taa_mitigation = TAA_MITIGATION_VERW;
taa_select_mitigation();
}
- if (mmio_mitigation == MMIO_MITIGATION_OFF &&
- boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) {
+ /*
+ * MMIO_MITIGATION_OFF is not checked here so that mmio_stale_data_clear
+ * gets updated correctly as per X86_FEATURE_CLEAR_CPU_BUF state.
+ */
+ if (boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) {
mmio_mitigation = MMIO_MITIGATION_VERW;
mmio_select_mitigation();
}
--
2.43.0
* [PATCH 6.6 57/60] Documentation/hw-vuln: Add documentation for RFDS
From: Sasha Levin @ 2024-03-13 16:37 UTC
To: linux-kernel, stable
Cc: Pawan Gupta, Dave Hansen, Thomas Gleixner, Josh Poimboeuf,
Greg Kroah-Hartman
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 4e42765d1be01111df0c0275bbaf1db1acef346e upstream.
Add the documentation for transient execution vulnerability Register
File Data Sampling (RFDS) that affects Intel Atom CPUs.
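How the pieces of the new document fit together (the IA32_ARCH_CAPABILITIES bits, the affected-model table, the command-line switch and the sysfs strings) can be sketched as one function. The bit positions and strings come from the document; the function, the enum and the `in_model_list` flag are hypothetical, and the logic is a simplification rather than the kernel's implementation (for example, it ignores the Catlow exception).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES bits described in the document. */
#define ARCH_CAP_RFDS_NO    (1ULL << 27)
#define ARCH_CAP_RFDS_CLEAR (1ULL << 28)

enum rfds_state { RFDS_NOT_AFFECTED, RFDS_VULNERABLE, RFDS_NO_UCODE, RFDS_MITIGATED };

/* Values listed for /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling. */
static const char *const rfds_strings[] = {
	[RFDS_NOT_AFFECTED] = "Not affected",
	[RFDS_VULNERABLE]   = "Vulnerable",
	[RFDS_NO_UCODE]     = "Vulnerable: No microcode",
	[RFDS_MITIGATED]    = "Mitigation: Clear Register File",
};

/*
 * in_model_list: hypothetical flag, true if the Family_Model is in the table.
 * mitigation_on: false when booted with reg_file_data_sampling=off.
 */
static enum rfds_state rfds_get_state(bool in_model_list, uint64_t arch_cap,
				      bool mitigation_on)
{
	bool has_clear = arch_cap & ARCH_CAP_RFDS_CLEAR;

	/* RFDS_NO wins; RFDS_CLEAR alone means the part is affected. */
	if ((arch_cap & ARCH_CAP_RFDS_NO) || (!has_clear && !in_model_list))
		return RFDS_NOT_AFFECTED;
	if (!mitigation_on)
		return RFDS_VULNERABLE;
	/* VERW only clears the affected buffers with the updated microcode. */
	return has_clear ? RFDS_MITIGATED : RFDS_NO_UCODE;
}

static const char *rfds_string(enum rfds_state s)
{
	assert((unsigned int)s < sizeof(rfds_strings) / sizeof(rfds_strings[0]));
	return rfds_strings[s];
}
```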
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++
2 files changed, 105 insertions(+)
create mode 100644 Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index de99caabf65a3..ff0b440ef2dc9 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -21,3 +21,4 @@ are configurable at compile, boot or run time.
cross-thread-rsb
srso
gather_data_sampling
+ reg-file-data-sampling
diff --git a/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst b/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
new file mode 100644
index 0000000000000..0585d02b9a6cb
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
@@ -0,0 +1,104 @@
+==================================
+Register File Data Sampling (RFDS)
+==================================
+
+Register File Data Sampling (RFDS) is a microarchitectural vulnerability that
+only affects Intel Atom parts (also branded as E-cores). RFDS may allow
+a malicious actor to infer data values previously used in floating point
+registers, vector registers, or integer registers. RFDS does not provide the
+ability to choose which data is inferred. CVE-2023-28746 is assigned to RFDS.
+
+Affected Processors
+===================
+Below is the list of affected Intel processors [#f1]_:
+
+ =================== ============
+ Common name Family_Model
+ =================== ============
+ ATOM_GOLDMONT 06_5CH
+ ATOM_GOLDMONT_D 06_5FH
+ ATOM_GOLDMONT_PLUS 06_7AH
+ ATOM_TREMONT_D 06_86H
+ ATOM_TREMONT 06_96H
+ ALDERLAKE 06_97H
+ ALDERLAKE_L 06_9AH
+ ATOM_TREMONT_L 06_9CH
+ RAPTORLAKE 06_B7H
+ RAPTORLAKE_P 06_BAH
+ ATOM_GRACEMONT 06_BEH
+ RAPTORLAKE_S 06_BFH
+ =================== ============
+
+As an exception to this table, Intel Xeon E family parts ALDERLAKE(06_97H) and
+RAPTORLAKE(06_B7H) codenamed Catlow are not affected. They are reported as
+vulnerable in Linux because they share the same family/model with an affected
+part. Unlike their affected counterparts, they do not enumerate RFDS_CLEAR or
+CPUID.HYBRID. This information could be used to distinguish between the
+affected and unaffected parts, but it is deemed not worth adding complexity as
+the reporting is fixed automatically when these parts enumerate RFDS_NO.
+
+Mitigation
+==========
+Intel released a microcode update that enables software to clear sensitive
+information using the VERW instruction. Like MDS, RFDS deploys the same
+mitigation strategy to force the CPU to clear the affected buffers before an
+attacker can extract the secrets. This is achieved by using the otherwise
+unused and obsolete VERW instruction in combination with a microcode update.
+The microcode clears the affected CPU buffers when the VERW instruction is
+executed.
+
+Mitigation points
+-----------------
+VERW is executed by the kernel before returning to user space, and by KVM
+before VMentry. None of the affected cores support SMT, so VERW is not required
+at C-state transitions.
+
+New bits in IA32_ARCH_CAPABILITIES
+----------------------------------
+Newer processors and microcode update on existing affected processors added new
+bits to IA32_ARCH_CAPABILITIES MSR. These bits can be used to enumerate
+vulnerability and mitigation capability:
+
+- Bit 27 - RFDS_NO - When set, processor is not affected by RFDS.
+- Bit 28 - RFDS_CLEAR - When set, processor is affected by RFDS, and has the
+ microcode that clears the affected buffers on VERW execution.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+The RFDS mitigation can be controlled at boot time with the kernel command line
+parameter "reg_file_data_sampling=". The valid arguments are:
+
+ ========== =================================================================
+ on If the CPU is vulnerable, enable mitigation; CPU buffer clearing
+ on exit to userspace and before entering a VM.
+ off Disables mitigation.
+ ========== =================================================================
+
+Mitigation default is selected by CONFIG_MITIGATION_RFDS.
+
+Mitigation status information
+-----------------------------
+The Linux kernel provides a sysfs interface to enumerate the current
+vulnerability status of the system: whether the system is vulnerable, and
+which mitigations are active. The relevant sysfs file is:
+
+ /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
+
+The possible values in this file are:
+
+ .. list-table::
+
+ * - 'Not affected'
+ - The processor is not vulnerable
+ * - 'Vulnerable'
+ - The processor is vulnerable, but no mitigation is enabled
+ * - 'Vulnerable: No microcode'
+ - The processor is vulnerable, but the microcode is not updated.
+ * - 'Mitigation: Clear Register File'
+ - The processor is vulnerable and the CPU buffer clearing mitigation is
+ enabled.
+
+References
+----------
+.. [#f1] Affected Processors
+ https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
--
2.43.0
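As a hedged illustration of consuming the sysfs interface documented above, the sketch below reads /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling and maps the documented strings to an enum. The enum, the helper names and the buffer size are hypothetical and not part of any kernel or libc API; a missing or unreadable file (for example on a kernel without this series) is reported as unknown.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define RFDS_SYSFS_PATH "/sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling"

/* Hypothetical classification of the strings listed in the documentation. */
enum rfds_state {
	RFDS_STATE_UNKNOWN,
	RFDS_STATE_NOT_AFFECTED,
	RFDS_STATE_VULNERABLE,
	RFDS_STATE_NO_MICROCODE,
	RFDS_STATE_MITIGATED,
};

static bool starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

static enum rfds_state rfds_classify(const char *s)
{
	if (starts_with(s, "Not affected"))
		return RFDS_STATE_NOT_AFFECTED;
	/* Check the longer "Vulnerable: ..." string before plain "Vulnerable". */
	if (starts_with(s, "Vulnerable: No microcode"))
		return RFDS_STATE_NO_MICROCODE;
	if (starts_with(s, "Vulnerable"))
		return RFDS_STATE_VULNERABLE;
	if (starts_with(s, "Mitigation: Clear Register File"))
		return RFDS_STATE_MITIGATED;
	return RFDS_STATE_UNKNOWN;
}

/* Read the first line of the sysfs file; missing or unreadable means unknown. */
static enum rfds_state rfds_read_state(const char *path)
{
	char buf[128];
	FILE *f = fopen(path, "r");

	if (!f)
		return RFDS_STATE_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return RFDS_STATE_UNKNOWN;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* drop the trailing newline */
	return rfds_classify(buf);
}
```

A caller would typically pass RFDS_SYSFS_PATH to rfds_read_state() and treat RFDS_STATE_VULNERABLE and RFDS_STATE_NO_MICROCODE as the states that need attention.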
^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH 6.6 58/60] x86/rfds: Mitigate Register File Data Sampling (RFDS)
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (56 preceding siblings ...)
2024-03-13 16:37 ` [PATCH 6.6 57/60] Documentation/hw-vuln: Add documentation for RFDS Sasha Levin
@ 2024-03-13 16:37 ` Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 59/60] KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests Sasha Levin
` (8 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Pawan Gupta, Dave Hansen, Thomas Gleixner, Josh Poimboeuf,
Greg Kroah-Hartman
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 8076fcde016c9c0e0660543e67bff86cb48a7c9c upstream.
RFDS is a CPU vulnerability that may allow userspace to infer stale kernel
data previously used in floating point registers, vector registers and
integer registers. RFDS only affects certain Intel Atom processors.
Intel released a microcode update that uses the VERW instruction to clear
the affected CPU buffers. Unlike MDS, none of the affected cores support
SMT.
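The detection order used by vulnerable_to_rfds() in the common.c hunk below can be modelled in userspace as follows. This is a sketch, not kernel code: the caller supplies the raw IA32_ARCH_CAPABILITIES value, and the hypothetical in_blacklist flag stands in for cpu_matches(cpu_vuln_blacklist, RFDS), because CPU model matching is not reproduced here. The bit positions are the ones defined in the msr-index.h hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in IA32_ARCH_CAPABILITIES, as in msr-index.h. */
#define ARCH_CAP_RFDS_NO    (UINT64_C(1) << 27)
#define ARCH_CAP_RFDS_CLEAR (UINT64_C(1) << 28)

/*
 * Userspace model of vulnerable_to_rfds().
 * in_blacklist is a hypothetical stand-in for the CPU model table lookup.
 */
static bool rfds_vulnerable(uint64_t ia32_cap, bool in_blacklist)
{
	/* The "immunity" bit wins over everything else. */
	if (ia32_cap & ARCH_CAP_RFDS_NO)
		return false;

	/*
	 * A VMM may set RFDS_CLEAR even for a model outside the table,
	 * because the guest could run on, or migrate to, affected hardware.
	 */
	if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
		return true;

	/* No enumeration at all: fall back to the CPU model table. */
	return in_blacklist;
}
```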
Add RFDS bug infrastructure and enable the VERW-based mitigation by
default; it clears the affected buffers just before exiting to
userspace. Also add sysfs reporting and the cmdline parameter
"reg_file_data_sampling" to control the mitigation.
For details see:
Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
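How the boot-time inputs combine into the value reported in sysfs can be approximated with the userspace model below. It is a simplification under stated assumptions: struct rfds_inputs, rfds_select() and rfds_status() are hypothetical, and other_verw_enabled approximates X86_FEATURE_CLEAR_CPU_BUF having been forced on by another VERW-based mitigation (MDS, TAA or MMIO Stale Data). The enum and the strings are copied from the bugs.c hunk below.

```c
#include <stdbool.h>
#include <string.h>

enum rfds_mitigations {
	RFDS_MITIGATION_OFF,
	RFDS_MITIGATION_VERW,
	RFDS_MITIGATION_UCODE_NEEDED,
};

/* Same strings as rfds_strings[] in arch/x86/kernel/cpu/bugs.c. */
static const char *const rfds_strings[] = {
	[RFDS_MITIGATION_OFF]          = "Vulnerable",
	[RFDS_MITIGATION_VERW]         = "Mitigation: Clear Register File",
	[RFDS_MITIGATION_UCODE_NEEDED] = "Vulnerable: No microcode",
};

/* Hypothetical bundle of the inputs the kernel consults at boot. */
struct rfds_inputs {
	bool config_default_on;		/* CONFIG_MITIGATION_RFDS=y */
	const char *cmdline;		/* value of reg_file_data_sampling=, or NULL */
	bool cpu_has_bug;		/* X86_BUG_RFDS is set */
	bool mitigations_off;		/* mitigations=off */
	bool ucode_has_clear;		/* ARCH_CAP_RFDS_CLEAR is enumerated */
	bool other_verw_enabled;	/* VERW already forced on by another bug */
};

static enum rfds_mitigations rfds_select(const struct rfds_inputs *in)
{
	enum rfds_mitigations m = in->config_default_on ?
				  RFDS_MITIGATION_VERW : RFDS_MITIGATION_OFF;

	/* rfds_parse_cmdline(): ignored on unaffected CPUs; unknown values ignored. */
	if (in->cmdline && in->cpu_has_bug) {
		if (!strcmp(in->cmdline, "off"))
			m = RFDS_MITIGATION_OFF;
		else if (!strcmp(in->cmdline, "on"))
			m = RFDS_MITIGATION_VERW;
	}

	/* rfds_select_mitigation(): unaffected CPU or mitigations=off. */
	if (!in->cpu_has_bug || in->mitigations_off)
		return RFDS_MITIGATION_OFF;

	/* md_clear_update_mitigation(): VERW is executed anyway, so re-enable RFDS. */
	if (m == RFDS_MITIGATION_OFF && in->other_verw_enabled)
		m = RFDS_MITIGATION_VERW;

	if (m == RFDS_MITIGATION_OFF)
		return m;

	/* VERW only clears the register file with updated microcode. */
	return in->ucode_has_clear ? RFDS_MITIGATION_VERW
				   : RFDS_MITIGATION_UCODE_NEEDED;
}

/*
 * Only meaningful when cpu_has_bug is true: on unaffected CPUs the real
 * sysfs file reports "Not affected" through the generic show path.
 */
static const char *rfds_status(const struct rfds_inputs *in)
{
	return rfds_strings[rfds_select(in)];
}
```

This mirrors the kernel-parameters text: with another VERW user active, reg_file_data_sampling=off alone does not turn the RFDS mitigation off.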
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
.../ABI/testing/sysfs-devices-system-cpu | 1 +
.../admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 78 ++++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 ++++++++-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
9 files changed, 157 insertions(+), 6 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 7ecd5c8161a61..34b6f6ab47422 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -519,6 +519,7 @@ What: /sys/devices/system/cpu/vulnerabilities
/sys/devices/system/cpu/vulnerabilities/mds
/sys/devices/system/cpu/vulnerabilities/meltdown
/sys/devices/system/cpu/vulnerabilities/mmio_stale_data
+ /sys/devices/system/cpu/vulnerabilities/reg_file_data_sampling
/sys/devices/system/cpu/vulnerabilities/retbleed
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
/sys/devices/system/cpu/vulnerabilities/spectre_v1
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 41644336e3587..c28a095333670 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1133,6 +1133,26 @@
The filter can be disabled or changed to another
driver later using sysfs.
+ reg_file_data_sampling=
+ [X86] Controls mitigation for Register File Data
+ Sampling (RFDS) vulnerability. RFDS is a CPU
+ vulnerability which may allow userspace to infer
+ kernel data values previously stored in floating point
+ registers, vector registers, or integer registers.
+ RFDS only affects Intel Atom processors.
+
+ on: Turns ON the mitigation.
+ off: Turns OFF the mitigation.
+
+ This parameter overrides the compile time default set
+ by CONFIG_MITIGATION_RFDS. Mitigation cannot be
+ disabled when other VERW based mitigations (like MDS)
+ are enabled. In order to disable RFDS mitigation all
+ VERW based mitigations need to be disabled.
+
+ For details see:
+ Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
+
driver_async_probe= [KNL]
List of driver names to be probed asynchronously. *
matches with all driver names. If * is specified, the
@@ -3322,6 +3342,7 @@
nospectre_bhb [ARM64]
nospectre_v1 [X86,PPC]
nospectre_v2 [X86,PPC,S390,ARM64]
+ reg_file_data_sampling=off [X86]
retbleed=off [X86]
spec_store_bypass_disable=off [X86,PPC]
spectre_v2_user=off [X86]
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fe3292e310d48..de1adec887336 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2568,6 +2568,17 @@ config GDS_FORCE_MITIGATION
If in doubt, say N.
+config MITIGATION_RFDS
+ bool "RFDS Mitigation"
+ depends on CPU_SUP_INTEL
+ default y
+ help
+ Enable mitigation for Register File Data Sampling (RFDS) by default.
+ RFDS is a hardware vulnerability which affects Intel Atom CPUs. It
+ allows unprivileged speculative access to stale data previously
+ stored in floating point, vector and integer registers.
+ See also <file:Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst>
+
endif
config ARCH_HAS_ADD_PAGES
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e7b0554be04fa..bd33f6366c80d 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -498,4 +498,5 @@
/* BUG word 2 */
#define X86_BUG_SRSO X86_BUG(1*32 + 0) /* AMD SRSO bug */
#define X86_BUG_DIV0 X86_BUG(1*32 + 1) /* AMD DIV0 speculation bug */
+#define X86_BUG_RFDS X86_BUG(1*32 + 2) /* CPU is vulnerable to Register File Data Sampling */
#endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 389f9594746ef..c75cc5610be30 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -165,6 +165,14 @@
* CPU is not vulnerable to Gather
* Data Sampling (GDS).
*/
+#define ARCH_CAP_RFDS_NO BIT(27) /*
+ * Not susceptible to Register
+ * File Data Sampling.
+ */
+#define ARCH_CAP_RFDS_CLEAR BIT(28) /*
+ * VERW clears CPU Register
+ * File.
+ */
#define ARCH_CAP_XAPIC_DISABLE BIT(21) /*
* IA32_XAPIC_DISABLE_STATUS MSR
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 19256accc0784..3452f7271d074 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -480,6 +480,57 @@ static int __init mmio_stale_data_parse_cmdline(char *str)
}
early_param("mmio_stale_data", mmio_stale_data_parse_cmdline);
+#undef pr_fmt
+#define pr_fmt(fmt) "Register File Data Sampling: " fmt
+
+enum rfds_mitigations {
+ RFDS_MITIGATION_OFF,
+ RFDS_MITIGATION_VERW,
+ RFDS_MITIGATION_UCODE_NEEDED,
+};
+
+/* Default mitigation for Register File Data Sampling */
+static enum rfds_mitigations rfds_mitigation __ro_after_init =
+ IS_ENABLED(CONFIG_MITIGATION_RFDS) ? RFDS_MITIGATION_VERW : RFDS_MITIGATION_OFF;
+
+static const char * const rfds_strings[] = {
+ [RFDS_MITIGATION_OFF] = "Vulnerable",
+ [RFDS_MITIGATION_VERW] = "Mitigation: Clear Register File",
+ [RFDS_MITIGATION_UCODE_NEEDED] = "Vulnerable: No microcode",
+};
+
+static void __init rfds_select_mitigation(void)
+{
+ if (!boot_cpu_has_bug(X86_BUG_RFDS) || cpu_mitigations_off()) {
+ rfds_mitigation = RFDS_MITIGATION_OFF;
+ return;
+ }
+ if (rfds_mitigation == RFDS_MITIGATION_OFF)
+ return;
+
+ if (x86_read_arch_cap_msr() & ARCH_CAP_RFDS_CLEAR)
+ setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
+ else
+ rfds_mitigation = RFDS_MITIGATION_UCODE_NEEDED;
+}
+
+static __init int rfds_parse_cmdline(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ if (!boot_cpu_has_bug(X86_BUG_RFDS))
+ return 0;
+
+ if (!strcmp(str, "off"))
+ rfds_mitigation = RFDS_MITIGATION_OFF;
+ else if (!strcmp(str, "on"))
+ rfds_mitigation = RFDS_MITIGATION_VERW;
+
+ return 0;
+}
+early_param("reg_file_data_sampling", rfds_parse_cmdline);
+
#undef pr_fmt
#define pr_fmt(fmt) "" fmt
@@ -513,6 +564,11 @@ static void __init md_clear_update_mitigation(void)
mmio_mitigation = MMIO_MITIGATION_VERW;
mmio_select_mitigation();
}
+ if (rfds_mitigation == RFDS_MITIGATION_OFF &&
+ boot_cpu_has_bug(X86_BUG_RFDS)) {
+ rfds_mitigation = RFDS_MITIGATION_VERW;
+ rfds_select_mitigation();
+ }
out:
if (boot_cpu_has_bug(X86_BUG_MDS))
pr_info("MDS: %s\n", mds_strings[mds_mitigation]);
@@ -522,6 +578,8 @@ static void __init md_clear_update_mitigation(void)
pr_info("MMIO Stale Data: %s\n", mmio_strings[mmio_mitigation]);
else if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN))
pr_info("MMIO Stale Data: Unknown: No mitigations\n");
+ if (boot_cpu_has_bug(X86_BUG_RFDS))
+ pr_info("Register File Data Sampling: %s\n", rfds_strings[rfds_mitigation]);
}
static void __init md_clear_select_mitigation(void)
@@ -529,11 +587,12 @@ static void __init md_clear_select_mitigation(void)
mds_select_mitigation();
taa_select_mitigation();
mmio_select_mitigation();
+ rfds_select_mitigation();
/*
- * As MDS, TAA and MMIO Stale Data mitigations are inter-related, update
- * and print their mitigation after MDS, TAA and MMIO Stale Data
- * mitigation selection is done.
+ * As these mitigations are inter-related and rely on VERW instruction
+ * to clear the microarchitectural buffers, update and print their status
+ * after mitigation selection is done for each of these vulnerabilities.
*/
md_clear_update_mitigation();
}
@@ -2623,6 +2682,11 @@ static ssize_t mmio_stale_data_show_state(char *buf)
sched_smt_active() ? "vulnerable" : "disabled");
}
+static ssize_t rfds_show_state(char *buf)
+{
+ return sysfs_emit(buf, "%s\n", rfds_strings[rfds_mitigation]);
+}
+
static char *stibp_state(void)
{
if (spectre_v2_in_eibrs_mode(spectre_v2_enabled) &&
@@ -2782,6 +2846,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
case X86_BUG_GDS:
return gds_show_state(buf);
+ case X86_BUG_RFDS:
+ return rfds_show_state(buf);
+
default:
break;
}
@@ -2856,4 +2923,9 @@ ssize_t cpu_show_gds(struct device *dev, struct device_attribute *attr, char *bu
{
return cpu_show_common(dev, attr, buf, X86_BUG_GDS);
}
+
+ssize_t cpu_show_reg_file_data_sampling(struct device *dev, struct device_attribute *attr, char *buf)
+{
+ return cpu_show_common(dev, attr, buf, X86_BUG_RFDS);
+}
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d98d023ae497f..73cfac3fc9c4c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1269,6 +1269,8 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
#define SRSO BIT(5)
/* CPU is affected by GDS */
#define GDS BIT(6)
+/* CPU is affected by Register File Data Sampling */
+#define RFDS BIT(7)
static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_INTEL_STEPPINGS(IVYBRIDGE, X86_STEPPING_ANY, SRBDS),
@@ -1296,9 +1298,18 @@ static const struct x86_cpu_id cpu_vuln_blacklist[] __initconst = {
VULNBL_INTEL_STEPPINGS(TIGERLAKE, X86_STEPPING_ANY, GDS),
VULNBL_INTEL_STEPPINGS(LAKEFIELD, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RETBLEED),
VULNBL_INTEL_STEPPINGS(ROCKETLAKE, X86_STEPPING_ANY, MMIO | RETBLEED | GDS),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPING_ANY, MMIO | MMIO_SBDS),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO),
- VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS),
+ VULNBL_INTEL_STEPPINGS(ALDERLAKE, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ALDERLAKE_L, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(RAPTORLAKE, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(RAPTORLAKE_P, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(RAPTORLAKE_S, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_GRACEMONT, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_D, X86_STEPPING_ANY, MMIO | RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_TREMONT_L, X86_STEPPING_ANY, MMIO | MMIO_SBDS | RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT_D, X86_STEPPING_ANY, RFDS),
+ VULNBL_INTEL_STEPPINGS(ATOM_GOLDMONT_PLUS, X86_STEPPING_ANY, RFDS),
VULNBL_AMD(0x15, RETBLEED),
VULNBL_AMD(0x16, RETBLEED),
@@ -1332,6 +1343,24 @@ static bool arch_cap_mmio_immune(u64 ia32_cap)
ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
}
+static bool __init vulnerable_to_rfds(u64 ia32_cap)
+{
+ /* The "immunity" bit trumps everything else: */
+ if (ia32_cap & ARCH_CAP_RFDS_NO)
+ return false;
+
+ /*
+ * VMMs set ARCH_CAP_RFDS_CLEAR for processors not in the blacklist to
+ * indicate that mitigation is needed because the guest is running on
+ * vulnerable hardware or may migrate to such hardware:
+ */
+ if (ia32_cap & ARCH_CAP_RFDS_CLEAR)
+ return true;
+
+ /* Only consult the blacklist when there is no enumeration: */
+ return cpu_matches(cpu_vuln_blacklist, RFDS);
+}
+
static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
{
u64 ia32_cap = x86_read_arch_cap_msr();
@@ -1443,6 +1472,9 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
boot_cpu_has(X86_FEATURE_AVX))
setup_force_cpu_bug(X86_BUG_GDS);
+ if (vulnerable_to_rfds(ia32_cap))
+ setup_force_cpu_bug(X86_BUG_RFDS);
+
if (cpu_matches(cpu_vuln_whitelist, NO_MELTDOWN))
return;
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 548491de818ef..ef427ee787a99 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -565,6 +565,7 @@ CPU_SHOW_VULN_FALLBACK(mmio_stale_data);
CPU_SHOW_VULN_FALLBACK(retbleed);
CPU_SHOW_VULN_FALLBACK(spec_rstack_overflow);
CPU_SHOW_VULN_FALLBACK(gds);
+CPU_SHOW_VULN_FALLBACK(reg_file_data_sampling);
static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
@@ -579,6 +580,7 @@ static DEVICE_ATTR(mmio_stale_data, 0444, cpu_show_mmio_stale_data, NULL);
static DEVICE_ATTR(retbleed, 0444, cpu_show_retbleed, NULL);
static DEVICE_ATTR(spec_rstack_overflow, 0444, cpu_show_spec_rstack_overflow, NULL);
static DEVICE_ATTR(gather_data_sampling, 0444, cpu_show_gds, NULL);
+static DEVICE_ATTR(reg_file_data_sampling, 0444, cpu_show_reg_file_data_sampling, NULL);
static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_meltdown.attr,
@@ -594,6 +596,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
&dev_attr_retbleed.attr,
&dev_attr_spec_rstack_overflow.attr,
&dev_attr_gather_data_sampling.attr,
+ &dev_attr_reg_file_data_sampling.attr,
NULL
};
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index eb768a866fe31..59dd421a8e35d 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -75,6 +75,8 @@ extern ssize_t cpu_show_spec_rstack_overflow(struct device *dev,
struct device_attribute *attr, char *buf);
extern ssize_t cpu_show_gds(struct device *dev,
struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_reg_file_data_sampling(struct device *dev,
+ struct device_attribute *attr, char *buf);
extern __printf(4, 5)
struct device *cpu_device_create(struct device *parent, void *drvdata,
--
2.43.0
* [PATCH 6.6 59/60] KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (57 preceding siblings ...)
2024-03-13 16:37 ` [PATCH 6.6 58/60] x86/rfds: Mitigate Register File Data Sampling (RFDS) Sasha Levin
@ 2024-03-13 16:37 ` Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 60/60] Linux 6.6.22-rc1 Sasha Levin
` (7 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:37 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Pawan Gupta, Dave Hansen, Thomas Gleixner, Josh Poimboeuf,
Greg Kroah-Hartman
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
commit 2a0180129d726a4b953232175857d442651b55a0 upstream.
The RFDS mitigation requires the RFDS_CLEAR capability, which is enumerated
by MSR_IA32_ARCH_CAPABILITIES bit 28. If the host has it set, export it
to guests so that they can deploy the mitigation.
RFDS_NO indicates that the system is not vulnerable to RFDS; export it
to guests so that they don't deploy the mitigation unnecessarily. When
the host is not affected by X86_BUG_RFDS but has RFDS_NO=0, synthesize
RFDS_NO for the guest.
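The RFDS-related part of kvm_get_arch_capabilities() after this change can be sketched as a pure function. Assumptions: only the two RFDS bits of KVM_SUPPORTED_ARCH_CAP are modelled (the real mask contains many more bits, so SKETCH_SUPPORTED_RFDS_BITS is a hypothetical name), and host_has_rfds_bug stands in for boot_cpu_has_bug(X86_BUG_RFDS).

```c
#include <stdbool.h>
#include <stdint.h>

#define ARCH_CAP_RFDS_NO    (UINT64_C(1) << 27)
#define ARCH_CAP_RFDS_CLEAR (UINT64_C(1) << 28)

/* Hypothetical: the RFDS subset of KVM_SUPPORTED_ARCH_CAP. */
#define SKETCH_SUPPORTED_RFDS_BITS (ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR)

static uint64_t guest_rfds_caps(uint64_t host_cap, bool host_has_rfds_bug)
{
	/* Pass through only the bits KVM knows how to export. */
	uint64_t data = host_cap & SKETCH_SUPPORTED_RFDS_BITS;

	/*
	 * Host is not affected even if the MSR lacks RFDS_NO: tell the
	 * guest, so it does not enable VERW clearing unnecessarily.
	 */
	if (!host_has_rfds_bug)
		data |= ARCH_CAP_RFDS_NO;

	return data;
}
```

For example, a host without X86_BUG_RFDS whose MSR reports neither bit still advertises RFDS_NO to its guests, while an affected host with updated microcode passes RFDS_CLEAR through unchanged.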
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/x86/kvm/x86.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d8472d00024c..9b6639d87a62e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1620,7 +1620,8 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
ARCH_CAP_SKIP_VMENTRY_L1DFLUSH | ARCH_CAP_SSB_NO | ARCH_CAP_MDS_NO | \
ARCH_CAP_PSCHANGE_MC_NO | ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO | \
ARCH_CAP_SBDR_SSDP_NO | ARCH_CAP_FBSDP_NO | ARCH_CAP_PSDP_NO | \
- ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO)
+ ARCH_CAP_FB_CLEAR | ARCH_CAP_RRSBA | ARCH_CAP_PBRSB_NO | ARCH_CAP_GDS_NO | \
+ ARCH_CAP_RFDS_NO | ARCH_CAP_RFDS_CLEAR)
static u64 kvm_get_arch_capabilities(void)
{
@@ -1652,6 +1653,8 @@ static u64 kvm_get_arch_capabilities(void)
data |= ARCH_CAP_SSB_NO;
if (!boot_cpu_has_bug(X86_BUG_MDS))
data |= ARCH_CAP_MDS_NO;
+ if (!boot_cpu_has_bug(X86_BUG_RFDS))
+ data |= ARCH_CAP_RFDS_NO;
if (!boot_cpu_has(X86_FEATURE_RTM)) {
/*
--
2.43.0
* [PATCH 6.6 60/60] Linux 6.6.22-rc1
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (58 preceding siblings ...)
2024-03-13 16:37 ` [PATCH 6.6 59/60] KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests Sasha Levin
@ 2024-03-13 16:37 ` Sasha Levin
2024-03-14 8:02 ` [PATCH 6.6 00/60] 6.6.22-rc1 review Bagas Sanjaya
` (6 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 16:37 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Sasha Levin
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index a36819b045a63..48267900c8032 100644
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
VERSION = 6
PATCHLEVEL = 6
-SUBLEVEL = 21
-EXTRAVERSION =
+SUBLEVEL = 22
+EXTRAVERSION = -rc1
NAME = Hurr durr I'ma ninja sloth
# *DOCUMENTATION*
--
2.43.0
* Re: [PATCH 6.6 05/60] mm: migrate: remove PageTransHuge check in numamigrate_isolate_page()
2024-03-13 16:36 ` [PATCH 6.6 05/60] mm: migrate: remove PageTransHuge check in numamigrate_isolate_page() Sasha Levin
@ 2024-03-13 17:29 ` Hugh Dickins
0 siblings, 0 replies; 72+ messages in thread
From: Hugh Dickins @ 2024-03-13 17:29 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-kernel, stable, Kefeng Wang, Matthew Wilcox,
David Hildenbrand, Huang Ying, Hugh Dickins, Mike Kravetz, Zi Yan,
Andrew Morton
On Wed, 13 Mar 2024, Sasha Levin wrote:
> From: Kefeng Wang <wangkefeng.wang@huawei.com>
>
> [ Upstream commit a8ac4a767dcd9d87d8229045904d9fe15ea5e0e8 ]
>
> Patch series "mm: migrate: more folio conversion and unification", v3.
>
> Convert more migrate functions to use a folio, it is also a preparation
> for large folio migration support when balancing numa.
>
> This patch (of 8):
>
> The assert VM_BUG_ON_PAGE(order && !PageTransHuge(page), page) is not very
> useful,
>
> 1) for a tail/base page, order = 0, for a head page, the order > 0 &&
> PageTransHuge() is true
> 2) there is a PageCompound() check and only base pages are handled in
> do_numa_page(), and do_huge_pmd_numa_page() only handles PMD-mapped
> THP
> 3) even if the page is a tail page, isolate_lru_page() will post
> a warning and fail to isolate the page
> 4) if large folio/pte-mapped THP migration is supported in the future,
> we could migrate the entire folio on a NUMA fault on a tail page
>
> so just remove the check.
>
> Link: https://lkml.kernel.org/r/20230913095131.2426871-1-wangkefeng.wang@huawei.com
> Link: https://lkml.kernel.org/r/20230913095131.2426871-2-wangkefeng.wang@huawei.com
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Stable-dep-of: 2774f256e7c0 ("mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index")
No it is not: that one is appropriate to include, this one is not.
Hugh
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> mm/migrate.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index b4d972d80b10c..6f8ad6b64c9bc 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2506,8 +2506,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> int nr_pages = thp_nr_pages(page);
> int order = compound_order(page);
>
> - VM_BUG_ON_PAGE(order && !PageTransHuge(page), page);
> -
> /* Do not migrate THP mapped by multiple processes */
> if (PageTransHuge(page) && total_mapcount(page) > 1)
> return 0;
> --
> 2.43.0
>
>
* Re: [PATCH 6.6 06/60] mm: migrate: remove THP mapcount check in numamigrate_isolate_page()
2024-03-13 16:36 ` [PATCH 6.6 06/60] mm: migrate: remove THP mapcount " Sasha Levin
@ 2024-03-13 17:31 ` Hugh Dickins
0 siblings, 0 replies; 72+ messages in thread
From: Hugh Dickins @ 2024-03-13 17:31 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-kernel, stable, Kefeng Wang, Matthew Wilcox, Huang, Ying,
David Hildenbrand, Hugh Dickins, Mike Kravetz, Zi Yan,
Andrew Morton
On Wed, 13 Mar 2024, Sasha Levin wrote:
> From: Kefeng Wang <wangkefeng.wang@huawei.com>
>
> [ Upstream commit 728be28fae8c838d52c91dce4867133798146357 ]
>
> The check of THP mapped by multiple processes was introduced by commit
> 04fa5d6a6547 ("mm: migrate: check page_count of THP before migrating") and
> refactored by commit 340ef3902cf2 ("mm: numa: cleanup flow of transhuge page
> migration"), which is out of date, since migrate_misplaced_page() is now
> using the standard migrate_pages() for small pages and THPs, the reference
> count checking is in folio_migrate_mapping(), so let's remove the special
> check for THP.
>
> Link: https://lkml.kernel.org/r/20230913095131.2426871-3-wangkefeng.wang@huawei.com
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Stable-dep-of: 2774f256e7c0 ("mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index")
No it is not: that one is appropriate to include, this one is not.
Hugh
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> mm/migrate.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 6f8ad6b64c9bc..c9fabb960996f 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2506,10 +2506,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> int nr_pages = thp_nr_pages(page);
> int order = compound_order(page);
>
> - /* Do not migrate THP mapped by multiple processes */
> - if (PageTransHuge(page) && total_mapcount(page) > 1)
> - return 0;
> -
> /* Avoid migrating to a node that is nearly full */
> if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
> int z;
> --
> 2.43.0
>
>
* Re: [PATCH 6.6 07/60] mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()
2024-03-13 16:36 ` [PATCH 6.6 07/60] mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio() Sasha Levin
@ 2024-03-13 17:32 ` Hugh Dickins
2024-03-13 18:32 ` Sasha Levin
0 siblings, 1 reply; 72+ messages in thread
From: Hugh Dickins @ 2024-03-13 17:32 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-kernel, stable, Kefeng Wang, Zi Yan, David Hildenbrand,
Huang, Ying, Hugh Dickins, Matthew Wilcox, Mike Kravetz,
Andrew Morton
On Wed, 13 Mar 2024, Sasha Levin wrote:
> From: Kefeng Wang <wangkefeng.wang@huawei.com>
>
> [ Upstream commit 2ac9e99f3b21b2864305fbfba4bae5913274c409 ]
>
> Rename numamigrate_isolate_page() to numamigrate_isolate_folio(), then
> make it take a folio and use the folio API to save compound_head() calls.
>
> Link: https://lkml.kernel.org/r/20230913095131.2426871-4-wangkefeng.wang@huawei.com
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Huang, Ying" <ying.huang@intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Stable-dep-of: 2774f256e7c0 ("mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index")
No it is not: that one is appropriate to include, this one is not.
Hugh
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
> mm/migrate.c | 20 ++++++++++----------
> 1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c9fabb960996f..e5f2f7243a659 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2501,10 +2501,9 @@ static struct folio *alloc_misplaced_dst_folio(struct folio *src,
> return __folio_alloc_node(gfp, order, nid);
> }
>
> -static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> +static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
> {
> - int nr_pages = thp_nr_pages(page);
> - int order = compound_order(page);
> + int nr_pages = folio_nr_pages(folio);
>
> /* Avoid migrating to a node that is nearly full */
> if (!migrate_balanced_pgdat(pgdat, nr_pages)) {
> @@ -2516,22 +2515,23 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
> if (managed_zone(pgdat->node_zones + z))
> break;
> }
> - wakeup_kswapd(pgdat->node_zones + z, 0, order, ZONE_MOVABLE);
> + wakeup_kswapd(pgdat->node_zones + z, 0,
> + folio_order(folio), ZONE_MOVABLE);
> return 0;
> }
>
> - if (!isolate_lru_page(page))
> + if (!folio_isolate_lru(folio))
> return 0;
>
> - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_is_file_lru(page),
> + node_stat_mod_folio(folio, NR_ISOLATED_ANON + folio_is_file_lru(folio),
> nr_pages);
>
> /*
> - * Isolating the page has taken another reference, so the
> - * caller's reference can be safely dropped without the page
> + * Isolating the folio has taken another reference, so the
> + * caller's reference can be safely dropped without the folio
> * disappearing underneath us during migration.
> */
> - put_page(page);
> + folio_put(folio);
> return 1;
> }
>
> @@ -2565,7 +2565,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
> if (page_is_file_lru(page) && PageDirty(page))
> goto out;
>
> - isolated = numamigrate_isolate_page(pgdat, page);
> + isolated = numamigrate_isolate_folio(pgdat, page_folio(page));
> if (!isolated)
> goto out;
>
> --
> 2.43.0
>
>
* Re: [PATCH 6.6 07/60] mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()
2024-03-13 17:32 ` Hugh Dickins
@ 2024-03-13 18:32 ` Sasha Levin
0 siblings, 0 replies; 72+ messages in thread
From: Sasha Levin @ 2024-03-13 18:32 UTC (permalink / raw)
To: Hugh Dickins
Cc: linux-kernel, stable, Kefeng Wang, Zi Yan, David Hildenbrand,
Huang, Ying, Matthew Wilcox, Mike Kravetz, Andrew Morton
On Wed, Mar 13, 2024 at 10:32:26AM -0700, Hugh Dickins wrote:
>On Wed, 13 Mar 2024, Sasha Levin wrote:
>
>> From: Kefeng Wang <wangkefeng.wang@huawei.com>
>>
>> [ Upstream commit 2ac9e99f3b21b2864305fbfba4bae5913274c409 ]
>>
>> Rename numamigrate_isolate_page() to numamigrate_isolate_folio(), then
>> make it take a folio and use the folio API to save compound_head() calls.
>>
>> Link: https://lkml.kernel.org/r/20230913095131.2426871-4-wangkefeng.wang@huawei.com
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Reviewed-by: Zi Yan <ziy@nvidia.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> Stable-dep-of: 2774f256e7c0 ("mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index")
>
>No it is not: that one is appropriate to include, this one is not.
Yeah that's fair - I'll rework the backport of 2774f256e7c0 to remove
these dependencies.
Thanks for reviewing!
--
Thanks,
Sasha
* Re: [PATCH 6.6 00/60] 6.6.22-rc1 review
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (59 preceding siblings ...)
2024-03-13 16:37 ` [PATCH 6.6 60/60] Linux 6.6.22-rc1 Sasha Levin
@ 2024-03-14 8:02 ` Bagas Sanjaya
2024-03-14 10:08 ` Naresh Kamboju
` (5 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Bagas Sanjaya @ 2024-03-14 8:02 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable
Cc: torvalds, akpm, linux, shuah, patches, lkft-triage, pavel
On Wed, Mar 13, 2024 at 12:36:07PM -0400, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 6.6.22 release.
> There are 60 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
Successfully compiled and installed the kernel on my computer (Acer
Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
--
An old man doll... just what I always wanted! - Clara
* Re: [PATCH 6.6 00/60] 6.6.22-rc1 review
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
` (60 preceding siblings ...)
2024-03-14 8:02 ` [PATCH 6.6 00/60] 6.6.22-rc1 review Bagas Sanjaya
@ 2024-03-14 10:08 ` Naresh Kamboju
2024-03-14 11:56 ` Takeshi Ogasawara
` (4 subsequent siblings)
66 siblings, 0 replies; 72+ messages in thread
From: Naresh Kamboju @ 2024-03-14 10:08 UTC (permalink / raw)
To: Sasha Levin
Cc: linux-kernel, stable, torvalds, akpm, linux, shuah, patches,
lkft-triage, pavel
On Wed, 13 Mar 2024 at 22:07, Sasha Levin <sashal@kernel.org> wrote:
>
>
> This is the start of the stable review cycle for the 6.6.22 release.
> There are 60 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:36:58 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.6.y&id2=v6.6.21
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
## Build
* kernel: 6.6.22-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-6.6.y
* git commit: 11496a5d363eb35c9b4de8012eae7ffa557594f0
* git describe: v6.6.21-60-g11496a5d363e
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.21-60-g11496a5d363e
## Test Regressions (compared to v6.6.21)
## Metric Regressions (compared to v6.6.21)
## Test Fixes (compared to v6.6.21)
## Metric Fixes (compared to v6.6.21)
## Test result summary
total: 132425, pass: 115163, fail: 1216, skip: 15923, xfail: 123
## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 135 total, 132 passed, 3 failed
* arm64: 43 total, 41 passed, 2 failed
* i386: 35 total, 30 passed, 5 failed
* mips: 26 total, 23 passed, 3 failed
* parisc: 4 total, 4 passed, 0 failed
* powerpc: 36 total, 28 passed, 8 failed
* riscv: 18 total, 18 passed, 0 failed
* s390: 13 total, 13 passed, 0 failed
* sh: 10 total, 10 passed, 0 failed
* sparc: 8 total, 8 passed, 0 failed
* x86_64: 39 total, 34 passed, 5 failed
## Test suites summary
* boot
* kselftest-android
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-drivers-dma-buf
* kselftest-efivarfs
* kselftest-exec
* kselftest-filesystems
* kselftest-filesystems-binderfs
* kselftest-filesystems-epoll
* kselftest-firmware
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mm
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-net-forwarding
* kselftest-net-mptcp
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-user_events
* kselftest-vDSO
* kselftest-watchdog
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libgpiod
* libhugetlbfs
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-smoketest
* ltp-syscalls
* ltp-tracing
* perf
* rcutorture
--
Linaro LKFT
https://lkft.linaro.org
* Re: [PATCH 6.6 00/60] 6.6.22-rc1 review
From: Takeshi Ogasawara @ 2024-03-14 11:56 UTC
To: Sasha Levin
Cc: linux-kernel, stable, torvalds, akpm, linux, shuah, patches,
lkft-triage, pavel
Hi Sasha
On Thu, Mar 14, 2024 at 1:47 AM Sasha Levin <sashal@kernel.org> wrote:
>
>
> This is the start of the stable review cycle for the 6.6.22 release.
> There are 60 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:36:58 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.6.y&id2=v6.6.21
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
6.6.22-rc1 tested.
Build successfully completed.
Boot successfully completed.
No dmesg regressions.
Video output normal.
Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10 (Intel i7-1260P, x86_64, Arch Linux)
[ 0.000000] Linux version 6.6.22-rc1rv (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Thu Mar 14 20:32:45 JST 2024
Thanks
Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
* Re: [PATCH 6.6 00/60] 6.6.22-rc1 review
From: Florian Fainelli @ 2024-03-14 20:55 UTC
To: Sasha Levin, linux-kernel, stable
Cc: torvalds, akpm, linux, shuah, patches, lkft-triage, pavel
On 3/13/24 09:36, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 6.6.22 release.
> There are 60 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:36:58 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.6.y&id2=v6.6.21
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
Tested on ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels; build tested on
BMIPS_GENERIC:
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
--
Florian
* Re: [PATCH 6.6 00/60] 6.6.22-rc1 review
From: Mark Brown @ 2024-03-15 15:44 UTC
To: Sasha Levin
Cc: linux-kernel, stable, torvalds, akpm, linux, shuah, patches,
lkft-triage, pavel
On Wed, Mar 13, 2024 at 12:36:07PM -0400, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 6.6.22 release.
> There are 60 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
Tested-by: Mark Brown <broonie@kernel.org>
* Re: [PATCH 6.6 00/60] 6.6.22-rc1 review
From: Ron Economos @ 2024-03-15 16:01 UTC
To: Sasha Levin, linux-kernel, stable
Cc: torvalds, akpm, linux, shuah, patches, lkft-triage, pavel
On 3/13/24 9:36 AM, Sasha Levin wrote:
> This is the start of the stable review cycle for the 6.6.22 release.
> There are 60 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:36:58 PM UTC 2024.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.6.y&id2=v6.6.21
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos <re@w6rz.net>
* Re: [PATCH 6.6 00/60] 6.6.22-rc1 review
From: Harshit Mogalapalli @ 2024-03-15 17:36 UTC
To: Sasha Levin, linux-kernel, stable
Cc: torvalds, akpm, linux, shuah, patches, lkft-triage, pavel,
Vegard Nossum, Darren Kenny
Hi Sasha,
On 13/03/24 22:06, Sasha Levin wrote:
>
> This is the start of the stable review cycle for the 6.6.22 release.
> There are 60 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Mar 15 04:36:58 PM UTC 2024.
> Anything received after that time might be too late.
>
No problems seen on x86_64 and aarch64 with our testing.
Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Thanks,
Harshit
> The whole patch series can be found in one patch at:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-6.6.y&id2=v6.6.21
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
> and the diffstat can be found below.
>
> Thanks,
> Sasha
>
Thread overview: 72+ messages
2024-03-13 16:36 [PATCH 6.6 00/60] 6.6.22-rc1 review Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 01/60] dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 02/60] dmaengine: fsl-edma: utilize common dt-binding header file Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 03/60] dmaengine: fsl-edma: correct max_segment_size setting Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 04/60] ceph: switch to corrected encoding of max_xattr_size in mdsmap Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 05/60] mm: migrate: remove PageTransHuge check in numamigrate_isolate_page() Sasha Levin
2024-03-13 17:29 ` Hugh Dickins
2024-03-13 16:36 ` [PATCH 6.6 06/60] mm: migrate: remove THP mapcount " Sasha Levin
2024-03-13 17:31 ` Hugh Dickins
2024-03-13 16:36 ` [PATCH 6.6 07/60] mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio() Sasha Levin
2024-03-13 17:32 ` Hugh Dickins
2024-03-13 18:32 ` Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 08/60] mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 09/60] xfrm: Pass UDP encapsulation in TX packet offload Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 10/60] net: lan78xx: fix runtime PM count underflow on link stop Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 11/60] ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 12/60] i40e: disable NAPI right after disabling irqs when handling xsk_pool Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 13/60] ice: reorder disabling IRQ and NAPI in ice_qp_dis Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 14/60] Revert "net/mlx5: Block entering switchdev mode with ns inconsistency" Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 15/60] Revert "net/mlx5e: Check the number of elements before walk TC rhashtable" Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 16/60] net/mlx5: E-switch, Change flow rule destination checking Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 17/60] net/mlx5: Check capability for fw_reset Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 18/60] net/mlx5e: Change the warning when ignore_flow_level is not supported Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 19/60] net/mlx5e: Fix MACsec state loss upon state update in offload path Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 20/60] net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 21/60] net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 22/60] tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 23/60] geneve: make sure to pull inner header in geneve_rx() Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 24/60] net: sparx5: Fix use after free inside sparx5_del_mact_entry Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 25/60] ice: virtchnl: stop pretending to support RSS over AQ or registers Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 26/60] net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink() Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 27/60] igc: avoid returning frame twice in XDP_REDIRECT Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 28/60] net/ipv6: avoid possible UAF in ip6_route_mpath_notify() Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 29/60] bpf: check bpf_func_state->callback_depth when pruning states Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 30/60] xdp, bonding: Fix feature flags when there are no slave devs anymore Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 31/60] selftests/bpf: Fix up xdp bonding test wrt feature flags Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 32/60] cpumap: Zero-initialise xdp_rxq_info struct before running XDP program Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 33/60] net: dsa: microchip: fix register write order in ksz8_ind_write8() Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 34/60] net/rds: fix WARNING in rds_conn_connect_if_down Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 35/60] netfilter: nft_ct: fix l3num expectations with inet pseudo family Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 36/60] netfilter: nf_conntrack_h323: Add protection for bmp length out of range Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 37/60] erofs: apply proper VMA alignment for memory mapped files on THP Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 38/60] netrom: Fix a data-race around sysctl_netrom_default_path_quality Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 39/60] netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 40/60] netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 41/60] netrom: Fix a data-race around sysctl_netrom_transport_timeout Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 42/60] netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 43/60] netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 44/60] netrom: Fix a data-race around sysctl_netrom_transport_busy_delay Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 45/60] netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 46/60] netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 47/60] netrom: Fix a data-race around sysctl_netrom_routing_control Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 48/60] netrom: Fix a data-race around sysctl_netrom_link_fails_count Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 49/60] netrom: Fix data-races around sysctl_net_busy_read Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 50/60] net: pds_core: Fix possible double free in error handling path Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 51/60] KVM: s390: add stat counter for shadow gmap events Sasha Levin
2024-03-13 16:36 ` [PATCH 6.6 52/60] KVM: s390: vsie: fix race during shadow creation Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 53/60] readahead: avoid multiple marked readahead pages Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 54/60] selftests: mptcp: decrease BW in simult flows Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 55/60] exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock) Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 56/60] x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 57/60] Documentation/hw-vuln: Add documentation for RFDS Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 58/60] x86/rfds: Mitigate Register File Data Sampling (RFDS) Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 59/60] KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests Sasha Levin
2024-03-13 16:37 ` [PATCH 6.6 60/60] Linux 6.6.22-rc1 Sasha Levin
2024-03-14 8:02 ` [PATCH 6.6 00/60] 6.6.22-rc1 review Bagas Sanjaya
2024-03-14 10:08 ` Naresh Kamboju
2024-03-14 11:56 ` Takeshi Ogasawara
2024-03-14 20:55 ` Florian Fainelli
2024-03-15 15:44 ` Mark Brown
2024-03-15 16:01 ` Ron Economos
2024-03-15 17:36 ` Harshit Mogalapalli