netdev.vger.kernel.org archive mirror
* [PATCH net 0/3] mlxsw: Fixes
@ 2024-06-17 16:55 Petr Machata
  2024-06-17 16:56 ` [PATCH net 1/3] mlxsw: pci: Fix driver initialization with Spectrum-4 Petr Machata
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Petr Machata @ 2024-06-17 16:55 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, mlxsw

This patchset fixes two issues with mlxsw driver initialization, and a
memory corruption issue in shared buffer occupancy handling.

Ido Schimmel (3):
  mlxsw: pci: Fix driver initialization with Spectrum-4
  mlxsw: core_thermal: Fix driver initialization failure
  mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems

 .../ethernet/mellanox/mlxsw/core_thermal.c    | 50 ++++++++++---------
 drivers/net/ethernet/mellanox/mlxsw/pci.c     | 18 +++++--
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |  2 +
 .../mellanox/mlxsw/spectrum_buffers.c         | 20 +++++---
 4 files changed, 57 insertions(+), 33 deletions(-)

-- 
2.45.0



* [PATCH net 1/3] mlxsw: pci: Fix driver initialization with Spectrum-4
  2024-06-17 16:55 [PATCH net 0/3] mlxsw: Fixes Petr Machata
@ 2024-06-17 16:56 ` Petr Machata
  2024-06-19 14:50   ` Simon Horman
  2024-06-17 16:56 ` [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure Petr Machata
  2024-06-17 16:56 ` [PATCH net 3/3] mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems Petr Machata
  2 siblings, 1 reply; 8+ messages in thread
From: Petr Machata @ 2024-06-17 16:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, mlxsw, Simon Horman, Maksym Yaremchuk

From: Ido Schimmel <idosch@nvidia.com>

Cited commit added support for a new reset flow ("all reset") which is
deeper than the existing reset flow ("software reset") and allows the
device's PCI firmware to be upgraded.

In the new flow the driver first tells the firmware that "all reset" is
required by issuing a new reset command (i.e., MRSR.command=6) and then
triggers the reset by having the PCI core issue a secondary bus reset
(SBR).

However, due to a race condition in the device's firmware the device is
not always able to recover from this reset, resulting in initialization
failures [1].

New firmware versions include a fix for the bug and advertise it using a
new capability bit in the Management Capabilities Mask (MCAM) register.

Avoid initialization failures by reading the new capability bit and
triggering the new reset flow only if the bit is set. If the bit is not
set, trigger a normal PCI hot reset by skipping the call to the
Management Reset and Shutdown Register (MRSR).

Normal PCI hot reset is weaker than "all reset", but it results in a
fully operational driver and allows users to flash a new firmware, if
they want to.
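
As a rough illustration only (this is not the driver code), the choice
between the three flows described above can be sketched as below. The
enum and function names are hypothetical; the two booleans stand for the
existing MCAM capability bit (MRSR.command=6 supported) and the new bit
advertised by fixed firmware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum reset_flow {
	RESET_FLOW_SW,		/* existing "software reset" */
	RESET_FLOW_PCI_HOT,	/* SBR only, MRSR write skipped */
	RESET_FLOW_ALL,		/* MRSR.command=6, then SBR ("all reset") */
};

/*
 * pci_reset:     capability bit saying MRSR.command=6 is supported.
 * pci_reset_sbr: new capability bit saying command=6 works with SBR.
 */
static enum reset_flow choose_reset_flow(bool pci_reset, bool pci_reset_sbr)
{
	if (!pci_reset)
		return RESET_FLOW_SW;
	if (!pci_reset_sbr)
		return RESET_FLOW_PCI_HOT;
	return RESET_FLOW_ALL;
}

/* Usage: walk through the capability combinations. */
void reset_flow_selfcheck(void)
{
	assert(choose_reset_flow(false, false) == RESET_FLOW_SW);
	assert(choose_reset_flow(false, true) == RESET_FLOW_SW);
	assert(choose_reset_flow(true, false) == RESET_FLOW_PCI_HOT);
	assert(choose_reset_flow(true, true) == RESET_FLOW_ALL);
}
```

Old firmware lands in the middle case: the secondary bus reset is still
issued, but without the preceding MRSR write.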

[1]
mlxsw_spectrum4 0000:01:00.0: not ready 1023ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 2047ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 4095ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 8191ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 16383ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 32767ms after bus reset; waiting
mlxsw_spectrum4 0000:01:00.0: not ready 65535ms after bus reset; giving up
mlxsw_spectrum4 0000:01:00.0: PCI function reset failed with -25
mlxsw_spectrum4 0000:01:00.0: cannot register bus device
mlxsw_spectrum4: probe of 0000:01:00.0 failed with error -25

Fixes: f257c73e5356 ("mlxsw: pci: Add support for new reset flow")
Cc: Simon Horman <horms@kernel.org>
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/pci.c | 18 +++++++++++++++---
 drivers/net/ethernet/mellanox/mlxsw/reg.h |  2 ++
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
index bf66d996e32e..c0ced4d315f3 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
@@ -1594,18 +1594,25 @@ static int mlxsw_pci_sys_ready_wait(struct mlxsw_pci *mlxsw_pci,
 	return -EBUSY;
 }
 
-static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci)
+static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci,
+					  bool pci_reset_sbr_supported)
 {
 	struct pci_dev *pdev = mlxsw_pci->pdev;
 	char mrsr_pl[MLXSW_REG_MRSR_LEN];
 	int err;
 
+	if (!pci_reset_sbr_supported) {
+		pci_dbg(pdev, "Performing PCI hot reset instead of \"all reset\"\n");
+		goto sbr;
+	}
+
 	mlxsw_reg_mrsr_pack(mrsr_pl,
 			    MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE);
 	err = mlxsw_reg_write(mlxsw_pci->core, MLXSW_REG(mrsr), mrsr_pl);
 	if (err)
 		return err;
 
+sbr:
 	device_lock_assert(&pdev->dev);
 
 	pci_cfg_access_lock(pdev);
@@ -1633,6 +1640,7 @@ static int
 mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id)
 {
 	struct pci_dev *pdev = mlxsw_pci->pdev;
+	bool pci_reset_sbr_supported = false;
 	char mcam_pl[MLXSW_REG_MCAM_LEN];
 	bool pci_reset_supported = false;
 	u32 sys_status;
@@ -1652,13 +1660,17 @@ mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id)
 	mlxsw_reg_mcam_pack(mcam_pl,
 			    MLXSW_REG_MCAM_FEATURE_GROUP_ENHANCED_FEATURES);
 	err = mlxsw_reg_query(mlxsw_pci->core, MLXSW_REG(mcam), mcam_pl);
-	if (!err)
+	if (!err) {
 		mlxsw_reg_mcam_unpack(mcam_pl, MLXSW_REG_MCAM_PCI_RESET,
 				      &pci_reset_supported);
+		mlxsw_reg_mcam_unpack(mcam_pl, MLXSW_REG_MCAM_PCI_RESET_SBR,
+				      &pci_reset_sbr_supported);
+	}
 
 	if (pci_reset_supported) {
 		pci_dbg(pdev, "Starting PCI reset flow\n");
-		err = mlxsw_pci_reset_at_pci_disable(mlxsw_pci);
+		err = mlxsw_pci_reset_at_pci_disable(mlxsw_pci,
+						     pci_reset_sbr_supported);
 	} else {
 		pci_dbg(pdev, "Starting software reset flow\n");
 		err = mlxsw_pci_reset_sw(mlxsw_pci);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 8adf86a6f5cc..3bb89045eaf5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -10671,6 +10671,8 @@ enum mlxsw_reg_mcam_mng_feature_cap_mask_bits {
 	MLXSW_REG_MCAM_MCIA_128B = 34,
 	/* If set, MRSR.command=6 is supported. */
 	MLXSW_REG_MCAM_PCI_RESET = 48,
+	/* If set, MRSR.command=6 is supported with Secondary Bus Reset. */
+	MLXSW_REG_MCAM_PCI_RESET_SBR = 67,
 };
 
 #define MLXSW_REG_BYTES_PER_DWORD 0x4
-- 
2.45.0



* [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure
  2024-06-17 16:55 [PATCH net 0/3] mlxsw: Fixes Petr Machata
  2024-06-17 16:56 ` [PATCH net 1/3] mlxsw: pci: Fix driver initialization with Spectrum-4 Petr Machata
@ 2024-06-17 16:56 ` Petr Machata
  2024-06-17 19:53   ` Wysocki, Rafael J
  2024-06-17 16:56 ` [PATCH net 3/3] mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems Petr Machata
  2 siblings, 1 reply; 8+ messages in thread
From: Petr Machata @ 2024-06-17 16:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, mlxsw, Rafael J. Wysocki, Lukasz Luba,
	Daniel Lezcano, Vadim Pasternak

From: Ido Schimmel <idosch@nvidia.com>

Commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to
thermal_debug_cdev_add()") changed the thermal core to read the current
state of the cooling device as part of the cooling device's
registration. This is incompatible with the current implementation of
the cooling device operations in mlxsw, leading to initialization
failure with errors such as:

mlxsw_spectrum 0000:01:00.0: Failed to register cooling device
mlxsw_spectrum 0000:01:00.0: cannot register bus device

The reason for the failure is that when the get current state operation
is invoked the driver tries to derive the index of the cooling device by
walking a per thermal zone array and looking for the matching cooling
device pointer. However, the pointer is returned from the registration
function and therefore only set in the array after the registration.

Fix by passing per-cooling-device private data, with the cooling device
index already populated, to the registration function.
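
The ordering problem can be reproduced outside the kernel with a toy
model. Everything below is a simplified stand-in, not the thermal core or
mlxsw API: toy_register() is assumed to invoke the "get current state"
callback before it returns, which is the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FANS 4

struct cooling_dev {		/* stand-in for the core's cooling device */
	void *devdata;
};

typedef int (*get_state_fn)(struct cooling_dev *cdev, unsigned long *state);

/* Toy "core": the callback runs before registration returns. */
static int toy_register(struct cooling_dev *cdev, void *devdata,
			get_state_fn get)
{
	unsigned long state;

	cdev->devdata = devdata;
	return get(cdev, &state);
}

/* Old scheme: derive the index by searching an array of pointers. */
struct thermal_old {
	struct cooling_dev *cdevs[NUM_FANS];
};

static int get_state_old(struct cooling_dev *cdev, unsigned long *state)
{
	struct thermal_old *t = cdev->devdata;
	size_t i;

	for (i = 0; i < NUM_FANS; i++) {
		if (t->cdevs[i] == cdev) {
			*state = 0;
			return 0;
		}
	}
	return -1;		/* pointer not stored yet, lookup fails */
}

int register_old_scheme(unsigned int i)
{
	struct thermal_old t = {0};
	struct cooling_dev cdev;
	int err;

	assert(i < NUM_FANS);
	err = toy_register(&cdev, &t, get_state_old);
	if (!err)
		t.cdevs[i] = &cdev;	/* too late for the callback */
	return err;
}

/* New scheme: private data already carries the index. */
struct thermal_new;

struct fan {
	struct thermal_new *thermal;
	struct cooling_dev *cdev;
	unsigned int idx;
};

struct thermal_new {
	struct fan fans[NUM_FANS];
};

static int get_state_new(struct cooling_dev *cdev, unsigned long *state)
{
	struct fan *fan = cdev->devdata;

	*state = fan->idx;	/* toy value; a real driver queries hardware */
	return 0;
}

int register_new_scheme(unsigned int i)
{
	struct thermal_new t = {0};
	struct cooling_dev cdev;
	struct fan *fan;
	int err;

	assert(i < NUM_FANS);
	fan = &t.fans[i];
	fan->thermal = &t;
	fan->idx = i;		/* populated before registration */
	err = toy_register(&cdev, fan, get_state_new);
	if (!err)
		fan->cdev = &cdev;
	return err;
}
```

In this model register_old_scheme() fails for every index, while
register_new_scheme() succeeds because the callback no longer depends on
the pointer returned by registration.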

The issue is fixed in the driver since, as far as I can tell, other
drivers do not suffer from this problem.

Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../ethernet/mellanox/mlxsw/core_thermal.c    | 50 ++++++++++---------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
index 5c511e1a8efa..eee3e37983ca 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -100,6 +100,12 @@ static const struct mlxsw_cooling_states default_cooling_states[] = {
 
 struct mlxsw_thermal;
 
+struct mlxsw_thermal_cooling_device {
+	struct mlxsw_thermal *thermal;
+	struct thermal_cooling_device *cdev;
+	unsigned int idx;
+};
+
 struct mlxsw_thermal_module {
 	struct mlxsw_thermal *parent;
 	struct thermal_zone_device *tzdev;
@@ -123,7 +129,7 @@ struct mlxsw_thermal {
 	const struct mlxsw_bus_info *bus_info;
 	struct thermal_zone_device *tzdev;
 	int polling_delay;
-	struct thermal_cooling_device *cdevs[MLXSW_MFCR_PWMS_MAX];
+	struct mlxsw_thermal_cooling_device cdevs[MLXSW_MFCR_PWMS_MAX];
 	struct thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
 	struct mlxsw_cooling_states cooling_states[MLXSW_THERMAL_NUM_TRIPS];
 	struct mlxsw_thermal_area line_cards[];
@@ -147,7 +153,7 @@ static int mlxsw_get_cooling_device_idx(struct mlxsw_thermal *thermal,
 	int i;
 
 	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
-		if (thermal->cdevs[i] == cdev)
+		if (thermal->cdevs[i].cdev == cdev)
 			return i;
 
 	/* Allow mlxsw thermal zone binding to an external cooling device */
@@ -352,17 +358,14 @@ static int mlxsw_thermal_get_cur_state(struct thermal_cooling_device *cdev,
 				       unsigned long *p_state)
 
 {
-	struct mlxsw_thermal *thermal = cdev->devdata;
+	struct mlxsw_thermal_cooling_device *mlxsw_cdev = cdev->devdata;
+	struct mlxsw_thermal *thermal = mlxsw_cdev->thermal;
 	struct device *dev = thermal->bus_info->dev;
 	char mfsc_pl[MLXSW_REG_MFSC_LEN];
-	int err, idx;
 	u8 duty;
+	int err;
 
-	idx = mlxsw_get_cooling_device_idx(thermal, cdev);
-	if (idx < 0)
-		return idx;
-
-	mlxsw_reg_mfsc_pack(mfsc_pl, idx, 0);
+	mlxsw_reg_mfsc_pack(mfsc_pl, mlxsw_cdev->idx, 0);
 	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
 	if (err) {
 		dev_err(dev, "Failed to query PWM duty\n");
@@ -378,22 +381,19 @@ static int mlxsw_thermal_set_cur_state(struct thermal_cooling_device *cdev,
 				       unsigned long state)
 
 {
-	struct mlxsw_thermal *thermal = cdev->devdata;
+	struct mlxsw_thermal_cooling_device *mlxsw_cdev = cdev->devdata;
+	struct mlxsw_thermal *thermal = mlxsw_cdev->thermal;
 	struct device *dev = thermal->bus_info->dev;
 	char mfsc_pl[MLXSW_REG_MFSC_LEN];
-	int idx;
 	int err;
 
 	if (state > MLXSW_THERMAL_MAX_STATE)
 		return -EINVAL;
 
-	idx = mlxsw_get_cooling_device_idx(thermal, cdev);
-	if (idx < 0)
-		return idx;
-
 	/* Normalize the state to the valid speed range. */
 	state = max_t(unsigned long, MLXSW_THERMAL_MIN_STATE, state);
-	mlxsw_reg_mfsc_pack(mfsc_pl, idx, mlxsw_state_to_duty(state));
+	mlxsw_reg_mfsc_pack(mfsc_pl, mlxsw_cdev->idx,
+			    mlxsw_state_to_duty(state));
 	err = mlxsw_reg_write(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
 	if (err) {
 		dev_err(dev, "Failed to write PWM duty\n");
@@ -753,17 +753,21 @@ int mlxsw_thermal_init(struct mlxsw_core *core,
 	}
 	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
 		if (pwm_active & BIT(i)) {
+			struct mlxsw_thermal_cooling_device *mlxsw_cdev;
 			struct thermal_cooling_device *cdev;
 
+			mlxsw_cdev = &thermal->cdevs[i];
+			mlxsw_cdev->thermal = thermal;
+			mlxsw_cdev->idx = i;
 			cdev = thermal_cooling_device_register("mlxsw_fan",
-							       thermal,
+							       mlxsw_cdev,
 							       &mlxsw_cooling_ops);
 			if (IS_ERR(cdev)) {
 				err = PTR_ERR(cdev);
 				dev_err(dev, "Failed to register cooling device\n");
 				goto err_thermal_cooling_device_register;
 			}
-			thermal->cdevs[i] = cdev;
+			mlxsw_cdev->cdev = cdev;
 		}
 	}
 
@@ -824,8 +828,8 @@ int mlxsw_thermal_init(struct mlxsw_core *core,
 err_thermal_zone_device_register:
 err_thermal_cooling_device_register:
 	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++)
-		if (thermal->cdevs[i])
-			thermal_cooling_device_unregister(thermal->cdevs[i]);
+		if (thermal->cdevs[i].cdev)
+			thermal_cooling_device_unregister(thermal->cdevs[i].cdev);
 err_reg_write:
 err_reg_query:
 	kfree(thermal);
@@ -848,10 +852,8 @@ void mlxsw_thermal_fini(struct mlxsw_thermal *thermal)
 	}
 
 	for (i = 0; i < MLXSW_MFCR_PWMS_MAX; i++) {
-		if (thermal->cdevs[i]) {
-			thermal_cooling_device_unregister(thermal->cdevs[i]);
-			thermal->cdevs[i] = NULL;
-		}
+		if (thermal->cdevs[i].cdev)
+			thermal_cooling_device_unregister(thermal->cdevs[i].cdev);
 	}
 
 	kfree(thermal);
-- 
2.45.0



* [PATCH net 3/3] mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems
  2024-06-17 16:55 [PATCH net 0/3] mlxsw: Fixes Petr Machata
  2024-06-17 16:56 ` [PATCH net 1/3] mlxsw: pci: Fix driver initialization with Spectrum-4 Petr Machata
  2024-06-17 16:56 ` [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure Petr Machata
@ 2024-06-17 16:56 ` Petr Machata
  2024-06-19 14:51   ` Simon Horman
  2 siblings, 1 reply; 8+ messages in thread
From: Petr Machata @ 2024-06-17 16:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev
  Cc: Ido Schimmel, Petr Machata, mlxsw, Amit Cohen

From: Ido Schimmel <idosch@nvidia.com>

The following two shared buffer operations make use of the Shared Buffer
Status Register (SBSR):

 # devlink sb occupancy snapshot pci/0000:01:00.0
 # devlink sb occupancy clearmax pci/0000:01:00.0

The register has two masks of 256 bits to denote which ingress /
egress ports the register should operate on. Spectrum-4 has more than
256 ports, so the register was extended by cited commit with a new
'port_page' field.

However, when filling the register's payload, the driver specifies the
ports as absolute numbers and not relative to the first port of the port
page, resulting in memory corruptions [1].

Fix by specifying the ports relative to the first port of the port page.
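
The arithmetic can be sketched with a toy mask. This is an illustration
under assumptions, not the register layout: PORTS_IN_PAGE mirrors the 256
bits mentioned above, the struct and helpers are hypothetical, and the
bounds check exists only so the sketch can report an out-of-range bit
instead of writing past the mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PORTS_IN_PAGE 256	/* assumed from the 256-bit masks */

struct toy_mask {		/* hypothetical stand-in for one port mask */
	uint8_t bits[PORTS_IN_PAGE / 8];
};

/* Returns -1 instead of writing when the bit is outside the mask. */
static int mask_set(struct toy_mask *m, unsigned int bit)
{
	if (bit >= PORTS_IN_PAGE)
		return -1;
	m->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	return 0;
}

/* Bit index of local_port relative to the first port of its page. */
unsigned int port_to_page_bit(unsigned int local_port)
{
	unsigned int page = local_port / PORTS_IN_PAGE;
	unsigned int first_local_port = page * PORTS_IN_PAGE;

	return local_port - first_local_port;
}

/* Absolute numbering: ports on page 1 and above do not fit the mask. */
int set_port_absolute(unsigned int local_port)
{
	struct toy_mask m;

	memset(&m, 0, sizeof(m));
	return mask_set(&m, local_port);
}

/* Relative numbering: every port maps to a bit inside the mask. */
int set_port_relative(unsigned int local_port)
{
	struct toy_mask m;

	memset(&m, 0, sizeof(m));
	return mask_set(&m, port_to_page_bit(local_port));
}

/* Usage: local port 300 lives on page 1 at bit 44. */
void port_page_selfcheck(void)
{
	assert(port_to_page_bit(300) == 44);
	assert(set_port_absolute(300) == -1);
	assert(set_port_relative(300) == 0);
}
```

Without a bounds check, the absolute variant would touch memory beyond
the mask for any port on the second page or later.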

[1]
BUG: KASAN: slab-use-after-free in mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
Read of size 1 at addr ffff8881068cb00f by task devlink/1566
[...]
Call Trace:
 <TASK>
 dump_stack_lvl+0xc6/0x120
 print_report+0xce/0x670
 kasan_report+0xd7/0x110
 mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
 mlxsw_devlink_sb_occ_snapshot+0x75/0xb0
 devlink_nl_sb_occ_snapshot_doit+0x1f9/0x2a0
 genl_family_rcv_msg_doit+0x20c/0x300
 genl_rcv_msg+0x567/0x800
 netlink_rcv_skb+0x170/0x450
 genl_rcv+0x2d/0x40
 netlink_unicast+0x547/0x830
 netlink_sendmsg+0x8d4/0xdb0
 __sys_sendto+0x49b/0x510
 __x64_sys_sendto+0xe5/0x1c0
 do_syscall_64+0xc1/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
[...]
Allocated by task 1:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x8f/0xa0
 copy_verifier_state+0xbc2/0xfb0
 do_check_common+0x2c51/0xc7e0
 bpf_check+0x5107/0x9960
 bpf_prog_load+0xf0e/0x2690
 __sys_bpf+0x1a61/0x49d0
 __x64_sys_bpf+0x7d/0xc0
 do_syscall_64+0xc1/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 1:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 poison_slab_object+0x109/0x170
 __kasan_slab_free+0x14/0x30
 kfree+0xca/0x2b0
 free_verifier_state+0xce/0x270
 do_check_common+0x4828/0xc7e0
 bpf_check+0x5107/0x9960
 bpf_prog_load+0xf0e/0x2690
 __sys_bpf+0x1a61/0x49d0
 __x64_sys_bpf+0x7d/0xc0
 do_syscall_64+0xc1/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Fixes: f8538aec88b4 ("mlxsw: Add support for more than 256 ports in SBSR register")
Cc: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
---
 .../mellanox/mlxsw/spectrum_buffers.c         | 20 +++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index 1b9ed393fbd4..2c0cfa79d138 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -1611,8 +1611,8 @@ static void mlxsw_sp_sb_sr_occ_query_cb(struct mlxsw_core *mlxsw_core,
 int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
 			     unsigned int sb_index)
 {
+	u16 local_port, local_port_1, first_local_port, last_local_port;
 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
-	u16 local_port, local_port_1, last_local_port;
 	struct mlxsw_sp_sb_sr_occ_query_cb_ctx cb_ctx;
 	u8 masked_count, current_page = 0;
 	unsigned long cb_priv = 0;
@@ -1632,6 +1632,7 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
 	masked_count = 0;
 	mlxsw_reg_sbsr_pack(sbsr_pl, false);
 	mlxsw_reg_sbsr_port_page_set(sbsr_pl, current_page);
+	first_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE;
 	last_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE +
 			  MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE - 1;
 
@@ -1649,9 +1650,12 @@ int mlxsw_sp_sb_occ_snapshot(struct mlxsw_core *mlxsw_core,
 		if (local_port != MLXSW_PORT_CPU_PORT) {
 			/* Ingress quotas are not supported for the CPU port */
 			mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl,
-							     local_port, 1);
+							     local_port - first_local_port,
+							     1);
 		}
-		mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1);
+		mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl,
+						    local_port - first_local_port,
+						    1);
 		for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) {
 			err = mlxsw_sp_sb_pm_occ_query(mlxsw_sp, local_port, i,
 						       &bulk_list);
@@ -1688,7 +1692,7 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
 			      unsigned int sb_index)
 {
 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
-	u16 local_port, last_local_port;
+	u16 local_port, first_local_port, last_local_port;
 	LIST_HEAD(bulk_list);
 	unsigned int masked_count;
 	u8 current_page = 0;
@@ -1706,6 +1710,7 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
 	masked_count = 0;
 	mlxsw_reg_sbsr_pack(sbsr_pl, true);
 	mlxsw_reg_sbsr_port_page_set(sbsr_pl, current_page);
+	first_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE;
 	last_local_port = current_page * MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE +
 			  MLXSW_REG_SBSR_NUM_PORTS_IN_PAGE - 1;
 
@@ -1723,9 +1728,12 @@ int mlxsw_sp_sb_occ_max_clear(struct mlxsw_core *mlxsw_core,
 		if (local_port != MLXSW_PORT_CPU_PORT) {
 			/* Ingress quotas are not supported for the CPU port */
 			mlxsw_reg_sbsr_ingress_port_mask_set(sbsr_pl,
-							     local_port, 1);
+							     local_port - first_local_port,
+							     1);
 		}
-		mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl, local_port, 1);
+		mlxsw_reg_sbsr_egress_port_mask_set(sbsr_pl,
+						    local_port - first_local_port,
+						    1);
 		for (i = 0; i < mlxsw_sp->sb_vals->pool_count; i++) {
 			err = mlxsw_sp_sb_pm_occ_clear(mlxsw_sp, local_port, i,
 						       &bulk_list);
-- 
2.45.0



* Re: [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure
  2024-06-17 16:56 ` [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure Petr Machata
@ 2024-06-17 19:53   ` Wysocki, Rafael J
  2024-06-18  6:55     ` Ido Schimmel
  0 siblings, 1 reply; 8+ messages in thread
From: Wysocki, Rafael J @ 2024-06-17 19:53 UTC (permalink / raw)
  To: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev
  Cc: Ido Schimmel, mlxsw, Lukasz Luba, Daniel Lezcano, Vadim Pasternak

On 6/17/2024 6:56 PM, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
>
> Commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to
> thermal_debug_cdev_add()") changed the thermal core to read the current
> state of the cooling device as part of the cooling device's
> registration. This is incompatible with the current implementation of
> the cooling device operations in mlxsw, leading to initialization
> failure with errors such as:
>
> mlxsw_spectrum 0000:01:00.0: Failed to register cooling device
> mlxsw_spectrum 0000:01:00.0: cannot register bus device

Is this still a problem after

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=thermal&id=1af89dedc8a58006d8e385b1e0d2cd24df8a3b69

which has been merged into 6.10-rc4?

> The reason for the failure is that when the get current state operation
> is invoked the driver tries to derive the index of the cooling device by
> walking a per thermal zone array and looking for the matching cooling
> device pointer. However, the pointer is returned from the registration
> function and therefore only set in the array after the registration.
>
> Fix by passing per-cooling-device private data, with the cooling device
> index already populated, to the registration function.
>
> The issue is fixed in the driver since, as far as I can tell, other
> drivers do not suffer from this problem.
>
> Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()")
> Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>


* Re: [PATCH net 2/3] mlxsw: core_thermal: Fix driver initialization failure
  2024-06-17 19:53   ` Wysocki, Rafael J
@ 2024-06-18  6:55     ` Ido Schimmel
  0 siblings, 0 replies; 8+ messages in thread
From: Ido Schimmel @ 2024-06-18  6:55 UTC (permalink / raw)
  To: Wysocki, Rafael J
  Cc: Petr Machata, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, mlxsw, Lukasz Luba, Daniel Lezcano,
	Vadim Pasternak

On Mon, Jun 17, 2024 at 09:53:59PM +0200, Wysocki, Rafael J wrote:
> On 6/17/2024 6:56 PM, Petr Machata wrote:
> > From: Ido Schimmel <idosch@nvidia.com>
> > 
> > Commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to
> > thermal_debug_cdev_add()") changed the thermal core to read the current
> > state of the cooling device as part of the cooling device's
> > registration. This is incompatible with the current implementation of
> > the cooling device operations in mlxsw, leading to initialization
> > failure with errors such as:
> > 
> > mlxsw_spectrum 0000:01:00.0: Failed to register cooling device
> > mlxsw_spectrum 0000:01:00.0: cannot register bus device
> 
> Is this still a problem after
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=thermal&id=1af89dedc8a58006d8e385b1e0d2cd24df8a3b69
> 
> which has been merged into 6.10-rc4?

No, cooling device registration does not fail after your patch.

However, I think it's still worth merging my patch, since without it the
driver does not provide a valid initial state, which should not happen.

Are you OK with us dropping this patch from v2 and targeting it instead
at net-next (with an updated commit message)?

Thanks


* Re: [PATCH net 1/3] mlxsw: pci: Fix driver initialization with Spectrum-4
  2024-06-17 16:56 ` [PATCH net 1/3] mlxsw: pci: Fix driver initialization with Spectrum-4 Petr Machata
@ 2024-06-19 14:50   ` Simon Horman
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2024-06-19 14:50 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Ido Schimmel, mlxsw, Maksym Yaremchuk

On Mon, Jun 17, 2024 at 06:56:00PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> Cited commit added support for a new reset flow ("all reset") which is
> deeper than the existing reset flow ("software reset") and allows the
> device's PCI firmware to be upgraded.
> 
> In the new flow the driver first tells the firmware that "all reset" is
> required by issuing a new reset command (i.e., MRSR.command=6) and then
> triggers the reset by having the PCI core issue a secondary bus reset
> (SBR).
> 
> However, due to a race condition in the device's firmware the device is
> not always able to recover from this reset, resulting in initialization
> failures [1].
> 
> New firmware versions include a fix for the bug and advertise it using a
> new capability bit in the Management Capabilities Mask (MCAM) register.
> 
> Avoid initialization failures by reading the new capability bit and
> triggering the new reset flow only if the bit is set. If the bit is not
> set, trigger a normal PCI hot reset by skipping the call to the
> Management Reset and Shutdown Register (MRSR).
> 
> Normal PCI hot reset is weaker than "all reset", but it results in a
> fully operational driver and allows users to flash a new firmware, if
> they want to.
> 
> [1]
> mlxsw_spectrum4 0000:01:00.0: not ready 1023ms after bus reset; waiting
> mlxsw_spectrum4 0000:01:00.0: not ready 2047ms after bus reset; waiting
> mlxsw_spectrum4 0000:01:00.0: not ready 4095ms after bus reset; waiting
> mlxsw_spectrum4 0000:01:00.0: not ready 8191ms after bus reset; waiting
> mlxsw_spectrum4 0000:01:00.0: not ready 16383ms after bus reset; waiting
> mlxsw_spectrum4 0000:01:00.0: not ready 32767ms after bus reset; waiting
> mlxsw_spectrum4 0000:01:00.0: not ready 65535ms after bus reset; giving up
> mlxsw_spectrum4 0000:01:00.0: PCI function reset failed with -25
> mlxsw_spectrum4 0000:01:00.0: cannot register bus device
> mlxsw_spectrum4: probe of 0000:01:00.0 failed with error -25
> 
> Fixes: f257c73e5356 ("mlxsw: pci: Add support for new reset flow")
> Cc: Simon Horman <horms@kernel.org>
> Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Tested-by: Maksym Yaremchuk <maksymy@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>
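
The decision described in the quoted commit message can be sketched roughly as below. This is only an illustration under stated assumptions: `plan_reset()`, the `STEP_*` flags and the `fw_advertises_fix` parameter are hypothetical names, not the driver's identifiers, and the MCAM query and the PCI bus reset are reduced to flags so that only the ordering logic is shown.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two actions named in the commit message. */
enum reset_step {
	STEP_MRSR_ALL_RESET = 1 << 0, /* tell firmware "all reset" (MRSR.command=6) */
	STEP_PCI_BUS_RESET  = 1 << 1, /* have the PCI core reset the bus */
};

/*
 * fw_advertises_fix models the new MCAM capability bit (assumed to have
 * been read already). Returns the set of steps the driver would take.
 */
static unsigned int plan_reset(bool fw_advertises_fix)
{
	unsigned int steps = 0;

	/* Only request "all reset" when the firmware says the race is fixed. */
	if (fw_advertises_fix)
		steps |= STEP_MRSR_ALL_RESET;

	/* The bus reset happens in both cases; without MRSR it is a plain hot reset. */
	steps |= STEP_PCI_BUS_RESET;

	return steps;
}
```

With the bit clear, `plan_reset(false)` yields only `STEP_PCI_BUS_RESET`, which matches the "normal PCI hot reset" fallback in the commit message.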



* Re: [PATCH net 3/3] mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems
  2024-06-17 16:56 ` [PATCH net 3/3] mlxsw: spectrum_buffers: Fix memory corruptions on Spectrum-4 systems Petr Machata
@ 2024-06-19 14:51   ` Simon Horman
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2024-06-19 14:51 UTC (permalink / raw)
  To: Petr Machata
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, Ido Schimmel, mlxsw, Amit Cohen

On Mon, Jun 17, 2024 at 06:56:02PM +0200, Petr Machata wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> The following two shared buffer operations make use of the Shared Buffer
> Status Register (SBSR):
> 
>  # devlink sb occupancy snapshot pci/0000:01:00.0
>  # devlink sb occupancy clearmax pci/0000:01:00.0
> 
> The register has two masks of 256 bits to denote which ingress /
> egress ports the register should operate on. Spectrum-4 has more than
> 256 ports, so the register was extended by cited commit with a new
> 'port_page' field.
> 
> However, when filling the register's payload, the driver specifies the
> ports as absolute numbers and not relative to the first port of the port
> page, resulting in memory corruptions [1].
> 
> Fix by specifying the ports relative to the first port of the port page.
> 
> [1]
> BUG: KASAN: slab-use-after-free in mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
> Read of size 1 at addr ffff8881068cb00f by task devlink/1566
> [...]
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0xc6/0x120
>  print_report+0xce/0x670
>  kasan_report+0xd7/0x110
>  mlxsw_sp_sb_occ_snapshot+0xb6d/0xbc0
>  mlxsw_devlink_sb_occ_snapshot+0x75/0xb0
>  devlink_nl_sb_occ_snapshot_doit+0x1f9/0x2a0
>  genl_family_rcv_msg_doit+0x20c/0x300
>  genl_rcv_msg+0x567/0x800
>  netlink_rcv_skb+0x170/0x450
>  genl_rcv+0x2d/0x40
>  netlink_unicast+0x547/0x830
>  netlink_sendmsg+0x8d4/0xdb0
>  __sys_sendto+0x49b/0x510
>  __x64_sys_sendto+0xe5/0x1c0
>  do_syscall_64+0xc1/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [...]
> Allocated by task 1:
>  kasan_save_stack+0x33/0x60
>  kasan_save_track+0x14/0x30
>  __kasan_kmalloc+0x8f/0xa0
>  copy_verifier_state+0xbc2/0xfb0
>  do_check_common+0x2c51/0xc7e0
>  bpf_check+0x5107/0x9960
>  bpf_prog_load+0xf0e/0x2690
>  __sys_bpf+0x1a61/0x49d0
>  __x64_sys_bpf+0x7d/0xc0
>  do_syscall_64+0xc1/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> Freed by task 1:
>  kasan_save_stack+0x33/0x60
>  kasan_save_track+0x14/0x30
>  kasan_save_free_info+0x3b/0x60
>  poison_slab_object+0x109/0x170
>  __kasan_slab_free+0x14/0x30
>  kfree+0xca/0x2b0
>  free_verifier_state+0xce/0x270
>  do_check_common+0x4828/0xc7e0
>  bpf_check+0x5107/0x9960
>  bpf_prog_load+0xf0e/0x2690
>  __sys_bpf+0x1a61/0x49d0
>  __x64_sys_bpf+0x7d/0xc0
>  do_syscall_64+0xc1/0x1d0
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> 
> Fixes: f8538aec88b4 ("mlxsw: Add support for more than 256 ports in SBSR register")
> Cc: Amit Cohen <amcohen@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> Reviewed-by: Petr Machata <petrm@nvidia.com>
> Signed-off-by: Petr Machata <petrm@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>
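
The port-page arithmetic behind this fix can be sketched as follows. This is a minimal illustration, not the driver's code: `struct sbsr_sketch`, `sbsr_sketch_set_port()` and the bit ordering inside the mask are assumptions made for the example. Only the 256-bit mask width and the existence of a 'port_page' field come from the commit message.

```c
#include <stdint.h>
#include <string.h>

#define MASK_BITS 256 /* mask width per the commit message */

/* Hypothetical model of the register payload: one page selector, one mask. */
struct sbsr_sketch {
	unsigned int port_page;
	uint8_t ingress_mask[MASK_BITS / 8];
};

/*
 * Mark local_port in the mask. The bit index is taken relative to the
 * first port of the selected page, so it always stays below MASK_BITS.
 * Returns -1 if the port does not belong to the selected page.
 */
static int sbsr_sketch_set_port(struct sbsr_sketch *reg, unsigned int local_port)
{
	unsigned int rel;

	if (local_port / MASK_BITS != reg->port_page)
		return -1;

	rel = local_port % MASK_BITS; /* relative, not absolute */
	reg->ingress_mask[rel / 8] |= (uint8_t)(1u << (rel % 8));
	return 0;
}
```

In this sketch, using the absolute port number 260 as the bit index would touch byte 32 of a 32-byte mask, one past its end. Using the relative index 4 keeps the write inside the mask.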



