* [PATCH net 0/5] mlx5 misc fixes
@ 2024-05-09 11:29 Tariq Toukan
2024-05-09 11:29 ` [PATCH net 1/5] net/mlx5e: Fix netif state handling Tariq Toukan
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: Tariq Toukan @ 2024-05-09 11:29 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
Tariq Toukan
Hi,
This patchset provides bug fixes to the mlx5 driver.
Patch 1 by Shay fixes the error flow in mlx5e_suspend().
Patch 2 by Shay aligns the peer devlink set logic with the register devlink flow.
Patch 3 by Maher solves a deadlock in lag enable/disable.
Patches 4 and 5 by Akiva address issues in command interface corner cases.
Series generated against:
commit 393ceeb9211e ("Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'")
Thanks,
Tariq.
Akiva Goldberger (2):
net/mlx5: Add a timeout to acquire the command queue semaphore
net/mlx5: Discard command completions in internal error
Maher Sanalla (1):
net/mlx5: Reload only IB representors upon lag disable/enable
Shay Drory (2):
net/mlx5e: Fix netif state handling
net/mlx5: Fix peer devlink set for SF representor devlink port
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 44 +++++++++++++++----
.../net/ethernet/mellanox/mlx5/core/en_main.c | 10 ++---
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 4 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 28 +++++++-----
.../net/ethernet/mellanox/mlx5/core/lag/lag.c | 6 +--
.../ethernet/mellanox/mlx5/core/lag/mpesw.c | 4 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 14 +++---
.../mellanox/mlx5/core/sf/dev/driver.c | 19 ++++----
include/linux/mlx5/driver.h | 1 +
9 files changed, 79 insertions(+), 51 deletions(-)
--
2.31.1
* [PATCH net 1/5] net/mlx5e: Fix netif state handling
2024-05-09 11:29 [PATCH net 0/5] mlx5 misc fixes Tariq Toukan
@ 2024-05-09 11:29 ` Tariq Toukan
2024-05-10 15:31 ` Simon Horman
2024-05-09 11:29 ` [PATCH net 2/5] net/mlx5: Fix peer devlink set for SF representor devlink port Tariq Toukan
` (4 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Tariq Toukan @ 2024-05-09 11:29 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Shay Drory,
Tariq Toukan
From: Shay Drory <shayd@nvidia.com>
mlx5e_suspend cleans resources only if netif_device_present() returns
true. However, mlx5e_resume changes the state of netif, via
mlx5e_nic_enable, only if reg_state == NETREG_REGISTERED.
In the below case, the above leads to NULL-ptr Oops[1] and memory
leaks:
mlx5e_probe
 _mlx5e_resume
  mlx5e_attach_netdev
   mlx5e_nic_enable <-- netdev not reg, not calling netif_device_attach()
 register_netdev <-- failed for some reason.
 ERROR_FLOW:
  _mlx5e_suspend <-- netif_device_present return false, resources aren't freed :(
Hence, clean resources in this case as well.
[1]
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0010 [#1] SMP
CPU: 2 PID: 9345 Comm: test-ovs-ct-gen Not tainted 6.5.0_for_upstream_min_debug_2023_09_05_16_01 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0018:ffff888178aaf758 EFLAGS: 00010246
Call Trace:
<TASK>
? __die+0x20/0x60
? page_fault_oops+0x14c/0x3c0
? exc_page_fault+0x75/0x140
? asm_exc_page_fault+0x22/0x30
notifier_call_chain+0x35/0xb0
blocking_notifier_call_chain+0x3d/0x60
mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core]
mlx5_core_uplink_netdev_event_replay+0x3e/0x60 [mlx5_core]
mlx5_mdev_netdev_track+0x53/0x60 [mlx5_ib]
mlx5_ib_roce_init+0xc3/0x340 [mlx5_ib]
__mlx5_ib_add+0x34/0xd0 [mlx5_ib]
mlx5r_probe+0xe1/0x210 [mlx5_ib]
? auxiliary_match_id+0x6a/0x90
auxiliary_bus_probe+0x38/0x80
? driver_sysfs_add+0x51/0x80
really_probe+0xc9/0x3e0
? driver_probe_device+0x90/0x90
__driver_probe_device+0x80/0x160
driver_probe_device+0x1e/0x90
__device_attach_driver+0x7d/0x100
bus_for_each_drv+0x80/0xd0
__device_attach+0xbc/0x1f0
bus_probe_device+0x86/0xa0
device_add+0x637/0x840
__auxiliary_device_add+0x3b/0xa0
add_adev+0xc9/0x140 [mlx5_core]
mlx5_rescan_drivers_locked+0x22a/0x310 [mlx5_core]
mlx5_register_device+0x53/0xa0 [mlx5_core]
mlx5_init_one_devl_locked+0x5c4/0x9c0 [mlx5_core]
mlx5_init_one+0x3b/0x60 [mlx5_core]
probe_one+0x44c/0x730 [mlx5_core]
local_pci_probe+0x3e/0x90
pci_device_probe+0xbf/0x210
? kernfs_create_link+0x5d/0xa0
? sysfs_do_create_link_sd+0x60/0xc0
really_probe+0xc9/0x3e0
? driver_probe_device+0x90/0x90
__driver_probe_device+0x80/0x160
driver_probe_device+0x1e/0x90
__device_attach_driver+0x7d/0x100
bus_for_each_drv+0x80/0xd0
__device_attach+0xbc/0x1f0
pci_bus_add_device+0x54/0x80
pci_iov_add_virtfn+0x2e6/0x320
sriov_enable+0x208/0x420
mlx5_core_sriov_configure+0x9e/0x200 [mlx5_core]
sriov_numvfs_store+0xae/0x1a0
kernfs_fop_write_iter+0x10c/0x1a0
vfs_write+0x291/0x3c0
ksys_write+0x5f/0xe0
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
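To make the fix concrete, here is a condensed, illustrative sketch of the new guard and how its callers are expected to use it. It is simplified from the diff below (the MLX5E_STATE_DESTROYING branch and the rest of the teardown are omitted), so it is a sketch of the idea rather than the complete driver code:

/* Sketch only: the probe error path passes pre_netdev_reg = true because the
 * netdev was never attached, so netif_device_present() cannot be used to
 * decide whether the resources allocated by _mlx5e_resume() still need freeing.
 *
 * Callers (condensed from the diff below):
 *   mlx5e_suspend()            -> _mlx5e_suspend(adev, false)
 *   _mlx5e_remove()            -> _mlx5e_suspend(adev, false)
 *   _mlx5e_probe() error path  -> _mlx5e_suspend(adev, true)
 */
static int _mlx5e_suspend(struct auxiliary_device *adev, bool pre_netdev_reg)
{
	struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev);
	struct mlx5e_priv *priv = mlx5e_dev->priv;

	/* Only take the "already detached" shortcut when the netdev had a
	 * chance to be attached in the first place.
	 */
	if (!pre_netdev_reg && !netif_device_present(priv->netdev))
		return -ENODEV;

	mlx5e_detach_netdev(priv);
	return 0;
}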
Fixes: 2c3b5beec46a ("net/mlx5e: More generic netdev management API")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 319930c04093..64497b6eebd3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -6058,7 +6058,7 @@ static int mlx5e_resume(struct auxiliary_device *adev)
return 0;
}
-static int _mlx5e_suspend(struct auxiliary_device *adev)
+static int _mlx5e_suspend(struct auxiliary_device *adev, bool pre_netdev_reg)
{
struct mlx5e_dev *mlx5e_dev = auxiliary_get_drvdata(adev);
struct mlx5e_priv *priv = mlx5e_dev->priv;
@@ -6067,7 +6067,7 @@ static int _mlx5e_suspend(struct auxiliary_device *adev)
struct mlx5_core_dev *pos;
int i;
- if (!netif_device_present(netdev)) {
+ if (!pre_netdev_reg && !netif_device_present(netdev)) {
if (test_bit(MLX5E_STATE_DESTROYING, &priv->state))
mlx5_sd_for_each_dev(i, mdev, pos)
mlx5e_destroy_mdev_resources(pos);
@@ -6090,7 +6090,7 @@ static int mlx5e_suspend(struct auxiliary_device *adev, pm_message_t state)
actual_adev = mlx5_sd_get_adev(mdev, adev, edev->idx);
if (actual_adev)
- err = _mlx5e_suspend(actual_adev);
+ err = _mlx5e_suspend(actual_adev, false);
mlx5_sd_cleanup(mdev);
return err;
@@ -6157,7 +6157,7 @@ static int _mlx5e_probe(struct auxiliary_device *adev)
return 0;
err_resume:
- _mlx5e_suspend(adev);
+ _mlx5e_suspend(adev, true);
err_profile_cleanup:
profile->cleanup(priv);
err_destroy_netdev:
@@ -6197,7 +6197,7 @@ static void _mlx5e_remove(struct auxiliary_device *adev)
mlx5_core_uplink_netdev_set(mdev, NULL);
mlx5e_dcbnl_delete_app(priv);
unregister_netdev(priv->netdev);
- _mlx5e_suspend(adev);
+ _mlx5e_suspend(adev, false);
priv->profile->cleanup(priv);
mlx5e_destroy_netdev(priv);
mlx5e_devlink_port_unregister(mlx5e_dev);
--
2.31.1
* [PATCH net 2/5] net/mlx5: Fix peer devlink set for SF representor devlink port
2024-05-09 11:29 [PATCH net 0/5] mlx5 misc fixes Tariq Toukan
2024-05-09 11:29 ` [PATCH net 1/5] net/mlx5e: Fix netif state handling Tariq Toukan
@ 2024-05-09 11:29 ` Tariq Toukan
2024-05-10 15:38 ` Simon Horman
2024-05-09 11:29 ` [PATCH net 3/5] net/mlx5: Reload only IB representors upon lag disable/enable Tariq Toukan
` (3 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Tariq Toukan @ 2024-05-09 11:29 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Shay Drory,
Moshe Shemesh, Tariq Toukan
From: Shay Drory <shayd@nvidia.com>
The cited patches changed the register devlink flow but neglected to
update the peer devlink set logic accordingly. Setting the peer devlink
after devl_register() now triggers a call trace [1].
Hence, align the peer devlink set logic with the register devlink flow.
[1]
WARNING: CPU: 4 PID: 3394 at net/devlink/core.c:155 devlink_rel_nested_in_add+0x177/0x180
CPU: 4 PID: 3394 Comm: kworker/u40:1 Not tainted 6.9.0-rc4_for_linust_min_debug_2024_04_16_14_08 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: mlx5_vhca_event0 mlx5_vhca_state_work_handler [mlx5_core]
RIP: 0010:devlink_rel_nested_in_add+0x177/0x180
Call Trace:
<TASK>
? __warn+0x78/0x120
? devlink_rel_nested_in_add+0x177/0x180
? report_bug+0x16d/0x180
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? devlink_port_init+0x30/0x30
? devlink_port_type_clear+0x50/0x50
? devlink_rel_nested_in_add+0x177/0x180
? devlink_rel_nested_in_add+0xdd/0x180
mlx5_sf_mdev_event+0x74/0xb0 [mlx5_core]
notifier_call_chain+0x35/0xb0
blocking_notifier_call_chain+0x3d/0x60
mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core]
mlx5_sf_dev_probe+0x185/0x3e0 [mlx5_core]
auxiliary_bus_probe+0x38/0x80
? driver_sysfs_add+0x51/0x80
really_probe+0xc5/0x3a0
? driver_probe_device+0x90/0x90
__driver_probe_device+0x80/0x160
driver_probe_device+0x1e/0x90
__device_attach_driver+0x7d/0x100
bus_for_each_drv+0x80/0xd0
__device_attach+0xbc/0x1f0
bus_probe_device+0x86/0xa0
device_add+0x64f/0x860
__auxiliary_device_add+0x3b/0xa0
mlx5_sf_dev_add+0x139/0x330 [mlx5_core]
mlx5_sf_dev_state_change_handler+0x1e4/0x250 [mlx5_core]
notifier_call_chain+0x35/0xb0
blocking_notifier_call_chain+0x3d/0x60
mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core]
process_one_work+0x13f/0x2e0
worker_thread+0x2bd/0x3c0
? rescuer_thread+0x410/0x410
kthread+0xc4/0xf0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x2d/0x50
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
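To make the resulting ordering concrete, here is a hedged, condensed sketch of the SF probe flow after this change. It is simplified from the diff below (the non-lightweight init path, iounmap/unwind and error labels are omitted), and the container_of()/priv_to_devlink() lookups reflect assumptions about the surrounding driver structures rather than the exact code:

/* Sketch only: the peer devlink must be set while the SF devlink instance
 * is still unregistered, i.e. before mlx5_init_one_light(), which now takes
 * the devlink lock and calls devl_register() up front.
 */
static int mlx5_sf_dev_probe_sketch(struct auxiliary_device *adev,
				    const struct auxiliary_device_id *id)
{
	struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
	struct devlink *devlink = priv_to_devlink(sf_dev->mdev);
	int err;

	/* Peer devlink logic expects to work on an unregistered devlink instance. */
	err = mlx5_core_peer_devlink_set(sf_dev, devlink);
	if (err)
		return err;

	return mlx5_init_one_light(sf_dev->mdev);
}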
Fixes: bf729988303a ("net/mlx5: Restore mistakenly dropped parts in register devlink flow")
Fixes: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/main.c | 14 +++++---------
.../mellanox/mlx5/core/sf/dev/driver.c | 19 ++++++++-----------
2 files changed, 13 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 331ce47f51a1..6574c145dc1e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1680,6 +1680,8 @@ int mlx5_init_one_light(struct mlx5_core_dev *dev)
struct devlink *devlink = priv_to_devlink(dev);
int err;
+ devl_lock(devlink);
+ devl_register(devlink);
dev->state = MLX5_DEVICE_STATE_UP;
err = mlx5_function_enable(dev, true, mlx5_tout_ms(dev, FW_PRE_INIT_TIMEOUT));
if (err) {
@@ -1693,27 +1695,21 @@ int mlx5_init_one_light(struct mlx5_core_dev *dev)
goto query_hca_caps_err;
}
- devl_lock(devlink);
- devl_register(devlink);
-
err = mlx5_devlink_params_register(priv_to_devlink(dev));
if (err) {
mlx5_core_warn(dev, "mlx5_devlink_param_reg err = %d\n", err);
- goto params_reg_err;
+ goto query_hca_caps_err;
}
devl_unlock(devlink);
return 0;
-params_reg_err:
- devl_unregister(devlink);
- devl_unlock(devlink);
query_hca_caps_err:
- devl_unregister(devlink);
- devl_unlock(devlink);
mlx5_function_disable(dev, true);
out:
dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+ devl_unregister(devlink);
+ devl_unlock(devlink);
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index 7ebe71280827..b2986175d9af 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -60,6 +60,13 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
goto remap_err;
}
+ /* Peer devlink logic expects to work on unregistered devlink instance. */
+ err = mlx5_core_peer_devlink_set(sf_dev, devlink);
+ if (err) {
+ mlx5_core_warn(mdev, "mlx5_core_peer_devlink_set err=%d\n", err);
+ goto peer_devlink_set_err;
+ }
+
if (MLX5_ESWITCH_MANAGER(sf_dev->parent_mdev))
err = mlx5_init_one_light(mdev);
else
@@ -69,20 +76,10 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
goto init_one_err;
}
- err = mlx5_core_peer_devlink_set(sf_dev, devlink);
- if (err) {
- mlx5_core_warn(mdev, "mlx5_core_peer_devlink_set err=%d\n", err);
- goto peer_devlink_set_err;
- }
-
return 0;
-peer_devlink_set_err:
- if (mlx5_dev_is_lightweight(sf_dev->mdev))
- mlx5_uninit_one_light(sf_dev->mdev);
- else
- mlx5_uninit_one(sf_dev->mdev);
init_one_err:
+peer_devlink_set_err:
iounmap(mdev->iseg);
remap_err:
mlx5_mdev_uninit(mdev);
--
2.31.1
* [PATCH net 3/5] net/mlx5: Reload only IB representors upon lag disable/enable
2024-05-09 11:29 [PATCH net 0/5] mlx5 misc fixes Tariq Toukan
2024-05-09 11:29 ` [PATCH net 1/5] net/mlx5e: Fix netif state handling Tariq Toukan
2024-05-09 11:29 ` [PATCH net 2/5] net/mlx5: Fix peer devlink set for SF representor devlink port Tariq Toukan
@ 2024-05-09 11:29 ` Tariq Toukan
2024-05-10 15:51 ` Simon Horman
2024-05-09 11:29 ` [PATCH net 4/5] net/mlx5: Add a timeout to acquire the command queue semaphore Tariq Toukan
` (2 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Tariq Toukan @ 2024-05-09 11:29 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
Maher Sanalla, Mark Bloch, Tariq Toukan
From: Maher Sanalla <msanalla@nvidia.com>
On lag disable, the bond IB device, along with all of its
representors, is destroyed, and then the slaves' representors get reloaded.
If the slave IB representor load fails, the eswitch error flow
unloads all representors, including the ethernet representors, whose
netdevs get detached and removed from the lag bond. Such a flow is inaccurate,
as the lag driver is not responsible for loading/unloading ethernet
representors. Furthermore, the flow described above begins by holding the
lag lock to prevent bond changes during the disable flow. However, when
the flow reaches the ethernet representors' detachment from the lag, the
lag lock is required again, triggering the following deadlock:
Call trace:
__switch_to+0xf4/0x148
__schedule+0x2c8/0x7d0
schedule+0x50/0xe0
schedule_preempt_disabled+0x18/0x28
__mutex_lock.isra.13+0x2b8/0x570
__mutex_lock_slowpath+0x1c/0x28
mutex_lock+0x4c/0x68
mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
mlx5_disable_lag+0x130/0x138 [mlx5_core]
mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
genl_family_rcv_msg_doit.isra.17+0xe8/0x138
genl_rcv_msg+0xe4/0x220
netlink_rcv_skb+0x44/0x108
genl_rcv+0x40/0x58
netlink_unicast+0x198/0x268
netlink_sendmsg+0x1d4/0x418
sock_sendmsg+0x54/0x60
__sys_sendto+0xf4/0x120
__arm64_sys_sendto+0x30/0x40
el0_svc_common+0x8c/0x120
do_el0_svc+0x30/0xa0
el0_svc+0x20/0x30
el0_sync_handler+0x90/0xb8
el0_sync+0x160/0x180
Thus, upon lag enable/disable, load and unload only the IB representors
of the slaves preventing the deadlock mentioned above.
While at it, refactor the mlx5_esw_offloads_rep_load() function to have
a static helper method for its internal logic, in symmetry with the
representor unload design.
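For illustration, a hedged, condensed sketch of the reload behavior this patch aims for; it is simplified from the diff below (the uplink-first ordering, the eswitch-mode check and the error unwinding of the real function are omitted):

/* Sketch only: reload just the REP_IB representor of every vport whose
 * ethernet representor is already loaded; REP_ETH representors are left
 * untouched, so the lag lock is never re-taken via the netdev detach path.
 */
static int mlx5_eswitch_reload_ib_reps_sketch(struct mlx5_eswitch *esw)
{
	struct mlx5_eswitch_rep *rep;
	unsigned long i;
	int err;

	mlx5_esw_for_each_rep(esw, i, rep) {
		if (atomic_read(&rep->rep_data[REP_ETH].state) != REP_LOADED)
			continue;

		err = __esw_offloads_load_rep(esw, rep, REP_IB);
		if (err)
			return err;
	}
	return 0;
}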
Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
Co-developed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 4 +--
.../mellanox/mlx5/core/eswitch_offloads.c | 28 ++++++++++++-------
.../net/ethernet/mellanox/mlx5/core/lag/lag.c | 6 ++--
.../ethernet/mellanox/mlx5/core/lag/mpesw.c | 4 +--
4 files changed, 25 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 349e28a6dd8d..ef55674876cb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -833,7 +833,7 @@ int mlx5_eswitch_offloads_single_fdb_add_one(struct mlx5_eswitch *master_esw,
struct mlx5_eswitch *slave_esw, int max_slaves);
void mlx5_eswitch_offloads_single_fdb_del_one(struct mlx5_eswitch *master_esw,
struct mlx5_eswitch *slave_esw);
-int mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw);
+int mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw);
bool mlx5_eswitch_block_encap(struct mlx5_core_dev *dev);
void mlx5_eswitch_unblock_encap(struct mlx5_core_dev *dev);
@@ -925,7 +925,7 @@ mlx5_eswitch_offloads_single_fdb_del_one(struct mlx5_eswitch *master_esw,
static inline int mlx5_eswitch_get_npeers(struct mlx5_eswitch *esw) { return 0; }
static inline int
-mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
+mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw)
{
return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 844d3e3a65dd..e8caf12f4c4f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -2502,6 +2502,16 @@ void esw_offloads_cleanup(struct mlx5_eswitch *esw)
esw_offloads_cleanup_reps(esw);
}
+static int __esw_offloads_load_rep(struct mlx5_eswitch *esw,
+ struct mlx5_eswitch_rep *rep, u8 rep_type)
+{
+ if (atomic_cmpxchg(&rep->rep_data[rep_type].state,
+ REP_REGISTERED, REP_LOADED) == REP_REGISTERED)
+ return esw->offloads.rep_ops[rep_type]->load(esw->dev, rep);
+
+ return 0;
+}
+
static void __esw_offloads_unload_rep(struct mlx5_eswitch *esw,
struct mlx5_eswitch_rep *rep, u8 rep_type)
{
@@ -2526,13 +2536,11 @@ static int mlx5_esw_offloads_rep_load(struct mlx5_eswitch *esw, u16 vport_num)
int err;
rep = mlx5_eswitch_get_rep(esw, vport_num);
- for (rep_type = 0; rep_type < NUM_REP_TYPES; rep_type++)
- if (atomic_cmpxchg(&rep->rep_data[rep_type].state,
- REP_REGISTERED, REP_LOADED) == REP_REGISTERED) {
- err = esw->offloads.rep_ops[rep_type]->load(esw->dev, rep);
- if (err)
- goto err_reps;
- }
+ for (rep_type = 0; rep_type < NUM_REP_TYPES; rep_type++) {
+ err = __esw_offloads_load_rep(esw, rep, rep_type);
+ if (err)
+ goto err_reps;
+ }
return 0;
@@ -3277,7 +3285,7 @@ static void esw_destroy_offloads_acl_tables(struct mlx5_eswitch *esw)
esw_vport_destroy_offloads_acl_tables(esw, vport);
}
-int mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
+int mlx5_eswitch_reload_ib_reps(struct mlx5_eswitch *esw)
{
struct mlx5_eswitch_rep *rep;
unsigned long i;
@@ -3290,13 +3298,13 @@ int mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
if (atomic_read(&rep->rep_data[REP_ETH].state) != REP_LOADED)
return 0;
- ret = mlx5_esw_offloads_rep_load(esw, MLX5_VPORT_UPLINK);
+ ret = __esw_offloads_load_rep(esw, rep, REP_IB);
if (ret)
return ret;
mlx5_esw_for_each_rep(esw, i, rep) {
if (atomic_read(&rep->rep_data[REP_ETH].state) == REP_LOADED)
- mlx5_esw_offloads_rep_load(esw, rep->vport);
+ __esw_offloads_load_rep(esw, rep, REP_IB);
}
return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index 69d482f7c5a2..37598d116f3b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -814,7 +814,7 @@ void mlx5_disable_lag(struct mlx5_lag *ldev)
if (shared_fdb)
for (i = 0; i < ldev->ports; i++)
if (!(ldev->pf[i].dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV))
- mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
+ mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
}
static bool mlx5_shared_fdb_supported(struct mlx5_lag *ldev)
@@ -922,7 +922,7 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
mlx5_rescan_drivers_locked(dev0);
for (i = 0; i < ldev->ports; i++) {
- err = mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
+ err = mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
if (err)
break;
}
@@ -933,7 +933,7 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
mlx5_deactivate_lag(ldev);
mlx5_lag_add_devices(ldev);
for (i = 0; i < ldev->ports; i++)
- mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
+ mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
mlx5_core_err(dev0, "Failed to enable lag\n");
return;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
index 82889f30506e..571ea26edd0c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
@@ -99,7 +99,7 @@ static int enable_mpesw(struct mlx5_lag *ldev)
dev0->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
mlx5_rescan_drivers_locked(dev0);
for (i = 0; i < ldev->ports; i++) {
- err = mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
+ err = mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
if (err)
goto err_rescan_drivers;
}
@@ -113,7 +113,7 @@ static int enable_mpesw(struct mlx5_lag *ldev)
err_add_devices:
mlx5_lag_add_devices(ldev);
for (i = 0; i < ldev->ports; i++)
- mlx5_eswitch_reload_reps(ldev->pf[i].dev->priv.eswitch);
+ mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
mlx5_mpesw_metadata_cleanup(ldev);
return err;
}
--
2.31.1
* [PATCH net 4/5] net/mlx5: Add a timeout to acquire the command queue semaphore
2024-05-09 11:29 [PATCH net 0/5] mlx5 misc fixes Tariq Toukan
` (2 preceding siblings ...)
2024-05-09 11:29 ` [PATCH net 3/5] net/mlx5: Reload only IB representors upon lag disable/enable Tariq Toukan
@ 2024-05-09 11:29 ` Tariq Toukan
2024-05-09 11:29 ` [PATCH net 5/5] net/mlx5: Discard command completions in internal error Tariq Toukan
2024-05-11 2:50 ` [PATCH net 0/5] mlx5 misc fixes patchwork-bot+netdevbpf
5 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2024-05-09 11:29 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
Akiva Goldberger, Moshe Shemesh, Tariq Toukan
From: Akiva Goldberger <agoldberger@nvidia.com>
Prevent forced completion handling on an entry that has not yet been
assigned an index, which causes an out-of-bounds access with idx = -22.
Instead of waiting indefinitely for the semaphore, the blocking flow now
waits either for the index to be allocated or for a semaphore acquisition
timeout before starting the timer for FW completion.
Kernel log example:
mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion
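For illustration, a hedged, condensed sketch of the new slot-acquisition step; it is simplified from the diff below (the callback completion path and the page-queue case are omitted, and the helper name is made up for the sketch):

/* Sketch only: bound the wait for a command-queue slot instead of blocking
 * forever, so the FW-completion timer only starts once an index has
 * actually been allocated.
 */
static int cmd_acquire_slot_sketch(struct mlx5_core_dev *dev,
				   struct mlx5_cmd_work_ent *ent)
{
	struct mlx5_cmd *cmd = ent->cmd;
	unsigned long timeout = msecs_to_jiffies(mlx5_tout_ms(dev, CMD));

	/* down_timeout() returns 0 on success and -ETIME on timeout. */
	if (down_timeout(&cmd->vars.sem, timeout)) {
		ent->ret = -EBUSY;
		complete(&ent->done);
		complete(&ent->slotted);	/* let wait_func() observe -EBUSY */
		return -EBUSY;
	}

	/* Only now is an index allocated and the FW completion timer armed. */
	if (cmd_alloc_index(cmd, ent) < 0)
		return -EAGAIN;	/* the real code also releases the semaphore here */

	return 0;
}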
Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 41 +++++++++++++++----
include/linux/mlx5/driver.h | 1 +
2 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 4957412ff1f6..511e7fee39ac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -969,19 +969,32 @@ static void cmd_work_handler(struct work_struct *work)
bool poll_cmd = ent->polling;
struct mlx5_cmd_layout *lay;
struct mlx5_core_dev *dev;
- unsigned long cb_timeout;
- struct semaphore *sem;
+ unsigned long timeout;
unsigned long flags;
int alloc_ret;
int cmd_mode;
+ complete(&ent->handling);
+
dev = container_of(cmd, struct mlx5_core_dev, cmd);
- cb_timeout = msecs_to_jiffies(mlx5_tout_ms(dev, CMD));
+ timeout = msecs_to_jiffies(mlx5_tout_ms(dev, CMD));
- complete(&ent->handling);
- sem = ent->page_queue ? &cmd->vars.pages_sem : &cmd->vars.sem;
- down(sem);
if (!ent->page_queue) {
+ if (down_timeout(&cmd->vars.sem, timeout)) {
+ mlx5_core_warn(dev, "%s(0x%x) timed out while waiting for a slot.\n",
+ mlx5_command_str(ent->op), ent->op);
+ if (ent->callback) {
+ ent->callback(-EBUSY, ent->context);
+ mlx5_free_cmd_msg(dev, ent->out);
+ free_msg(dev, ent->in);
+ cmd_ent_put(ent);
+ } else {
+ ent->ret = -EBUSY;
+ complete(&ent->done);
+ }
+ complete(&ent->slotted);
+ return;
+ }
alloc_ret = cmd_alloc_index(cmd, ent);
if (alloc_ret < 0) {
mlx5_core_err_rl(dev, "failed to allocate command entry\n");
@@ -994,10 +1007,11 @@ static void cmd_work_handler(struct work_struct *work)
ent->ret = -EAGAIN;
complete(&ent->done);
}
- up(sem);
+ up(&cmd->vars.sem);
return;
}
} else {
+ down(&cmd->vars.pages_sem);
ent->idx = cmd->vars.max_reg_cmds;
spin_lock_irqsave(&cmd->alloc_lock, flags);
clear_bit(ent->idx, &cmd->vars.bitmask);
@@ -1005,6 +1019,8 @@ static void cmd_work_handler(struct work_struct *work)
spin_unlock_irqrestore(&cmd->alloc_lock, flags);
}
+ complete(&ent->slotted);
+
lay = get_inst(cmd, ent->idx);
ent->lay = lay;
memset(lay, 0, sizeof(*lay));
@@ -1023,7 +1039,7 @@ static void cmd_work_handler(struct work_struct *work)
ent->ts1 = ktime_get_ns();
cmd_mode = cmd->mode;
- if (ent->callback && schedule_delayed_work(&ent->cb_timeout_work, cb_timeout))
+ if (ent->callback && schedule_delayed_work(&ent->cb_timeout_work, timeout))
cmd_ent_get(ent);
set_bit(MLX5_CMD_ENT_STATE_PENDING_COMP, &ent->state);
@@ -1143,6 +1159,9 @@ static int wait_func(struct mlx5_core_dev *dev, struct mlx5_cmd_work_ent *ent)
ent->ret = -ECANCELED;
goto out_err;
}
+
+ wait_for_completion(&ent->slotted);
+
if (cmd->mode == CMD_MODE_POLLING || ent->polling)
wait_for_completion(&ent->done);
else if (!wait_for_completion_timeout(&ent->done, timeout))
@@ -1157,6 +1176,9 @@ static int wait_func(struct mlx5_core_dev *dev, struct mlx5_cmd_work_ent *ent)
} else if (err == -ECANCELED) {
mlx5_core_warn(dev, "%s(0x%x) canceled on out of queue timeout.\n",
mlx5_command_str(ent->op), ent->op);
+ } else if (err == -EBUSY) {
+ mlx5_core_warn(dev, "%s(0x%x) timeout while waiting for command semaphore.\n",
+ mlx5_command_str(ent->op), ent->op);
}
mlx5_core_dbg(dev, "err %d, delivery status %s(%d)\n",
err, deliv_status_to_str(ent->status), ent->status);
@@ -1208,6 +1230,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
ent->polling = force_polling;
init_completion(&ent->handling);
+ init_completion(&ent->slotted);
if (!callback)
init_completion(&ent->done);
@@ -1225,7 +1248,7 @@ static int mlx5_cmd_invoke(struct mlx5_core_dev *dev, struct mlx5_cmd_msg *in,
return 0; /* mlx5_cmd_comp_handler() will put(ent) */
err = wait_func(dev, ent);
- if (err == -ETIMEDOUT || err == -ECANCELED)
+ if (err == -ETIMEDOUT || err == -ECANCELED || err == -EBUSY)
goto out_free;
ds = ent->ts2 - ent->ts1;
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index bf9324a31ae9..80452bd98253 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -862,6 +862,7 @@ struct mlx5_cmd_work_ent {
void *context;
int idx;
struct completion handling;
+ struct completion slotted;
struct completion done;
struct mlx5_cmd *cmd;
struct work_struct work;
--
2.31.1
* [PATCH net 5/5] net/mlx5: Discard command completions in internal error
2024-05-09 11:29 [PATCH net 0/5] mlx5 misc fixes Tariq Toukan
` (3 preceding siblings ...)
2024-05-09 11:29 ` [PATCH net 4/5] net/mlx5: Add a timeout to acquire the command queue semaphore Tariq Toukan
@ 2024-05-09 11:29 ` Tariq Toukan
2024-05-11 2:50 ` [PATCH net 0/5] mlx5 misc fixes patchwork-bot+netdevbpf
5 siblings, 0 replies; 10+ messages in thread
From: Tariq Toukan @ 2024-05-09 11:29 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
Cc: netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
Akiva Goldberger, Moshe Shemesh, Tariq Toukan
From: Akiva Goldberger <agoldberger@nvidia.com>
Fix a use-after-free that occurs when a FW completion arrives while the
device is in the internal error state. Avoid calling the completion
handler in this case, since the device will flush the command interface
and trigger all completions manually.
Kernel log:
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
...
RIP: 0010:refcount_warn_saturate+0xd8/0xe0
...
Call Trace:
<IRQ>
? __warn+0x79/0x120
? refcount_warn_saturate+0xd8/0xe0
? report_bug+0x17c/0x190
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? refcount_warn_saturate+0xd8/0xe0
cmd_ent_put+0x13b/0x160 [mlx5_core]
mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core]
cmd_comp_notifier+0x1f/0x30 [mlx5_core]
notifier_call_chain+0x35/0xb0
atomic_notifier_call_chain+0x16/0x20
mlx5_eq_async_int+0xf6/0x290 [mlx5_core]
notifier_call_chain+0x35/0xb0
atomic_notifier_call_chain+0x16/0x20
irq_int_handler+0x19/0x30 [mlx5_core]
__handle_irq_event_percpu+0x4b/0x160
handle_irq_event+0x2e/0x80
handle_edge_irq+0x98/0x230
__common_interrupt+0x3b/0xa0
common_interrupt+0x7b/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
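For illustration, a hedged sketch of the guarded notifier tail; the nb-to-cmd/dev lookup of the real handler is elided, so this is essentially the three-line diff below shown in context:

/* Sketch only: while in internal error, the teardown flow owns the command
 * entries and completes them manually, so a late FW completion EQE must not
 * invoke the handler again (that would drop entry refcounts twice).
 */
static int cmd_comp_notifier_tail(struct mlx5_core_dev *dev, struct mlx5_eqe *eqe)
{
	if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
		return NOTIFY_DONE;

	mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector), false);
	return NOTIFY_OK;
}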
Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 511e7fee39ac..20768ef2e9d2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1634,6 +1634,9 @@ static int cmd_comp_notifier(struct notifier_block *nb,
dev = container_of(cmd, struct mlx5_core_dev, cmd);
eqe = data;
+ if (dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR)
+ return NOTIFY_DONE;
+
mlx5_cmd_comp_handler(dev, be32_to_cpu(eqe->data.cmd.vector), false);
return NOTIFY_OK;
--
2.31.1
* Re: [PATCH net 1/5] net/mlx5e: Fix netif state handling
2024-05-09 11:29 ` [PATCH net 1/5] net/mlx5e: Fix netif state handling Tariq Toukan
@ 2024-05-10 15:31 ` Simon Horman
0 siblings, 0 replies; 10+ messages in thread
From: Simon Horman @ 2024-05-10 15:31 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Shay Drory
On Thu, May 09, 2024 at 02:29:47PM +0300, Tariq Toukan wrote:
> From: Shay Drory <shayd@nvidia.com>
>
> mlx5e_suspend cleans resources only if netif_device_present() returns
> true. However, mlx5e_resume changes the state of netif, via
> mlx5e_nic_enable, only if reg_state == NETREG_REGISTERED.
> In the below case, the above leads to NULL-ptr Oops[1] and memory
> leaks:
>
> mlx5e_probe
> _mlx5e_resume
> mlx5e_attach_netdev
> mlx5e_nic_enable <-- netdev not reg, not calling netif_device_attach()
> register_netdev <-- failed for some reason.
> ERROR_FLOW:
> _mlx5e_suspend <-- netif_device_present return false, resources aren't freed :(
>
> Hence, clean resources in this case as well.
>
> [1]
> BUG: kernel NULL pointer dereference, address: 0000000000000000
...
> Fixes: 2c3b5beec46a ("net/mlx5e: More generic netdev management API")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Hi,
I think that this bug is caused by asymmetry in resource allocation/freeing
such that there are cases where _mlx5e_suspend() doesn't unwind
_mlx5e_resume().
It seems to me that asymmetry was introduced by the check for
reg_state != NETREG_REGISTERED in mlx5e_nic_enable() by:
610e89e05c3f ("net/mlx5e: Don't sync netdev state when not registered")
So perhaps that is a more appropriate commit for the Fixes tag.
I do note that commit was a fix for:
26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks")
So perhaps a second fixes tag for that commit is also appropriate.
Perhaps it's not important enough to revise things; I don't feel strongly
about it, so feel free to add the following regardless.
Reviewed-by: Simon Horman <horms@kernel.org>
All that said, I do wonder if it would be better in the long run to
implement things in such a way that there is symmetry in resource
allocation / deallocation. Passing flags to control how much cleanup is
performed does seem a bit awkward.
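(For illustration only, a hypothetical sketch of the kind of symmetry meant here; the helper names are made up and are not existing driver API:)

/* Hypothetical shape: every step of _mlx5e_resume() has a dedicated inverse,
 * so the probe error path simply unwinds exactly what ran, and no flag is
 * needed to describe "how much" cleanup to perform.
 */
static int _mlx5e_probe_sketch(struct auxiliary_device *adev)
{
	int err;

	err = mlx5e_resume_resources(adev);	/* counterpart of _mlx5e_resume() */
	if (err)
		return err;

	err = mlx5e_register_netdev_step(adev);	/* hypothetical wrapper */
	if (err)
		goto err_free;

	return 0;

err_free:
	mlx5e_suspend_resources(adev);	/* exact inverse, used by suspend/remove too */
	return err;
}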
* Re: [PATCH net 2/5] net/mlx5: Fix peer devlink set for SF representor devlink port
2024-05-09 11:29 ` [PATCH net 2/5] net/mlx5: Fix peer devlink set for SF representor devlink port Tariq Toukan
@ 2024-05-10 15:38 ` Simon Horman
0 siblings, 0 replies; 10+ messages in thread
From: Simon Horman @ 2024-05-10 15:38 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky, Shay Drory,
Moshe Shemesh
On Thu, May 09, 2024 at 02:29:48PM +0300, Tariq Toukan wrote:
> From: Shay Drory <shayd@nvidia.com>
>
> The cited patches changed the register devlink flow but neglected to
> update the peer devlink set logic accordingly. Setting the peer devlink
> after devl_register() now triggers a call trace [1].
>
> Hence, align the peer devlink set logic with the register devlink flow.
>
> [1]
> WARNING: CPU: 4 PID: 3394 at net/devlink/core.c:155 devlink_rel_nested_in_add+0x177/0x180
> CPU: 4 PID: 3394 Comm: kworker/u40:1 Not tainted 6.9.0-rc4_for_linust_min_debug_2024_04_16_14_08 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> Workqueue: mlx5_vhca_event0 mlx5_vhca_state_work_handler [mlx5_core]
> RIP: 0010:devlink_rel_nested_in_add+0x177/0x180
> Call Trace:
> <TASK>
> ? __warn+0x78/0x120
> ? devlink_rel_nested_in_add+0x177/0x180
> ? report_bug+0x16d/0x180
> ? handle_bug+0x3c/0x60
> ? exc_invalid_op+0x14/0x70
> ? asm_exc_invalid_op+0x16/0x20
> ? devlink_port_init+0x30/0x30
> ? devlink_port_type_clear+0x50/0x50
> ? devlink_rel_nested_in_add+0x177/0x180
> ? devlink_rel_nested_in_add+0xdd/0x180
> mlx5_sf_mdev_event+0x74/0xb0 [mlx5_core]
> notifier_call_chain+0x35/0xb0
> blocking_notifier_call_chain+0x3d/0x60
> mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core]
> mlx5_sf_dev_probe+0x185/0x3e0 [mlx5_core]
> auxiliary_bus_probe+0x38/0x80
> ? driver_sysfs_add+0x51/0x80
> really_probe+0xc5/0x3a0
> ? driver_probe_device+0x90/0x90
> __driver_probe_device+0x80/0x160
> driver_probe_device+0x1e/0x90
> __device_attach_driver+0x7d/0x100
> bus_for_each_drv+0x80/0xd0
> __device_attach+0xbc/0x1f0
> bus_probe_device+0x86/0xa0
> device_add+0x64f/0x860
> __auxiliary_device_add+0x3b/0xa0
> mlx5_sf_dev_add+0x139/0x330 [mlx5_core]
> mlx5_sf_dev_state_change_handler+0x1e4/0x250 [mlx5_core]
> notifier_call_chain+0x35/0xb0
> blocking_notifier_call_chain+0x3d/0x60
> mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core]
> process_one_work+0x13f/0x2e0
> worker_thread+0x2bd/0x3c0
> ? rescuer_thread+0x410/0x410
> kthread+0xc4/0xf0
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x2d/0x50
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork_asm+0x11/0x20
> </TASK>
>
> Fixes: bf729988303a ("net/mlx5: Restore mistakenly dropped parts in register devlink flow")
> Fixes: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock")
Hi Tariq, Shay, all,
I agree that this patch addresses problems introduced by both of the
commits cited above. But I also note that they are both fixes for the
following commit. So I wonder if it should be cited in a Fixes tag too.
cf530217408e ("devlink: Notify users when objects are accessible")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
The above notwithstanding this looks good to me.
Reviewed-by: Simon Horman <horms@kernel.org>
* Re: [PATCH net 3/5] net/mlx5: Reload only IB representors upon lag disable/enable
2024-05-09 11:29 ` [PATCH net 3/5] net/mlx5: Reload only IB representors upon lag disable/enable Tariq Toukan
@ 2024-05-10 15:51 ` Simon Horman
0 siblings, 0 replies; 10+ messages in thread
From: Simon Horman @ 2024-05-10 15:51 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
netdev, Saeed Mahameed, Gal Pressman, Leon Romanovsky,
Maher Sanalla, Mark Bloch
On Thu, May 09, 2024 at 02:29:49PM +0300, Tariq Toukan wrote:
> From: Maher Sanalla <msanalla@nvidia.com>
>
> On lag disable, the bond IB device, along with all of its
> representors, is destroyed, and then the slaves' representors get reloaded.
>
> If the slave IB representor load fails, the eswitch error flow
> unloads all representors, including the ethernet representors, whose
> netdevs get detached and removed from the lag bond. Such a flow is inaccurate,
> as the lag driver is not responsible for loading/unloading ethernet
> representors. Furthermore, the flow described above begins by holding the
> lag lock to prevent bond changes during the disable flow. However, when
> the flow reaches the ethernet representors' detachment from the lag, the
> lag lock is required again, triggering the following deadlock:
>
> Call trace:
> __switch_to+0xf4/0x148
> __schedule+0x2c8/0x7d0
> schedule+0x50/0xe0
> schedule_preempt_disabled+0x18/0x28
> __mutex_lock.isra.13+0x2b8/0x570
> __mutex_lock_slowpath+0x1c/0x28
> mutex_lock+0x4c/0x68
> mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core]
> mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core]
> mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core]
> mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core]
> mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core]
> mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core]
> mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core]
> mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core]
> mlx5_disable_lag+0x130/0x138 [mlx5_core]
> mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock
> mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core]
> devlink_nl_cmd_eswitch_set_doit+0xdc/0x180
> genl_family_rcv_msg_doit.isra.17+0xe8/0x138
> genl_rcv_msg+0xe4/0x220
> netlink_rcv_skb+0x44/0x108
> genl_rcv+0x40/0x58
> netlink_unicast+0x198/0x268
> netlink_sendmsg+0x1d4/0x418
> sock_sendmsg+0x54/0x60
> __sys_sendto+0xf4/0x120
> __arm64_sys_sendto+0x30/0x40
> el0_svc_common+0x8c/0x120
> do_el0_svc+0x30/0xa0
> el0_svc+0x20/0x30
> el0_sync_handler+0x90/0xb8
> el0_sync+0x160/0x180
>
> Thus, upon lag enable/disable, load and unload only the IB representors
> of the slaves preventing the deadlock mentioned above.
>
> While at it, refactor the mlx5_esw_offloads_rep_load() function to have
> a static helper method for its internal logic, in symmetry with the
> representor unload design.
>
> Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
> Co-developed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
* Re: [PATCH net 0/5] mlx5 misc fixes
2024-05-09 11:29 [PATCH net 0/5] mlx5 misc fixes Tariq Toukan
` (4 preceding siblings ...)
2024-05-09 11:29 ` [PATCH net 5/5] net/mlx5: Discard command completions in internal error Tariq Toukan
@ 2024-05-11 2:50 ` patchwork-bot+netdevbpf
5 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-05-11 2:50 UTC (permalink / raw)
To: Tariq Toukan; +Cc: davem, kuba, pabeni, edumazet, netdev, saeedm, gal, leonro
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 9 May 2024 14:29:46 +0300 you wrote:
> Hi,
>
> This patchset provides bug fixes to the mlx5 driver.
>
> Patch 1 by Shay fixes the error flow in mlx5e_suspend().
> Patch 2 by Shay aligns the peer devlink set logic with the register devlink flow.
> Patch 3 by Maher solves a deadlock in lag enable/disable.
> Patches 4 and 5 by Akiva address issues in command interface corner cases.
>
> [...]
Here is the summary with links:
- [net,1/5] net/mlx5e: Fix netif state handling
https://git.kernel.org/netdev/net/c/3d5918477f94
- [net,2/5] net/mlx5: Fix peer devlink set for SF representor devlink port
https://git.kernel.org/netdev/net/c/3c453e8cc672
- [net,3/5] net/mlx5: Reload only IB representors upon lag disable/enable
https://git.kernel.org/netdev/net/c/0f06228d4a2d
- [net,4/5] net/mlx5: Add a timeout to acquire the command queue semaphore
https://git.kernel.org/netdev/net/c/485d65e13571
- [net,5/5] net/mlx5: Discard command completions in internal error
https://git.kernel.org/netdev/net/c/db9b31aa9bc5
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html