* [PATCH net 0/2] mlx5 misc fixes 2025-03-18
From: Tariq Toukan @ 2025-03-18 20:51 UTC
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Gal Pressman, Leon Romanovsky, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
Mark Bloch
Hi,
This small patchset provides misc bug fixes to the mlx5 core driver.
Thanks,
Tariq.
Mark Bloch (1):
net/mlx5: LAG, reload representors on LAG creation failure
Moshe Shemesh (1):
net/mlx5: Start health poll after enable hca
drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 4 ++++
drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +++++++--------
2 files changed, 11 insertions(+), 8 deletions(-)
base-commit: daa624d3c2ddffdcbad140a9625a4064371db44f
--
2.31.1

* [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure
From: Tariq Toukan @ 2025-03-18 20:51 UTC
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Gal Pressman, Leon Romanovsky, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
Mark Bloch

From: Mark Bloch <mbloch@nvidia.com>

When LAG creation fails, the driver reloads the RDMA devices. If RDMA
representors are present, they should also be reloaded. This step was
missed in the cited commit.

Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index ed2ba272946b..6c9737c53734 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1052,6 +1052,10 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 		if (err) {
 			if (shared_fdb || roce_lag)
 				mlx5_lag_add_devices(ldev);
+			if (shared_fdb) {
+				mlx5_ldev_for_each(i, 0, ldev)
+					mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
+			}
 
 			return;
 		} else if (roce_lag) {
-- 
2.31.1
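
The error path described in patch 1/2 can be modelled in user space as
follows. This is only a sketch under stated assumptions: sketch_pf,
sketch_lag and sketch_bond_error_path are hypothetical stand-ins, the
counters merely record which recovery step ran, and none of this is the
mlx5 driver's actual code or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PORTS 2

/* Hypothetical stand-in for one physical function (PF) slot in a LAG. */
struct sketch_pf {
	bool present;        /* slot holds a device */
	int ib_dev_reloads;  /* times the RDMA device was re-added */
	int ib_rep_reloads;  /* times the IB representors were reloaded */
};

/* Hypothetical stand-in for the LAG device. */
struct sketch_lag {
	struct sketch_pf pf[SKETCH_MAX_PORTS];
};

/*
 * Models the recovery taken when LAG creation fails:
 *  1. re-add the RDMA devices (shared FDB or RoCE LAG case),
 *  2. additionally reload the IB representors (shared FDB case only),
 *     which is the step the patch adds.
 */
static void sketch_bond_error_path(struct sketch_lag *ldev,
				   bool shared_fdb, bool roce_lag)
{
	int i;

	assert(ldev != NULL);

	if (shared_fdb || roce_lag) {
		for (i = 0; i < SKETCH_MAX_PORTS; i++)
			if (ldev->pf[i].present)
				ldev->pf[i].ib_dev_reloads++;
	}

	if (shared_fdb) {
		for (i = 0; i < SKETCH_MAX_PORTS; i++)
			if (ldev->pf[i].present)
				ldev->pf[i].ib_rep_reloads++;
	}
}
```

In this model a RoCE LAG failure only re-adds the devices, while a shared
FDB failure performs both steps for every populated slot.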

* Re: [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure
From: Michal Swiatkowski @ 2025-03-19 7:13 UTC
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
Mark Bloch

On Tue, Mar 18, 2025 at 10:51:16PM +0200, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@nvidia.com>
>
> When LAG creation fails, the driver reloads the RDMA devices. If RDMA
> representors are present, they should also be reloaded. This step was
> missed in the cited commit.
>
> Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index ed2ba272946b..6c9737c53734 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1052,6 +1052,10 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
>  		if (err) {
>  			if (shared_fdb || roce_lag)
>  				mlx5_lag_add_devices(ldev);
> +			if (shared_fdb) {
> +				mlx5_ldev_for_each(i, 0, ldev)
> +					mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
> +			}
>
>  			return;
>  		} else if (roce_lag) {
> --

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> 2.31.1

* Re: [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure
From: Kalesh Anakkur Purayil @ 2025-03-19 11:36 UTC
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
Mark Bloch

On Wed, Mar 19, 2025 at 2:22 AM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Mark Bloch <mbloch@nvidia.com>
>
> When LAG creation fails, the driver reloads the RDMA devices. If RDMA
> representors are present, they should also be reloaded. This step was
> missed in the cited commit.
>
> Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

-- 
Regards,
Kalesh AP

* [PATCH net 2/2] net/mlx5: Start health poll after enable hca
From: Tariq Toukan @ 2025-03-18 20:51 UTC
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn
Cc: Gal Pressman, Leon Romanovsky, Saeed Mahameed, Leon Romanovsky,
Tariq Toukan, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
Mark Bloch

From: Moshe Shemesh <moshe@nvidia.com>

The health poll mechanism performs periodic checks to detect firmware
errors. One of the checks verifies the function is still enabled on
firmware side, but the function is enabled only after enable_hca command
completed. Start health poll after enable_hca command to avoid a race
between function enabled and first health polling.

Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ec956c4bcebd..7c3312d6aed9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1205,24 +1205,24 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 	dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
 
-	mlx5_start_health_poll(dev);
-
 	err = mlx5_core_enable_hca(dev, 0);
 	if (err) {
 		mlx5_core_err(dev, "enable hca failed\n");
-		goto stop_health_poll;
+		goto err_cmd_cleanup;
 	}
 
+	mlx5_start_health_poll(dev);
+
 	err = mlx5_core_set_issi(dev);
 	if (err) {
 		mlx5_core_err(dev, "failed to set issi\n");
-		goto err_disable_hca;
+		goto stop_health_poll;
 	}
 
 	err = mlx5_satisfy_startup_pages(dev, 1);
 	if (err) {
 		mlx5_core_err(dev, "failed to allocate boot pages\n");
-		goto err_disable_hca;
+		goto stop_health_poll;
 	}
 
 	err = mlx5_tout_query_dtor(dev);
@@ -1235,10 +1235,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 
 reclaim_boot_pages:
 	mlx5_reclaim_startup_pages(dev);
-err_disable_hca:
-	mlx5_core_disable_hca(dev, 0);
 stop_health_poll:
 	mlx5_stop_health_poll(dev, boot);
+	mlx5_core_disable_hca(dev, 0);
 err_cmd_cleanup:
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
 	mlx5_cmd_disable(dev);
@@ -1249,8 +1248,8 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 static void mlx5_function_disable(struct mlx5_core_dev *dev, bool boot)
 {
 	mlx5_reclaim_startup_pages(dev);
-	mlx5_core_disable_hca(dev, 0);
 	mlx5_stop_health_poll(dev, boot);
+	mlx5_core_disable_hca(dev, 0);
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
 	mlx5_cmd_disable(dev);
 }
-- 
2.31.1
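
The ordering change in patch 2/2 can be illustrated with a small user-space
model of the goto-based unwind. This is a sketch under stated assumptions:
sketch_dev, sketch_step, trace_add and sketch_function_enable are
hypothetical names, each driver call is replaced by an entry in a trace
string, and the steps after boot page allocation are omitted. It shows that
the health poll starts only once enable_hca has succeeded, and that the
error labels undo the steps in reverse order.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical failure points in the enable sequence; not driver API. */
enum sketch_step {
	STEP_NONE,
	STEP_ENABLE_HCA,
	STEP_SET_ISSI,
	STEP_BOOT_PAGES,
};

struct sketch_dev {
	char trace[128]; /* comma-separated record of the steps that ran */
};

/* Append one step name to the trace. */
static void trace_add(struct sketch_dev *dev, const char *s)
{
	assert(strlen(dev->trace) + strlen(s) + 2 < sizeof(dev->trace));
	if (dev->trace[0] != '\0')
		strcat(dev->trace, ",");
	strcat(dev->trace, s);
}

/* Returns 0 on success, -1 if the step named by fail_at fails. */
static int sketch_function_enable(struct sketch_dev *dev,
				  enum sketch_step fail_at)
{
	int err = 0;

	dev->trace[0] = '\0';
	trace_add(dev, "cmd_up");

	trace_add(dev, "enable_hca");
	if (fail_at == STEP_ENABLE_HCA) {
		err = -1;
		/* The poll was never started, so there is nothing to stop. */
		goto err_cmd_cleanup;
	}

	/* The function is enabled now, so the first poll cannot race with it. */
	trace_add(dev, "start_poll");

	trace_add(dev, "set_issi");
	if (fail_at == STEP_SET_ISSI) {
		err = -1;
		goto stop_health_poll;
	}

	trace_add(dev, "boot_pages");
	if (fail_at == STEP_BOOT_PAGES) {
		err = -1;
		goto stop_health_poll;
	}

	return 0;

stop_health_poll:
	/* Reverse of the setup order: stop polling, then disable the HCA. */
	trace_add(dev, "stop_poll");
	trace_add(dev, "disable_hca");
err_cmd_cleanup:
	trace_add(dev, "cmd_down");
	return err;
}
```

Keeping each label's cleanup the mirror image of the setup order is what
lets a failure at any step jump to a single label and fall through the
rest of the unwind.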

* Re: [PATCH net 2/2] net/mlx5: Start health poll after enable hca
From: Michal Swiatkowski @ 2025-03-19 9:36 UTC
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
Mark Bloch

On Tue, Mar 18, 2025 at 10:51:17PM +0200, Tariq Toukan wrote:
> From: Moshe Shemesh <moshe@nvidia.com>
>
> The health poll mechanism performs periodic checks to detect firmware
> errors. One of the checks verifies the function is still enabled on
> firmware side, but the function is enabled only after enable_hca command
> completed. Start health poll after enable_hca command to avoid a race
> between function enabled and first health polling.
>
> Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index ec956c4bcebd..7c3312d6aed9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1205,24 +1205,24 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>  	dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
>
> -	mlx5_start_health_poll(dev);
> -
>  	err = mlx5_core_enable_hca(dev, 0);
>  	if (err) {
>  		mlx5_core_err(dev, "enable hca failed\n");
> -		goto stop_health_poll;
> +		goto err_cmd_cleanup;
>  	}
>
> +	mlx5_start_health_poll(dev);
> +
>  	err = mlx5_core_set_issi(dev);
>  	if (err) {
>  		mlx5_core_err(dev, "failed to set issi\n");
> -		goto err_disable_hca;
> +		goto stop_health_poll;
>  	}
>
>  	err = mlx5_satisfy_startup_pages(dev, 1);
>  	if (err) {
>  		mlx5_core_err(dev, "failed to allocate boot pages\n");
> -		goto err_disable_hca;
> +		goto stop_health_poll;
>  	}
>
>  	err = mlx5_tout_query_dtor(dev);
> @@ -1235,10 +1235,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>
>  reclaim_boot_pages:
>  	mlx5_reclaim_startup_pages(dev);
> -err_disable_hca:
> -	mlx5_core_disable_hca(dev, 0);
>  stop_health_poll:
>  	mlx5_stop_health_poll(dev, boot);
> +	mlx5_core_disable_hca(dev, 0);
>  err_cmd_cleanup:
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
>  	mlx5_cmd_disable(dev);
> @@ -1249,8 +1248,8 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>  static void mlx5_function_disable(struct mlx5_core_dev *dev, bool boot)
>  {
>  	mlx5_reclaim_startup_pages(dev);
> -	mlx5_core_disable_hca(dev, 0);
>  	mlx5_stop_health_poll(dev, boot);
> +	mlx5_core_disable_hca(dev, 0);
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
>  	mlx5_cmd_disable(dev);
>  }

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> --
> 2.31.1

* Re: [PATCH net 2/2] net/mlx5: Start health poll after enable hca
From: Kalesh Anakkur Purayil @ 2025-03-19 11:35 UTC
To: Tariq Toukan
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
Mark Bloch

On Wed, Mar 19, 2025 at 2:22 AM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Moshe Shemesh <moshe@nvidia.com>
>
> The health poll mechanism performs periodic checks to detect firmware
> errors. One of the checks verifies the function is still enabled on
> firmware side, but the function is enabled only after enable_hca command
> completed. Start health poll after enable_hca command to avoid a race
> between function enabled and first health polling.
>
> Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

-- 
Regards,
Kalesh AP

* Re: [PATCH net 0/2] mlx5 misc fixes 2025-03-18
From: patchwork-bot+netdevbpf @ 2025-03-24 22:30 UTC
To: Tariq Toukan
Cc: davem, kuba, pabeni, edumazet, andrew+netdev, gal, leonro, saeedm,
leon, netdev, linux-rdma, linux-kernel, moshe, mbloch

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 18 Mar 2025 22:51:15 +0200 you wrote:
> Hi,
>
> This small patchset provides misc bug fixes to the mlx5 core driver.
>
> Thanks,
> Tariq.
>
> [...]

Here is the summary with links:
  - [net,1/2] net/mlx5: LAG, reload representors on LAG creation failure
    https://git.kernel.org/netdev/net/c/bdf549a7a4d7
  - [net,2/2] net/mlx5: Start health poll after enable hca
    https://git.kernel.org/netdev/net/c/1726ad035cb0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html