netdev.vger.kernel.org archive mirror
* [PATCH net 0/2] mlx5 misc fixes 2025-03-18
@ 2025-03-18 20:51 Tariq Toukan
  2025-03-18 20:51 ` [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure Tariq Toukan
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Tariq Toukan @ 2025-03-18 20:51 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Gal Pressman, Leon Romanovsky, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
	Mark Bloch

Hi,

This small patchset provides misc bug fixes to the mlx5 core driver.

Thanks,
Tariq.


Mark Bloch (1):
  net/mlx5: LAG, reload representors on LAG creation failure

Moshe Shemesh (1):
  net/mlx5: Start health poll after enable hca

 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c |  4 ++++
 drivers/net/ethernet/mellanox/mlx5/core/main.c    | 15 +++++++--------
 2 files changed, 11 insertions(+), 8 deletions(-)


base-commit: daa624d3c2ddffdcbad140a9625a4064371db44f
-- 
2.31.1


* [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure
  2025-03-18 20:51 [PATCH net 0/2] mlx5 misc fixes 2025-03-18 Tariq Toukan
@ 2025-03-18 20:51 ` Tariq Toukan
  2025-03-19  7:13   ` Michal Swiatkowski
  2025-03-19 11:36   ` Kalesh Anakkur Purayil
  2025-03-18 20:51 ` [PATCH net 2/2] net/mlx5: Start health poll after enable hca Tariq Toukan
  2025-03-24 22:30 ` [PATCH net 0/2] mlx5 misc fixes 2025-03-18 patchwork-bot+netdevbpf
  2 siblings, 2 replies; 8+ messages in thread
From: Tariq Toukan @ 2025-03-18 20:51 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Gal Pressman, Leon Romanovsky, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
	Mark Bloch

From: Mark Bloch <mbloch@nvidia.com>

When LAG creation fails, the driver reloads the RDMA devices. If RDMA
representors are present, they should also be reloaded. This step was
missed in the cited commit.

Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
index ed2ba272946b..6c9737c53734 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
@@ -1052,6 +1052,10 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
 		if (err) {
 			if (shared_fdb || roce_lag)
 				mlx5_lag_add_devices(ldev);
+			if (shared_fdb) {
+				mlx5_ldev_for_each(i, 0, ldev)
+					mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
+			}
 
 			return;
 		} else if (roce_lag) {
-- 
2.31.1
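
The failure path described in the commit message can be illustrated with
a small user-space model. This is only a sketch under stated
assumptions, not the driver code: every `toy_*` name is hypothetical,
and the model assumes that the pre-bond teardown drops both the RDMA
devices and (in shared FDB mode) the RDMA representors, while the
re-add step restores the RDMA devices only. The `with_fix` flag selects
whether the failure path also reloads the representors, as the patch
does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PORTS 2

/* Hypothetical stand-in for one physical function (PF) in the bond. */
struct toy_pf {
	bool rdma_dev_loaded; /* models the PF's RDMA device */
	bool ib_reps_loaded;  /* models the PF's RDMA representors */
};

struct toy_lag {
	struct toy_pf pf[TOY_MAX_PORTS];
};

/* Assumed pre-bond step: tear down RDMA devices, and reps if requested. */
static void toy_remove_devices(struct toy_lag *ldev, bool drop_reps)
{
	for (size_t i = 0; i < TOY_MAX_PORTS; i++) {
		ldev->pf[i].rdma_dev_loaded = false;
		if (drop_reps)
			ldev->pf[i].ib_reps_loaded = false;
	}
}

/* Models the re-add step: brings back the RDMA devices only. */
static void toy_add_devices(struct toy_lag *ldev)
{
	for (size_t i = 0; i < TOY_MAX_PORTS; i++)
		ldev->pf[i].rdma_dev_loaded = true;
}

/* Models the per-PF representor reload loop added by the patch. */
static void toy_reload_ib_reps(struct toy_lag *ldev)
{
	for (size_t i = 0; i < TOY_MAX_PORTS; i++)
		ldev->pf[i].ib_reps_loaded = true;
}

/* Returns 0 on success, -1 when LAG creation fails (create_ok == false). */
static int toy_do_bond(struct toy_lag *ldev, bool shared_fdb, bool roce_lag,
		       bool create_ok, bool with_fix)
{
	assert(ldev);

	if (shared_fdb || roce_lag)
		toy_remove_devices(ldev, shared_fdb);

	if (!create_ok) {
		if (shared_fdb || roce_lag)
			toy_add_devices(ldev);
		if (shared_fdb && with_fix)
			toy_reload_ib_reps(ldev);
		return -1;
	}

	/* The success path is out of scope for this sketch. */
	return 0;
}

/* True when every PF in the toy bond has its representors loaded. */
static bool toy_all_reps_loaded(const struct toy_lag *ldev)
{
	for (size_t i = 0; i < TOY_MAX_PORTS; i++)
		if (!ldev->pf[i].ib_reps_loaded)
			return false;
	return true;
}
```

Calling toy_do_bond() with shared_fdb set, create_ok cleared and
with_fix cleared leaves the RDMA devices loaded but the representors
missing; setting with_fix restores both, which is the state the commit
message says the failure path should leave behind.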


* [PATCH net 2/2] net/mlx5: Start health poll after enable hca
  2025-03-18 20:51 [PATCH net 0/2] mlx5 misc fixes 2025-03-18 Tariq Toukan
  2025-03-18 20:51 ` [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure Tariq Toukan
@ 2025-03-18 20:51 ` Tariq Toukan
  2025-03-19  9:36   ` Michal Swiatkowski
  2025-03-19 11:35   ` Kalesh Anakkur Purayil
  2025-03-24 22:30 ` [PATCH net 0/2] mlx5 misc fixes 2025-03-18 patchwork-bot+netdevbpf
  2 siblings, 2 replies; 8+ messages in thread
From: Tariq Toukan @ 2025-03-18 20:51 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Gal Pressman, Leon Romanovsky, Saeed Mahameed, Leon Romanovsky,
	Tariq Toukan, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
	Mark Bloch

From: Moshe Shemesh <moshe@nvidia.com>

The health poll mechanism performs periodic checks to detect firmware
errors. One of the checks verifies that the function is still enabled
on the firmware side, but the function is enabled only after the
enable_hca command has completed. Start the health poll after the
enable_hca command to avoid a race between enabling the function and
the first health poll.

Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ec956c4bcebd..7c3312d6aed9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1205,24 +1205,24 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 	dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
 
-	mlx5_start_health_poll(dev);
-
 	err = mlx5_core_enable_hca(dev, 0);
 	if (err) {
 		mlx5_core_err(dev, "enable hca failed\n");
-		goto stop_health_poll;
+		goto err_cmd_cleanup;
 	}
 
+	mlx5_start_health_poll(dev);
+
 	err = mlx5_core_set_issi(dev);
 	if (err) {
 		mlx5_core_err(dev, "failed to set issi\n");
-		goto err_disable_hca;
+		goto stop_health_poll;
 	}
 
 	err = mlx5_satisfy_startup_pages(dev, 1);
 	if (err) {
 		mlx5_core_err(dev, "failed to allocate boot pages\n");
-		goto err_disable_hca;
+		goto stop_health_poll;
 	}
 
 	err = mlx5_tout_query_dtor(dev);
@@ -1235,10 +1235,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 
 reclaim_boot_pages:
 	mlx5_reclaim_startup_pages(dev);
-err_disable_hca:
-	mlx5_core_disable_hca(dev, 0);
 stop_health_poll:
 	mlx5_stop_health_poll(dev, boot);
+	mlx5_core_disable_hca(dev, 0);
 err_cmd_cleanup:
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
 	mlx5_cmd_disable(dev);
@@ -1249,8 +1248,8 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
 static void mlx5_function_disable(struct mlx5_core_dev *dev, bool boot)
 {
 	mlx5_reclaim_startup_pages(dev);
-	mlx5_core_disable_hca(dev, 0);
 	mlx5_stop_health_poll(dev, boot);
+	mlx5_core_disable_hca(dev, 0);
 	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
 	mlx5_cmd_disable(dev);
 }
-- 
2.31.1
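
The ordering change can be modeled in a few lines of user-space C. This
is a sketch under stated assumptions, not the driver code: the `toy_*`
names and the `false_alarms` counter are hypothetical, and a "tick" is
a stand-in for one run of the health poll that is placed by hand at the
worst possible moment. The model contrasts the old ordering (poll
started before enable_hca) with the patched one, and mirrors the
patched unwind labels, where the poll is stopped before the function is
disabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; field names are illustrative only. */
struct toy_dev {
	bool cmd_up;      /* command interface usable */
	bool hca_enabled; /* function enabled on the firmware side */
	bool polling;     /* health poll running */
	int false_alarms; /* ticks that saw a not-enabled function */
};

enum toy_fail { TOY_FAIL_NONE, TOY_FAIL_ENABLE_HCA, TOY_FAIL_SET_ISSI };

/* One poll tick: counts an error if the function looks disabled. */
static void toy_health_tick(struct toy_dev *d)
{
	if (d->polling && !d->hca_enabled)
		d->false_alarms++;
}

/* Old ordering: the poll starts first, so a tick can race enable_hca. */
static void toy_enable_old(struct toy_dev *d)
{
	d->cmd_up = true;
	d->polling = true;
	toy_health_tick(d); /* worst case: the tick lands right here */
	d->hca_enabled = true;
}

/* Patched ordering, with an error unwind that mirrors the new labels. */
static int toy_enable_new(struct toy_dev *d, enum toy_fail fail)
{
	d->cmd_up = true;

	if (fail == TOY_FAIL_ENABLE_HCA)
		goto err_cmd_cleanup; /* poll never started, nothing to stop */
	d->hca_enabled = true;

	d->polling = true;
	toy_health_tick(d); /* the function is already enabled here */

	if (fail == TOY_FAIL_SET_ISSI)
		goto stop_health_poll;
	return 0;

stop_health_poll:
	d->polling = false;     /* stop polling first ... */
	toy_health_tick(d);     /* ... so a late tick sees nothing */
	d->hca_enabled = false; /* ... then disable the function */
err_cmd_cleanup:
	d->cmd_up = false;
	return -1;
}

/* Teardown in the patched order: stop the poll, then disable the HCA. */
static void toy_disable(struct toy_dev *d)
{
	assert(d->cmd_up);
	d->polling = false;
	toy_health_tick(d);
	d->hca_enabled = false;
	d->cmd_up = false;
}
```

In this model toy_enable_old() records one false alarm, while
toy_enable_new() followed by toy_disable() records none, including on
the two injected failure paths.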


* Re: [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure
  2025-03-18 20:51 ` [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure Tariq Toukan
@ 2025-03-19  7:13   ` Michal Swiatkowski
  2025-03-19 11:36   ` Kalesh Anakkur Purayil
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Swiatkowski @ 2025-03-19  7:13 UTC
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
	Mark Bloch

On Tue, Mar 18, 2025 at 10:51:16PM +0200, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@nvidia.com>
> 
> When LAG creation fails, the driver reloads the RDMA devices. If RDMA
> representors are present, they should also be reloaded. This step was
> missed in the cited commit.
> 
> Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index ed2ba272946b..6c9737c53734 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1052,6 +1052,10 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
>  		if (err) {
>  			if (shared_fdb || roce_lag)
>  				mlx5_lag_add_devices(ldev);
> +			if (shared_fdb) {
> +				mlx5_ldev_for_each(i, 0, ldev)
> +					mlx5_eswitch_reload_ib_reps(ldev->pf[i].dev->priv.eswitch);
> +			}
>  
>  			return;
>  		} else if (roce_lag) {
> -- 

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> 2.31.1

* Re: [PATCH net 2/2] net/mlx5: Start health poll after enable hca
  2025-03-18 20:51 ` [PATCH net 2/2] net/mlx5: Start health poll after enable hca Tariq Toukan
@ 2025-03-19  9:36   ` Michal Swiatkowski
  2025-03-19 11:35   ` Kalesh Anakkur Purayil
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Swiatkowski @ 2025-03-19  9:36 UTC
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
	Mark Bloch

On Tue, Mar 18, 2025 at 10:51:17PM +0200, Tariq Toukan wrote:
> From: Moshe Shemesh <moshe@nvidia.com>
> 
> The health poll mechanism performs periodic checks to detect firmware
> errors. One of the checks verifies that the function is still enabled
> on the firmware side, but the function is enabled only after the
> enable_hca command has completed. Start the health poll after the
> enable_hca command to avoid a race between enabling the function and
> the first health poll.
> 
> Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index ec956c4bcebd..7c3312d6aed9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1205,24 +1205,24 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>  	dev->caps.embedded_cpu = mlx5_read_embedded_cpu(dev);
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_UP);
>  
> -	mlx5_start_health_poll(dev);
> -
>  	err = mlx5_core_enable_hca(dev, 0);
>  	if (err) {
>  		mlx5_core_err(dev, "enable hca failed\n");
> -		goto stop_health_poll;
> +		goto err_cmd_cleanup;
>  	}
>  
> +	mlx5_start_health_poll(dev);
> +
>  	err = mlx5_core_set_issi(dev);
>  	if (err) {
>  		mlx5_core_err(dev, "failed to set issi\n");
> -		goto err_disable_hca;
> +		goto stop_health_poll;
>  	}
>  
>  	err = mlx5_satisfy_startup_pages(dev, 1);
>  	if (err) {
>  		mlx5_core_err(dev, "failed to allocate boot pages\n");
> -		goto err_disable_hca;
> +		goto stop_health_poll;
>  	}
>  
>  	err = mlx5_tout_query_dtor(dev);
> @@ -1235,10 +1235,9 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>  
>  reclaim_boot_pages:
>  	mlx5_reclaim_startup_pages(dev);
> -err_disable_hca:
> -	mlx5_core_disable_hca(dev, 0);
>  stop_health_poll:
>  	mlx5_stop_health_poll(dev, boot);
> +	mlx5_core_disable_hca(dev, 0);
>  err_cmd_cleanup:
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
>  	mlx5_cmd_disable(dev);
> @@ -1249,8 +1248,8 @@ static int mlx5_function_enable(struct mlx5_core_dev *dev, bool boot, u64 timeou
>  static void mlx5_function_disable(struct mlx5_core_dev *dev, bool boot)
>  {
>  	mlx5_reclaim_startup_pages(dev);
> -	mlx5_core_disable_hca(dev, 0);
>  	mlx5_stop_health_poll(dev, boot);
> +	mlx5_core_disable_hca(dev, 0);
>  	mlx5_cmd_set_state(dev, MLX5_CMDIF_STATE_DOWN);
>  	mlx5_cmd_disable(dev);
>  }

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

> -- 
> 2.31.1

* Re: [PATCH net 2/2] net/mlx5: Start health poll after enable hca
  2025-03-18 20:51 ` [PATCH net 2/2] net/mlx5: Start health poll after enable hca Tariq Toukan
  2025-03-19  9:36   ` Michal Swiatkowski
@ 2025-03-19 11:35   ` Kalesh Anakkur Purayil
  1 sibling, 0 replies; 8+ messages in thread
From: Kalesh Anakkur Purayil @ 2025-03-19 11:35 UTC
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
	Mark Bloch

On Wed, Mar 19, 2025 at 2:22 AM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Moshe Shemesh <moshe@nvidia.com>
>
> The health poll mechanism performs periodic checks to detect firmware
> errors. One of the checks verifies that the function is still enabled
> on the firmware side, but the function is enabled only after the
> enable_hca command has completed. Start the health poll after the
> enable_hca command to avoid a race between enabling the function and
> the first health poll.
>
> Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load")
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
-- 
Regards,
Kalesh AP

* Re: [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure
  2025-03-18 20:51 ` [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure Tariq Toukan
  2025-03-19  7:13   ` Michal Swiatkowski
@ 2025-03-19 11:36   ` Kalesh Anakkur Purayil
  1 sibling, 0 replies; 8+ messages in thread
From: Kalesh Anakkur Purayil @ 2025-03-19 11:36 UTC
  To: Tariq Toukan
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Gal Pressman, Leon Romanovsky, Saeed Mahameed,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, Moshe Shemesh,
	Mark Bloch

On Wed, Mar 19, 2025 at 2:22 AM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Mark Bloch <mbloch@nvidia.com>
>
> When LAG creation fails, the driver reloads the RDMA devices. If RDMA
> representors are present, they should also be reloaded. This step was
> missed in the cited commit.
>
> Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode")
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>


-- 
Regards,
Kalesh AP

* Re: [PATCH net 0/2] mlx5 misc fixes 2025-03-18
  2025-03-18 20:51 [PATCH net 0/2] mlx5 misc fixes 2025-03-18 Tariq Toukan
  2025-03-18 20:51 ` [PATCH net 1/2] net/mlx5: LAG, reload representors on LAG creation failure Tariq Toukan
  2025-03-18 20:51 ` [PATCH net 2/2] net/mlx5: Start health poll after enable hca Tariq Toukan
@ 2025-03-24 22:30 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 8+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-03-24 22:30 UTC
  To: Tariq Toukan
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, gal, leonro, saeedm,
	leon, netdev, linux-rdma, linux-kernel, moshe, mbloch

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 18 Mar 2025 22:51:15 +0200 you wrote:
> Hi,
> 
> This small patchset provides misc bug fixes to the mlx5 core driver.
> 
> Thanks,
> Tariq.
> 
> [...]

Here is the summary with links:
  - [net,1/2] net/mlx5: LAG, reload representors on LAG creation failure
    https://git.kernel.org/netdev/net/c/bdf549a7a4d7
  - [net,2/2] net/mlx5: Start health poll after enable hca
    https://git.kernel.org/netdev/net/c/1726ad035cb0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


