Re: [PATCH net-next 1/7] net/mlx5: Lag: refactor representor reload handling

From: Mark Bloch <mbloch@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, Shay Drory <shayd@nvidia.com>,
	Or Har-Toov <ohartoov@nvidia.com>,
	Edward Srouji <edwards@nvidia.com>,
	Maher Sanalla <msanalla@nvidia.com>,
	Simon Horman <horms@kernel.org>, Moshe Shemesh <moshe@nvidia.com>,
	Kees Cook <kees@kernel.org>,
	Patrisious Haddad <phaddad@nvidia.com>,
	Gerd Bayer <gbayer@linux.ibm.com>,
	Parav Pandit <parav@nvidia.com>, Cosmin Ratiu <cratiu@nvidia.com>,
	Carolina Jubran <cjubran@nvidia.com>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [PATCH net-next 1/7] net/mlx5: Lag: refactor representor reload handling
Date: Thu, 9 Apr 2026 20:57:04 +0300
Message-ID: <f31e93c7-660f-4321-8db6-5c8e15689595@nvidia.com>
In-Reply-To: <20260409115550.156419-2-tariqt@nvidia.com>



On 09/04/2026 14:55, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@nvidia.com>
> 
> Representor reload during LAG/MPESW transitions has to be repeated in
> several flows, and each open-coded loop was easy to get out of sync
> when adding new flags or tweaking error handling. Move the sequencing
> into a single helper so that all call sites share the same ordering
> and checks.
> 
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Reviewed-by: Shay Drori <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 44 +++++++++++--------
>  .../net/ethernet/mellanox/mlx5/core/lag/lag.h |  1 +
>  .../ethernet/mellanox/mlx5/core/lag/mpesw.c   | 12 ++---
>  3 files changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> index 449e4bd86c06..c402a8463081 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> @@ -1093,6 +1093,27 @@ void mlx5_lag_remove_devices(struct mlx5_lag *ldev)
>  	}
>  }
>  
> +int mlx5_lag_reload_ib_reps(struct mlx5_lag *ldev, u32 flags)
> +{
> +	struct lag_func *pf;
> +	int ret;
> +	int i;
> +
> +	mlx5_ldev_for_each(i, 0, ldev) {
> +		pf = mlx5_lag_pf(ldev, i);
> +		if (!(pf->dev->priv.flags & flags)) {
> +			struct mlx5_eswitch *esw;
> +
> +			esw = pf->dev->priv.eswitch;
> +			ret = mlx5_eswitch_reload_ib_reps(esw);
> +			if (ret)
> +				return ret;

Sashiko says:
"Does this early return break best-effort teardown and error recovery paths?
The new helper aborts on the first error, but the open-coded loops it
replaces used to run unconditionally."

I'm aware of this behavioral change; it is intentional.

In practice, if reloading the reps fails on one device,
continuing the loop does not result in a functional system.
The remaining devices will not operate correctly, and attempting
to proceed may further destabilize the driver state.

By aborting early, we avoid partially reinitializing the system
and instead leave it in a consistent (albeit degraded) state. In
this case, IB devices are not (re)created, which prevents user
access to a driver that is already in a broken state.
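
To make the difference concrete, here is a minimal userspace sketch
of the two loop styles. It is not driver code: struct fake_pf,
fake_reload() and both loop functions are hypothetical stand-ins,
and the error values are whatever the caller presets. It only models
the skip-by-flags check and the stop-on-first-error behavior of the
new helper next to the old best-effort loops.

```c
#include <stddef.h>

/* Hypothetical stand-in for one LAG member; not the driver's struct lag_func. */
struct fake_pf {
	unsigned int flags;	/* models dev->priv.flags */
	int reload_err;		/* value the fake reload returns, 0 on success */
	int reload_calls;	/* number of reload attempts on this device */
};

/* Fake reload: record the attempt and return the preset result. */
static int fake_reload(struct fake_pf *pf)
{
	pf->reload_calls++;
	return pf->reload_err;
}

/*
 * Models the new helper: skip devices that have any of skip_flags set,
 * and stop at the first failure. With skip_flags == 0 nothing is skipped.
 */
static int reload_abort_on_error(struct fake_pf *pfs, size_t n,
				 unsigned int skip_flags)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		if (pfs[i].flags & skip_flags)
			continue;
		ret = fake_reload(&pfs[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Models the old open-coded rollback loops: try every device, ignore errors. */
static void reload_best_effort(struct fake_pf *pfs, size_t n,
			       unsigned int skip_flags)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pfs[i].flags & skip_flags)
			continue;
		(void)fake_reload(&pfs[i]);
	}
}
```

With three fake devices where the second one fails, the first
function never touches the third device, while the second function
still attempts it. That is the behavioral change being discussed.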

Mark

> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  void mlx5_disable_lag(struct mlx5_lag *ldev)
>  {
>  	bool shared_fdb = test_bit(MLX5_LAG_MODE_FLAG_SHARED_FDB, &ldev->mode_flags);
> @@ -1130,9 +1151,7 @@ void mlx5_disable_lag(struct mlx5_lag *ldev)
>  		mlx5_lag_add_devices(ldev);
>  
>  	if (shared_fdb)
> -		mlx5_ldev_for_each(i, 0, ldev)
> -			if (!(mlx5_lag_pf(ldev, i)->dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV))
> -				mlx5_eswitch_reload_ib_reps(mlx5_lag_pf(ldev, i)->dev->priv.eswitch);
> +		mlx5_lag_reload_ib_reps(ldev, MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV);
>  }
>  
>  bool mlx5_lag_shared_fdb_supported(struct mlx5_lag *ldev)
> @@ -1388,10 +1407,8 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
>  		if (err) {
>  			if (shared_fdb || roce_lag)
>  				mlx5_lag_add_devices(ldev);
> -			if (shared_fdb) {
> -				mlx5_ldev_for_each(i, 0, ldev)
> -					mlx5_eswitch_reload_ib_reps(mlx5_lag_pf(ldev, i)->dev->priv.eswitch);
> -			}
> +			if (shared_fdb)
> +				mlx5_lag_reload_ib_reps(ldev, 0);
>  
>  			return;
>  		}
> @@ -1409,24 +1426,15 @@ static void mlx5_do_bond(struct mlx5_lag *ldev)
>  					mlx5_nic_vport_enable_roce(dev);
>  			}
>  		} else if (shared_fdb) {
> -			int i;
> -
>  			dev0->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
>  			mlx5_rescan_drivers_locked(dev0);
> -
> -			mlx5_ldev_for_each(i, 0, ldev) {
> -				err = mlx5_eswitch_reload_ib_reps(mlx5_lag_pf(ldev, i)->dev->priv.eswitch);
> -				if (err)
> -					break;
> -			}
> -
> +			err = mlx5_lag_reload_ib_reps(ldev, 0);
>  			if (err) {
>  				dev0->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
>  				mlx5_rescan_drivers_locked(dev0);
>  				mlx5_deactivate_lag(ldev);
>  				mlx5_lag_add_devices(ldev);
> -				mlx5_ldev_for_each(i, 0, ldev)
> -					mlx5_eswitch_reload_ib_reps(mlx5_lag_pf(ldev, i)->dev->priv.eswitch);
> +				mlx5_lag_reload_ib_reps(ldev, 0);
>  				mlx5_core_err(dev0, "Failed to enable lag\n");
>  				return;
>  			}
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
> index 6c911374f409..db561e306fc7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.h
> @@ -199,4 +199,5 @@ int mlx5_get_next_ldev_func(struct mlx5_lag *ldev, int start_idx);
>  int mlx5_lag_get_dev_index_by_seq(struct mlx5_lag *ldev, int seq);
>  int mlx5_lag_num_devs(struct mlx5_lag *ldev);
>  int mlx5_lag_num_netdevs(struct mlx5_lag *ldev);
> +int mlx5_lag_reload_ib_reps(struct mlx5_lag *ldev, u32 flags);
>  #endif /* __MLX5_LAG_H__ */
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
> index 5eea12a6887a..4d68e3092a56 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/mpesw.c
> @@ -70,7 +70,6 @@ static int mlx5_lag_enable_mpesw(struct mlx5_lag *ldev)
>  	int idx = mlx5_lag_get_dev_index_by_seq(ldev, MLX5_LAG_P1);
>  	struct mlx5_core_dev *dev0;
>  	int err;
> -	int i;
>  
>  	if (ldev->mode == MLX5_LAG_MODE_MPESW)
>  		return 0;
> @@ -103,11 +102,9 @@ static int mlx5_lag_enable_mpesw(struct mlx5_lag *ldev)
>  
>  	dev0->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
>  	mlx5_rescan_drivers_locked(dev0);
> -	mlx5_ldev_for_each(i, 0, ldev) {
> -		err = mlx5_eswitch_reload_ib_reps(mlx5_lag_pf(ldev, i)->dev->priv.eswitch);
> -		if (err)
> -			goto err_rescan_drivers;
> -	}
> +	err = mlx5_lag_reload_ib_reps(ldev, 0);
> +	if (err)
> +		goto err_rescan_drivers;
>  
>  	mlx5_lag_set_vports_agg_speed(ldev);
>  
> @@ -119,8 +116,7 @@ static int mlx5_lag_enable_mpesw(struct mlx5_lag *ldev)
>  	mlx5_deactivate_lag(ldev);
>  err_add_devices:
>  	mlx5_lag_add_devices(ldev);
> -	mlx5_ldev_for_each(i, 0, ldev)
> -		mlx5_eswitch_reload_ib_reps(mlx5_lag_pf(ldev, i)->dev->priv.eswitch);
> +	mlx5_lag_reload_ib_reps(ldev, 0);
>  	mlx5_mpesw_metadata_cleanup(ldev);
>  	return err;
>  }

Thread overview (15+ messages):
2026-04-09 11:55 [PATCH net-next 0/7] net/mlx5: Improve representor lifecycle and fix work queue deadlock Tariq Toukan
2026-04-09 11:55 ` [PATCH net-next 1/7] net/mlx5: Lag: refactor representor reload handling Tariq Toukan
2026-04-09 17:57   ` Mark Bloch [this message]
2026-04-09 11:55 ` [PATCH net-next 2/7] net/mlx5: E-Switch, move work queue generation counter Tariq Toukan
2026-04-09 17:58   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 3/7] net/mlx5: E-Switch, introduce generic work queue dispatch helper Tariq Toukan
2026-04-09 11:55 ` [PATCH net-next 4/7] net/mlx5: E-Switch, fix deadlock between devlink lock and esw->wq Tariq Toukan
2026-04-09 18:01   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 5/7] net/mlx5: E-Switch, block representors during reconfiguration Tariq Toukan
2026-04-09 18:02   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 6/7] net/mlx5: E-switch, load reps via work queue after registration Tariq Toukan
2026-04-09 18:02   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init Tariq Toukan
2026-04-09 18:02   ` Mark Bloch
2026-04-09 18:20 ` [PATCH net-next 0/7] net/mlx5: Improve representor lifecycle and fix work queue deadlock Mark Bloch
