public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Mark Bloch <mbloch@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
	Saeed Mahameed <saeedm@nvidia.com>, Shay Drory <shayd@nvidia.com>,
	Or Har-Toov <ohartoov@nvidia.com>,
	Edward Srouji <edwards@nvidia.com>,
	Maher Sanalla <msanalla@nvidia.com>,
	Simon Horman <horms@kernel.org>,
	Gerd Bayer <gbayer@linux.ibm.com>,
	Moshe Shemesh <moshe@nvidia.com>, Kees Cook <kees@kernel.org>,
	Patrisious Haddad <phaddad@nvidia.com>,
	Parav Pandit <parav@nvidia.com>,
	Carolina Jubran <cjubran@nvidia.com>,
	Cosmin Ratiu <cratiu@nvidia.com>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [PATCH net-next V2 6/7] net/mlx5: E-switch, load reps via work queue after registration
Date: Sat, 2 May 2026 23:07:34 +0300	[thread overview]
Message-ID: <ae38e991-9b17-47db-b22b-9147335617c4@nvidia.com> (raw)
In-Reply-To: <20260501041633.231662-7-tariqt@nvidia.com>



On 01/05/2026 7:16, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@nvidia.com>
> 
> mlx5_eswitch_register_vport_reps() only installs representor callbacks and
> marks the rep type as registered. If the E-Switch is already in switchdev
> mode, the newly registered rep type must then be loaded for already enabled
> vports.
> 
> That load path needs to run under the devlink lock, which is not held by
> the auxiliary driver registration context. Queue the reload to the E-Switch
> workqueue, whose handler acquires the devlink lock, and load the relevant
> representors from there.
> 
> The unregister path is unchanged and still unloads representors
> synchronously while tearing down the registered callbacks.
> 
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---

sashiko.dev says:
"
[Note: The format of this report is altered due to recitation restrictions. Code quoting and inline diffs have been omitted.]
commit f4f3e4872e599ef6d13a1f2bacffd52e6dbc8d37
Author: Mark Bloch <mbloch@nvidia.com>
net/mlx5: E-switch, load reps via work queue after registration
This commit defers the loading of representors for already enabled vports to
a workqueue when the E-Switch is in switchdev mode. This ensures the load
path runs under the devlink lock, which is not held during auxiliary driver
registration.
I have a couple of questions regarding error handling in this update.
Looking at mlx5_eswitch_register_vport_reps_locked(), the return value of
mlx5_esw_add_work() is not checked. Since mlx5_esw_add_work() uses
kzalloc_obj() with GFP_ATOMIC, it can fail under memory pressure and return
-ENOMEM. If this allocation fails, the background work to load the
representors is never queued, but the representors are already marked as
REP_REGISTERED. Could this result in a NULL pointer dereference if callers
attempt to access the representor data under the assumption that it was
successfully initialized?
Additionally, in mlx5_eswitch_reload_reps_blocked(), failures from
mlx5_esw_offloads_rep_load() appear to be silently ignored. If the call
fails for MLX5_VPORT_UPLINK, the function aborts early without loading the
other vports. Within the loop for the remaining vports, the return value is
also ignored. Does this leave the system in a partially loaded and
inconsistent state, where representors are registered but uninitialized for
some vports?
"

REP_REGISTERED does not imply initialized representor data. Users that need
private data only proceed on REP_LOADED, and failed loads leave the state
REP_REGISTERED. The queued reload is for late rep-ops registration while already
in switchdev, where there is no synchronous error path back to the auxiliary
probe, so it is intentionally opportunistic. The strict path is still the
switchdev transition: there, representor load errors are propagated and the
mode change is aborted, so this does not change the consistency model.

Mark

>  .../mellanox/mlx5/core/eswitch_offloads.c     | 34 +++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 8f656253981b..f26d1652dd05 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -4563,6 +4563,38 @@ mlx5_eswitch_register_vport_reps_blocked(struct mlx5_eswitch *esw,
>  	}
>  }
>  
> +static void mlx5_eswitch_reload_reps_blocked(struct mlx5_eswitch *esw)
> +{
> +	struct mlx5_vport *vport;
> +	unsigned long i;
> +
> +	if (esw->mode != MLX5_ESWITCH_OFFLOADS)
> +		return;
> +
> +	if (mlx5_esw_offloads_rep_load(esw, MLX5_VPORT_UPLINK))
> +		return;
> +
> +	mlx5_esw_for_each_vport(esw, i, vport) {
> +		if (!vport)
> +			continue;
> +		if (!vport->enabled)
> +			continue;
> +		if (vport->vport == MLX5_VPORT_UPLINK)
> +			continue;
> +		if (!mlx5_eswitch_vport_has_rep(esw, vport->vport))
> +			continue;
> +
> +		mlx5_esw_offloads_rep_load(esw, vport->vport);
> +	}
> +}
> +
> +static void mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
> +{
> +	mlx5_esw_reps_block(esw);
> +	mlx5_eswitch_reload_reps_blocked(esw);
> +	mlx5_esw_reps_unblock(esw);
> +}
> +
>  static void
>  mlx5_eswitch_register_vport_reps_locked(struct mlx5_eswitch *esw,
>  					const struct mlx5_eswitch_rep_ops *ops,
> @@ -4574,6 +4606,8 @@ mlx5_eswitch_register_vport_reps_locked(struct mlx5_eswitch *esw,
>  		mlx5_esw_reps_block(esw);
>  	mlx5_eswitch_register_vport_reps_blocked(esw, ops, rep_type);
>  	mlx5_esw_reps_unblock(esw);
> +
> +	mlx5_esw_add_work(esw, mlx5_eswitch_reload_reps);
>  }
>  
>  void mlx5_eswitch_register_vport_reps(struct mlx5_eswitch *esw,


  reply	other threads:[~2026-05-02 20:07 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-01  4:16 [PATCH net-next V2 0/7] net/mlx5: Improve representor lifecycle and allow switchdev by default Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 1/7] net/mlx5: Lag: refactor representor reload handling Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 2/7] net/mlx5: E-Switch, add representor lifecycle lock Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 3/7] net/mlx5: Lag, avoid LAG and representor lock cycles Tariq Toukan
2026-05-02 20:04   ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 4/7] net/mlx5: E-Switch, serialize representor lifecycle Tariq Toukan
2026-05-02 20:05   ` Mark Bloch
2026-05-03  1:42   ` Jakub Kicinski
2026-05-03  8:18     ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 5/7] net/mlx5: E-Switch, unwind only newly loaded representor types Tariq Toukan
2026-05-02 20:06   ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 6/7] net/mlx5: E-switch, load reps via work queue after registration Tariq Toukan
2026-05-02 20:07   ` Mark Bloch [this message]
2026-05-03  1:42   ` Jakub Kicinski
2026-05-03  8:01     ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init Tariq Toukan
2026-05-02 20:08   ` Mark Bloch
2026-05-03  1:41     ` Jakub Kicinski
2026-05-03  7:51       ` Mark Bloch
2026-05-05  1:21         ` Jakub Kicinski
2026-05-05  2:00           ` Mark Bloch
2026-05-05  2:19             ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae38e991-9b17-47db-b22b-9147335617c4@nvidia.com \
    --to=mbloch@nvidia.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=cjubran@nvidia.com \
    --cc=cratiu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=dtatulea@nvidia.com \
    --cc=edumazet@google.com \
    --cc=edwards@nvidia.com \
    --cc=gal@nvidia.com \
    --cc=gbayer@linux.ibm.com \
    --cc=horms@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=moshe@nvidia.com \
    --cc=msanalla@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=ohartoov@nvidia.com \
    --cc=pabeni@redhat.com \
    --cc=parav@nvidia.com \
    --cc=phaddad@nvidia.com \
    --cc=saeedm@nvidia.com \
    --cc=shayd@nvidia.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox