public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: tariqt@nvidia.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	edumazet@google.com, pabeni@redhat.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, leon@kernel.org, jgg@ziepe.ca,
	saeedm@nvidia.com, mbloch@nvidia.com, shayd@nvidia.com,
	ohartoov@nvidia.com, edwards@nvidia.com, msanalla@nvidia.com,
	horms@kernel.org, gbayer@linux.ibm.com, moshe@nvidia.com,
	kees@kernel.org, phaddad@nvidia.com, parav@nvidia.com,
	cjubran@nvidia.com, cratiu@nvidia.com,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, gal@nvidia.com, dtatulea@nvidia.com
Subject: Re: [PATCH net-next V2 4/7] net/mlx5: E-Switch, serialize representor lifecycle
Date: Sat,  2 May 2026 18:42:24 -0700	[thread overview]
Message-ID: <20260503014224.4096089-1-kuba@kernel.org> (raw)
In-Reply-To: <20260501041633.231662-5-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: E-Switch, serialize representor lifecycle

This change replaces the ad-hoc bit/wait-queue coordination of E-Switch
representor lifecycle with the per-E-Switch reps_lock, adds nested
wrappers for the shared-FDB master IB path, and clears the rep_ops
pointer on unregister.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 66a773a99876..f70737437954 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c

[ ... ]

> @@ -1825,7 +1831,11 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
>  
>  	devl_assert_locked(priv_to_devlink(esw->dev));
>  	mlx5_lag_disable_change(esw->dev);
> +
> +	mlx5_esw_reps_block(esw);
>  	mlx5_eswitch_disable_locked(esw);
> +	mlx5_esw_reps_unblock(esw);
> +
>  	esw->mode = MLX5_ESWITCH_LEGACY;
>  	mlx5_lag_enable_change(esw->dev);
>  }

The commit message says "Take the representor lock around the mode
transition", but the assignment esw->mode = MLX5_ESWITCH_LEGACY happens
here after reps_lock has been dropped.

Meanwhile mlx5_eswitch_unregister_vport_reps_blocked() now reads
esw->mode under reps_lock to decide whether to call
__unload_reps_all_vport():

    if (esw->mode == MLX5_ESWITCH_OFFLOADS)
            __unload_reps_all_vport(esw, rep_type);

Should the mode store be moved inside the reps_block/reps_unblock
region so the reader's assumption about stability under reps_lock
holds?

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 6a5143b63dfd..d4ac07c995b9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c

[ ... ]

> @@ -2425,21 +2436,22 @@ void mlx5_esw_reps_unblock(struct mlx5_eswitch *esw)
>  
>  static void esw_mode_change(struct mlx5_eswitch *esw, u16 mode)
>  {
> +	mlx5_esw_reps_unblock(esw);
>  	mlx5_devcom_comp_lock(esw->dev->priv.hca_devcom_comp);
>  	if (esw->dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_IB_ADEV ||
>  	    mlx5_core_mp_enabled(esw->dev)) {
>  		esw->mode = mode;
> -		mlx5_rescan_drivers_locked(esw->dev);
> -		mlx5_devcom_comp_unlock(esw->dev->priv.hca_devcom_comp);
> -		return;
> +		goto out;
>  	}
>  
>  	esw->dev->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
>  	mlx5_rescan_drivers_locked(esw->dev);
>  	esw->mode = mode;
>  	esw->dev->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
> +out:
>  	mlx5_rescan_drivers_locked(esw->dev);
>  	mlx5_devcom_comp_unlock(esw->dev->priv.hca_devcom_comp);
> +	mlx5_esw_reps_block(esw);
>  }

Both assignments to esw->mode in this function are reached after
reps_lock has been dropped at the top of the function.

mlx5_rescan_drivers_locked() can invoke driver bind/unbind which routes
through mlx5_eswitch_unregister_vport_reps(), and that path now takes
reps_lock and reads esw->mode to decide whether to call
__unload_reps_all_vport().

Can a rep-unregister that runs during the rescan observe a stale
esw->mode here and make the wrong unload decision?

[ ... ]

> @@ -4496,9 +4524,10 @@ mlx5_eswitch_vport_has_rep(const struct mlx5_eswitch *esw, u16 vport_num)

[ ... ]

> @@ -4525,9 +4585,35 @@ void mlx5_eswitch_unregister_vport_reps(struct mlx5_eswitch *esw, u8 rep_type)
>  
>  	mlx5_esw_for_each_rep(esw, i, rep)
>  		atomic_set(&rep->rep_data[rep_type].state, REP_UNREGISTERED);
> +
> +	esw->offloads.rep_ops[rep_type] = NULL;
> +}

This new NULL store on unregister appears to introduce a concurrent
NULL dereference in readers that do not hold reps_lock. Before this
patch, rep_ops[rep_type] was set once at register and never cleared,
so a REP_LOADED state check was sufficient to know the pointer was
valid.

mlx5_eswitch_get_proto_dev() is one such reader:

    rep = mlx5_eswitch_get_rep(esw, vport);
    if (atomic_read(&rep->rep_data[rep_type].state) == REP_LOADED &&
        esw->offloads.rep_ops[rep_type]->get_proto_dev)
            return esw->offloads.rep_ops[rep_type]->get_proto_dev(rep);

The state check and each pointer load here are independent reads. Can
a concurrent unregister on another CPU run between the state load and
the ->get_proto_dev dereference, transitioning the rep through
REP_REGISTERED/REP_UNREGISTERED and then storing NULL into rep_ops,
so the second load observes NULL?

mlx5_esw_offloads_rep_event_unpair() and mlx5_esw_offloads_pair() have
the same shape:

    ops = esw->offloads.rep_ops[rep_type];
    if (atomic_read(&rep->rep_data[rep_type].state) == REP_LOADED &&
        ops->event)
            ops->event(esw, rep, ...);

Neither path holds reps_lock; they run from the devcom PAIR/UNPAIR
event handler. Can ops be NULL here once an unregister has cleared
rep_ops?

If readers are expected to be safe, should they either take reps_lock,
be converted to RCU, or skip the NULL clear on unregister?

[ ... ]
-- 
pw-bot: cr

  parent reply	other threads:[~2026-05-03  1:42 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-01  4:16 [PATCH net-next V2 0/7] net/mlx5: Improve representor lifecycle and allow switchdev by default Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 1/7] net/mlx5: Lag: refactor representor reload handling Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 2/7] net/mlx5: E-Switch, add representor lifecycle lock Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 3/7] net/mlx5: Lag, avoid LAG and representor lock cycles Tariq Toukan
2026-05-02 20:04   ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 4/7] net/mlx5: E-Switch, serialize representor lifecycle Tariq Toukan
2026-05-02 20:05   ` Mark Bloch
2026-05-03  1:42   ` Jakub Kicinski [this message]
2026-05-03  8:18     ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 5/7] net/mlx5: E-Switch, unwind only newly loaded representor types Tariq Toukan
2026-05-02 20:06   ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 6/7] net/mlx5: E-switch, load reps via work queue after registration Tariq Toukan
2026-05-02 20:07   ` Mark Bloch
2026-05-03  1:42   ` Jakub Kicinski
2026-05-03  8:01     ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init Tariq Toukan
2026-05-02 20:08   ` Mark Bloch
2026-05-03  1:41     ` Jakub Kicinski
2026-05-03  7:51       ` Mark Bloch
2026-05-05  1:21         ` Jakub Kicinski
2026-05-05  2:00           ` Mark Bloch
2026-05-05  2:19             ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260503014224.4096089-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=cjubran@nvidia.com \
    --cc=cratiu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=dtatulea@nvidia.com \
    --cc=edumazet@google.com \
    --cc=edwards@nvidia.com \
    --cc=gal@nvidia.com \
    --cc=gbayer@linux.ibm.com \
    --cc=horms@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=kees@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=moshe@nvidia.com \
    --cc=msanalla@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=ohartoov@nvidia.com \
    --cc=pabeni@redhat.com \
    --cc=parav@nvidia.com \
    --cc=phaddad@nvidia.com \
    --cc=saeedm@nvidia.com \
    --cc=shayd@nvidia.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox