From: Jakub Kicinski <kuba@kernel.org>
To: tariqt@nvidia.com
Cc: Jakub Kicinski <kuba@kernel.org>,
edumazet@google.com, pabeni@redhat.com, andrew+netdev@lunn.ch,
davem@davemloft.net, leon@kernel.org, jgg@ziepe.ca,
saeedm@nvidia.com, mbloch@nvidia.com, shayd@nvidia.com,
ohartoov@nvidia.com, edwards@nvidia.com, msanalla@nvidia.com,
horms@kernel.org, gbayer@linux.ibm.com, moshe@nvidia.com,
kees@kernel.org, phaddad@nvidia.com, parav@nvidia.com,
cjubran@nvidia.com, cratiu@nvidia.com,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, gal@nvidia.com, dtatulea@nvidia.com
Subject: Re: [PATCH net-next V2 4/7] net/mlx5: E-Switch, serialize representor lifecycle
Date: Sat, 2 May 2026 18:42:24 -0700 [thread overview]
Message-ID: <20260503014224.4096089-1-kuba@kernel.org> (raw)
In-Reply-To: <20260501041633.231662-5-tariqt@nvidia.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: E-Switch, serialize representor lifecycle
This change replaces the ad-hoc bit/wait-queue coordination of E-Switch
representor lifecycle with the per-E-Switch reps_lock, adds nested
wrappers for the shared-FDB master IB path, and clears the rep_ops
pointer on unregister.
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 66a773a99876..f70737437954 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
[ ... ]
> @@ -1825,7 +1831,11 @@ void mlx5_eswitch_disable(struct mlx5_eswitch *esw)
>
> devl_assert_locked(priv_to_devlink(esw->dev));
> mlx5_lag_disable_change(esw->dev);
> +
> + mlx5_esw_reps_block(esw);
> mlx5_eswitch_disable_locked(esw);
> + mlx5_esw_reps_unblock(esw);
> +
> esw->mode = MLX5_ESWITCH_LEGACY;
> mlx5_lag_enable_change(esw->dev);
> }
The commit message says "Take the representor lock around the mode
transition", but the assignment esw->mode = MLX5_ESWITCH_LEGACY happens
here after reps_lock has been dropped.
Meanwhile mlx5_eswitch_unregister_vport_reps_blocked() now reads
esw->mode under reps_lock to decide whether to call
__unload_reps_all_vport():
if (esw->mode == MLX5_ESWITCH_OFFLOADS)
__unload_reps_all_vport(esw, rep_type);
Should the mode store be moved inside the reps_block/reps_unblock
region so the reader's assumption about stability under reps_lock
holds?
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 6a5143b63dfd..d4ac07c995b9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
[ ... ]
> @@ -2425,21 +2436,22 @@ void mlx5_esw_reps_unblock(struct mlx5_eswitch *esw)
>
> static void esw_mode_change(struct mlx5_eswitch *esw, u16 mode)
> {
> + mlx5_esw_reps_unblock(esw);
> mlx5_devcom_comp_lock(esw->dev->priv.hca_devcom_comp);
> if (esw->dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_IB_ADEV ||
> mlx5_core_mp_enabled(esw->dev)) {
> esw->mode = mode;
> - mlx5_rescan_drivers_locked(esw->dev);
> - mlx5_devcom_comp_unlock(esw->dev->priv.hca_devcom_comp);
> - return;
> + goto out;
> }
>
> esw->dev->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
> mlx5_rescan_drivers_locked(esw->dev);
> esw->mode = mode;
> esw->dev->priv.flags &= ~MLX5_PRIV_FLAGS_DISABLE_IB_ADEV;
> +out:
> mlx5_rescan_drivers_locked(esw->dev);
> mlx5_devcom_comp_unlock(esw->dev->priv.hca_devcom_comp);
> + mlx5_esw_reps_block(esw);
> }
Both assignments to esw->mode in this function are reached after
reps_lock has been dropped at the top of the function.
mlx5_rescan_drivers_locked() can invoke driver bind/unbind which routes
through mlx5_eswitch_unregister_vport_reps(), and that path now takes
reps_lock and reads esw->mode to decide whether to call
__unload_reps_all_vport().
Can a rep-unregister that runs during the rescan observe a stale
esw->mode here and make the wrong unload decision?
[ ... ]
> @@ -4496,9 +4524,10 @@ mlx5_eswitch_vport_has_rep(const struct mlx5_eswitch *esw, u16 vport_num)
[ ... ]
> @@ -4525,9 +4585,35 @@ void mlx5_eswitch_unregister_vport_reps(struct mlx5_eswitch *esw, u8 rep_type)
>
> mlx5_esw_for_each_rep(esw, i, rep)
> atomic_set(&rep->rep_data[rep_type].state, REP_UNREGISTERED);
> +
> + esw->offloads.rep_ops[rep_type] = NULL;
> +}
This new NULL store on unregister appears to introduce a concurrent
NULL dereference in readers that do not hold reps_lock. Before this
patch, rep_ops[rep_type] was set once at register and never cleared,
so a REP_LOADED state check was sufficient to know the pointer was
valid.
mlx5_eswitch_get_proto_dev() is one such reader:
rep = mlx5_eswitch_get_rep(esw, vport);
if (atomic_read(&rep->rep_data[rep_type].state) == REP_LOADED &&
esw->offloads.rep_ops[rep_type]->get_proto_dev)
return esw->offloads.rep_ops[rep_type]->get_proto_dev(rep);
The state check and each pointer load here are independent reads. Can
a concurrent unregister on another CPU run between the state load and
the ->get_proto_dev dereference, transitioning the rep through
REP_REGISTERED/REP_UNREGISTERED and then storing NULL into rep_ops,
so the second load observes NULL?
mlx5_esw_offloads_rep_event_unpair() and mlx5_esw_offloads_pair() have
the same shape:
ops = esw->offloads.rep_ops[rep_type];
if (atomic_read(&rep->rep_data[rep_type].state) == REP_LOADED &&
ops->event)
ops->event(esw, rep, ...);
Neither path holds reps_lock; they run from the devcom PAIR/UNPAIR
event handler. Can ops be NULL here once an unregister has cleared
rep_ops?
If readers are expected to be safe, should they either take reps_lock,
be converted to RCU, or skip the NULL clear on unregister?
[ ... ]
--
pw-bot: cr
next prev parent reply other threads:[~2026-05-03 1:42 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-01 4:16 [PATCH net-next V2 0/7] net/mlx5: Improve representor lifecycle and allow switchdev by default Tariq Toukan
2026-05-01 4:16 ` [PATCH net-next V2 1/7] net/mlx5: Lag: refactor representor reload handling Tariq Toukan
2026-05-01 4:16 ` [PATCH net-next V2 2/7] net/mlx5: E-Switch, add representor lifecycle lock Tariq Toukan
2026-05-01 4:16 ` [PATCH net-next V2 3/7] net/mlx5: Lag, avoid LAG and representor lock cycles Tariq Toukan
2026-05-02 20:04 ` Mark Bloch
2026-05-01 4:16 ` [PATCH net-next V2 4/7] net/mlx5: E-Switch, serialize representor lifecycle Tariq Toukan
2026-05-02 20:05 ` Mark Bloch
2026-05-03 1:42 ` Jakub Kicinski [this message]
2026-05-03 8:18 ` Mark Bloch
2026-05-01 4:16 ` [PATCH net-next V2 5/7] net/mlx5: E-Switch, unwind only newly loaded representor types Tariq Toukan
2026-05-02 20:06 ` Mark Bloch
2026-05-01 4:16 ` [PATCH net-next V2 6/7] net/mlx5: E-switch, load reps via work queue after registration Tariq Toukan
2026-05-02 20:07 ` Mark Bloch
2026-05-03 1:42 ` Jakub Kicinski
2026-05-03 8:01 ` Mark Bloch
2026-05-01 4:16 ` [PATCH net-next V2 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init Tariq Toukan
2026-05-02 20:08 ` Mark Bloch
2026-05-03 1:41 ` Jakub Kicinski
2026-05-03 7:51 ` Mark Bloch
2026-05-05 1:21 ` Jakub Kicinski
2026-05-05 2:00 ` Mark Bloch
2026-05-05 2:19 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260503014224.4096089-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=cjubran@nvidia.com \
--cc=cratiu@nvidia.com \
--cc=davem@davemloft.net \
--cc=dtatulea@nvidia.com \
--cc=edumazet@google.com \
--cc=edwards@nvidia.com \
--cc=gal@nvidia.com \
--cc=gbayer@linux.ibm.com \
--cc=horms@kernel.org \
--cc=jgg@ziepe.ca \
--cc=kees@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mbloch@nvidia.com \
--cc=moshe@nvidia.com \
--cc=msanalla@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=ohartoov@nvidia.com \
--cc=pabeni@redhat.com \
--cc=parav@nvidia.com \
--cc=phaddad@nvidia.com \
--cc=saeedm@nvidia.com \
--cc=shayd@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox