From: Mark Bloch <mbloch@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>, Shay Drory <shayd@nvidia.com>,
Or Har-Toov <ohartoov@nvidia.com>,
Edward Srouji <edwards@nvidia.com>,
Maher Sanalla <msanalla@nvidia.com>,
Simon Horman <horms@kernel.org>, Moshe Shemesh <moshe@nvidia.com>,
Kees Cook <kees@kernel.org>,
Patrisious Haddad <phaddad@nvidia.com>,
Gerd Bayer <gbayer@linux.ibm.com>,
Parav Pandit <parav@nvidia.com>, Cosmin Ratiu <cratiu@nvidia.com>,
Carolina Jubran <cjubran@nvidia.com>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [PATCH net-next 6/7] net/mlx5: E-switch, load reps via work queue after registration
Date: Thu, 9 Apr 2026 21:02:19 +0300 [thread overview]
Message-ID: <99a9bf58-a2d9-4857-9f98-8ab771e94178@nvidia.com> (raw)
In-Reply-To: <20260409115550.156419-7-tariqt@nvidia.com>
On 09/04/2026 14:55, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@nvidia.com>
>
> mlx5_eswitch_register_vport_reps() merely sets the callbacks. The actual
> representor load/unload requires devlink locking and shouldn’t run from
> the registration context. Queue a work that acquires the devlink lock,
> loads all relevant reps. This lets load happen where the needed locks
> can be taken.
>
Sashiko writes:
> (Note: Due to system recitation restrictions, this report does not quote the
> original patch code directly. Instead, it provides a free-form summary of
> the questions and observations regarding the changes.)
> commit 8bc15ac3cf1f185bce88ba5499055d1546dae606
> Author: Mark Bloch <mbloch@nvidia.com>
> net/mlx5: E-switch, load reps via work queue after registration
> This commit shifts the representor load operation to an asynchronous
> workqueue so that the necessary devlink locks can be properly acquired,
> avoiding locking issues in the registration context.
> The commit message notes that actual representor load and unload require
> devlink locking and shouldn't run from the registration context. The patch
> successfully defers the load operation, but it appears the unload path
> in mlx5_eswitch_unregister_vport_reps() was not updated. Will
> mlx5_eswitch_unregister_vport_reps_blocked() and __unload_reps_all_vport()
> still be called synchronously without the devlink lock, leaving the
> teardown path unprotected?
This is not worse than the current behavior. The block/unblock mechanism
ensures correctness even without holding the devlink lock, although this
is not ideal.
Fixing the unload path to follow the same model is planned as a follow-up
series
> When mlx5_eswitch_register_vport_reps() queues the asynchronous work item,
> there doesn't seem to be a corresponding cancellation or flush in
> mlx5_eswitch_unregister_vport_reps(). If the registering module
> unregisters the representors and unloads before the work executes, could
> the delayed mlx5_eswitch_reload_reps() run and dereference stale callbacks,
> causing a use-after-free?
Each representor maintains state per rep type, and together with the
E-Switch generation counter we avoid executing work on stale instances.
If the E-Switch generation changes, the queued work will not proceed
with loading representors for the old state.
> The call to mlx5_esw_add_work() to enqueue the representor load operation
> might fail and return an error code like -ENOMEM. Since the return value
> isn't checked in mlx5_eswitch_register_vport_reps(), will a failure
> leave the representors in the REP_REGISTERED state but completely unloaded,
> without notifying the caller?
This change addresses the case where the eswitch is already in
switchdev mode and the IB module is loaded afterwards. Prior to this
patch, no representors would be created in that scenario.
After this change, representors are created when possible, improving the
overall user experience. If workqueue allocation fails (e.g. -ENOMEM),
there is little meaningful recovery we can provide in this context.
> In the new mlx5_eswitch_reload_reps_blocked() function, if loading the
> uplink representor fails, the function returns early. Similarly, any
The uplink representor is a prerequisite for all other representors.
If it fails to load, the rest cannot be meaningfully initialized, so it
is treated as a critical failure.
> failures when loading individual VF representors in the loop are not
> checked. Since this now runs asynchronously, are these errors permanently
> lost, potentially leaving the subsystem in a partially initialized state
> with no fallback?
Failures when loading individual vf representors are intentionally not
treated as fatal. The operation is best-effort, and partial
initialization is acceptable in this context.
Mark
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> .../mellanox/mlx5/core/eswitch_offloads.c | 34 +++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 4b626ffcfa8e..279490c0074c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -4535,6 +4535,38 @@ mlx5_eswitch_register_vport_reps_blocked(struct mlx5_eswitch *esw,
> }
> }
>
> +static void mlx5_eswitch_reload_reps_blocked(struct mlx5_eswitch *esw)
> +{
> + struct mlx5_vport *vport;
> + unsigned long i;
> +
> + if (esw->mode != MLX5_ESWITCH_OFFLOADS)
> + return;
> +
> + if (mlx5_esw_offloads_rep_load(esw, MLX5_VPORT_UPLINK))
> + return;
> +
> + mlx5_esw_for_each_vport(esw, i, vport) {
> + if (!vport)
> + continue;
> + if (!vport->enabled)
> + continue;
> + if (vport->vport == MLX5_VPORT_UPLINK)
> + continue;
> + if (!mlx5_eswitch_vport_has_rep(esw, vport->vport))
> + continue;
> +
> + mlx5_esw_offloads_rep_load(esw, vport->vport);
> + }
> +}
> +
> +static void mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
> +{
> + mlx5_esw_reps_block(esw);
> + mlx5_eswitch_reload_reps_blocked(esw);
> + mlx5_esw_reps_unblock(esw);
> +}
> +
> void mlx5_eswitch_register_vport_reps(struct mlx5_eswitch *esw,
> const struct mlx5_eswitch_rep_ops *ops,
> u8 rep_type)
> @@ -4542,6 +4574,8 @@ void mlx5_eswitch_register_vport_reps(struct mlx5_eswitch *esw,
> mlx5_esw_reps_block(esw);
> mlx5_eswitch_register_vport_reps_blocked(esw, ops, rep_type);
> mlx5_esw_reps_unblock(esw);
> +
> + mlx5_esw_add_work(esw, mlx5_eswitch_reload_reps);
> }
> EXPORT_SYMBOL(mlx5_eswitch_register_vport_reps);
>
next prev parent reply other threads:[~2026-04-09 18:02 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-09 11:55 [PATCH net-next 0/7] net/mlx5: Improve representor lifecycle and fix work queue deadlock Tariq Toukan
2026-04-09 11:55 ` [PATCH net-next 1/7] net/mlx5: Lag: refactor representor reload handling Tariq Toukan
2026-04-09 17:57 ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 2/7] net/mlx5: E-Switch, move work queue generation counter Tariq Toukan
2026-04-09 17:58 ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 3/7] net/mlx5: E-Switch, introduce generic work queue dispatch helper Tariq Toukan
2026-04-09 11:55 ` [PATCH net-next 4/7] net/mlx5: E-Switch, fix deadlock between devlink lock and esw->wq Tariq Toukan
2026-04-09 18:01 ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 5/7] net/mlx5: E-Switch, block representors during reconfiguration Tariq Toukan
2026-04-09 18:02 ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 6/7] net/mlx5: E-switch, load reps via work queue after registration Tariq Toukan
2026-04-09 18:02 ` Mark Bloch [this message]
2026-04-09 11:55 ` [PATCH net-next 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init Tariq Toukan
2026-04-09 18:02 ` Mark Bloch
2026-04-09 18:20 ` [PATCH net-next 0/7] net/mlx5: Improve representor lifecycle and fix work queue deadlock Mark Bloch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=99a9bf58-a2d9-4857-9f98-8ab771e94178@nvidia.com \
--to=mbloch@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=cjubran@nvidia.com \
--cc=cratiu@nvidia.com \
--cc=davem@davemloft.net \
--cc=dtatulea@nvidia.com \
--cc=edumazet@google.com \
--cc=edwards@nvidia.com \
--cc=gal@nvidia.com \
--cc=gbayer@linux.ibm.com \
--cc=horms@kernel.org \
--cc=kees@kernel.org \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=moshe@nvidia.com \
--cc=msanalla@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=ohartoov@nvidia.com \
--cc=pabeni@redhat.com \
--cc=parav@nvidia.com \
--cc=phaddad@nvidia.com \
--cc=saeedm@nvidia.com \
--cc=shayd@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox