All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: tariqt@nvidia.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	edumazet@google.com, pabeni@redhat.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, leon@kernel.org, jgg@ziepe.ca,
	saeedm@nvidia.com, mbloch@nvidia.com, shayd@nvidia.com,
	ohartoov@nvidia.com, edwards@nvidia.com, msanalla@nvidia.com,
	horms@kernel.org, gbayer@linux.ibm.com, moshe@nvidia.com,
	kees@kernel.org, phaddad@nvidia.com, parav@nvidia.com,
	cjubran@nvidia.com, cratiu@nvidia.com,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, gal@nvidia.com, dtatulea@nvidia.com
Subject: Re: [PATCH net-next V2 6/7] net/mlx5: E-switch, load reps via work queue after registration
Date: Sat,  2 May 2026 18:42:27 -0700	[thread overview]
Message-ID: <20260503014231.4096128-1-kuba@kernel.org> (raw)
In-Reply-To: <20260501041633.231662-7-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: E-switch, load reps via work queue after registration

This change defers loading of newly registered representors to the
E-Switch work queue so the handler can acquire the devlink lock that is
not held by the auxiliary driver registration context. The unregister
path continues to unload representors synchronously.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 8f656253981b..f26d1652dd05 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -4563,6 +4563,38 @@ mlx5_eswitch_register_vport_reps_blocked(struct mlx5_eswitch *esw,

[ ... ]

> +static void mlx5_eswitch_reload_reps(struct mlx5_eswitch *esw)
> +{
> +	mlx5_esw_reps_block(esw);
> +	mlx5_eswitch_reload_reps_blocked(esw);
> +	mlx5_esw_reps_unblock(esw);
> +}
> +
>  static void
>  mlx5_eswitch_register_vport_reps_locked(struct mlx5_eswitch *esw,
>  					const struct mlx5_eswitch_rep_ops *ops,
> @@ -4574,6 +4606,8 @@ mlx5_eswitch_register_vport_reps_locked(struct mlx5_eswitch *esw,
>  		mlx5_esw_reps_block(esw);
>  	mlx5_eswitch_register_vport_reps_blocked(esw, ops, rep_type);
>  	mlx5_esw_reps_unblock(esw);
> +
> +	mlx5_esw_add_work(esw, mlx5_eswitch_reload_reps);

The return value of mlx5_esw_add_work() is discarded here.

Looking at mlx5_esw_add_work():

    static int mlx5_esw_add_work(struct mlx5_eswitch *esw,
                                 void (*func)(struct mlx5_eswitch *esw))
    {
            struct mlx5_host_work *host_work;

            host_work = kzalloc_obj(*host_work, GFP_ATOMIC);
            if (!host_work)
                    return -ENOMEM;
            ...
    }

the allocation is GFP_ATOMIC and can fail, in which case no work is
queued.

If mlx5_esw_add_work() returns -ENOMEM while the E-Switch is already in
MLX5_ESWITCH_OFFLOADS mode, the reload that mlx5_eswitch_reload_reps()
was meant to perform never runs, so the newly registered rep type will
not be loaded for the already-enabled vports — which is the exact case
this commit is introducing infrastructure to handle.

Registration still returns success to the auxiliary driver
(mlx5e_rep_probe / mlx5r_rep_probe), with no log message, no retry, and
no way for the caller to notice the dropped reload.

The other existing caller, mlx5_esw_funcs_changed_handler(), does check
the return value of mlx5_esw_add_work(), suggesting allocation failure
is expected to be handled.

Should this call check the return value and at least emit a warning or
retry, so that a GFP_ATOMIC allocation failure during switchdev-mode
registration does not silently leave the representors unloaded?

>  }
>  
>  void mlx5_eswitch_register_vport_reps(struct mlx5_eswitch *esw,

  parent reply	other threads:[~2026-05-03  1:42 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-01  4:16 [PATCH net-next V2 0/7] net/mlx5: Improve representor lifecycle and allow switchdev by default Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 1/7] net/mlx5: Lag: refactor representor reload handling Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 2/7] net/mlx5: E-Switch, add representor lifecycle lock Tariq Toukan
2026-05-01  4:16 ` [PATCH net-next V2 3/7] net/mlx5: Lag, avoid LAG and representor lock cycles Tariq Toukan
2026-05-02 20:04   ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 4/7] net/mlx5: E-Switch, serialize representor lifecycle Tariq Toukan
2026-05-02 20:05   ` Mark Bloch
2026-05-03  1:42   ` Jakub Kicinski
2026-05-03  8:18     ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 5/7] net/mlx5: E-Switch, unwind only newly loaded representor types Tariq Toukan
2026-05-02 20:06   ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 6/7] net/mlx5: E-switch, load reps via work queue after registration Tariq Toukan
2026-05-02 20:07   ` Mark Bloch
2026-05-03  1:42   ` Jakub Kicinski [this message]
2026-05-03  8:01     ` Mark Bloch
2026-05-01  4:16 ` [PATCH net-next V2 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init Tariq Toukan
2026-05-02 20:08   ` Mark Bloch
2026-05-03  1:41     ` Jakub Kicinski
2026-05-03  7:51       ` Mark Bloch
2026-05-05  1:21         ` Jakub Kicinski
2026-05-05  2:00           ` Mark Bloch
2026-05-05  2:19             ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260503014231.4096128-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=cjubran@nvidia.com \
    --cc=cratiu@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=dtatulea@nvidia.com \
    --cc=edumazet@google.com \
    --cc=edwards@nvidia.com \
    --cc=gal@nvidia.com \
    --cc=gbayer@linux.ibm.com \
    --cc=horms@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=kees@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=moshe@nvidia.com \
    --cc=msanalla@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=ohartoov@nvidia.com \
    --cc=pabeni@redhat.com \
    --cc=parav@nvidia.com \
    --cc=phaddad@nvidia.com \
    --cc=saeedm@nvidia.com \
    --cc=shayd@nvidia.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.