From: Shay Drori <shayd@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
"Andrew Lunn" <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
Mark Bloch <mbloch@nvidia.com>,
"Leon Romanovsky" <leon@kernel.org>,
Simon Horman <horms@kernel.org>, Kees Cook <kees@kernel.org>,
Patrisious Haddad <phaddad@nvidia.com>,
Parav Pandit <parav@nvidia.com>, Gal Pressman <gal@nvidia.com>,
<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
<linux-kernel@vger.kernel.org>,
Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [PATCH net V3 1/4] net/mlx5: SD: Serialize init/cleanup
Date: Sun, 26 Apr 2026 13:46:07 +0300
Message-ID: <60dbc1e0-97b8-497c-86bf-90a0f75d6d18@nvidia.com>
In-Reply-To: <20260423123104.201552-2-tariqt@nvidia.com>
On 23/04/2026 15:31, Tariq Toukan wrote:
> From: Shay Drory <shayd@nvidia.com>
>
> mlx5_sd_init() / mlx5_sd_cleanup() may run from multiple PFs in the same
> Socket-Direct group. This can cause the SD bring-up/tear-down sequence
> to be executed more than once or interleaved across PFs.
>
> Protect SD init/cleanup with mlx5_devcom_comp_lock() and track the SD
> group state on the primary device. Skip init if the primary is already
> UP, and skip cleanup unless the primary is UP.
Sashiko:
"The commit message mentions skipping cleanup unless the primary is UP.
However, it appears this state check is missing from mlx5_sd_cleanup()
in the diff below."
The sentence above is a leftover and should be removed.
I will drop it in the next version.
>
> In addition, move mlx5_devcom_comp_set_ready(false) from sd_unregister()
> into the cleanup's locked section. A concurrent init acquiring the
> devcom lock will now observe devcom is no longer ready and bail out
> immediately.
>
> Fixes: 381978d28317 ("net/mlx5e: Create single netdev per SD group")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
> .../net/ethernet/mellanox/mlx5/core/lib/sd.c | 32 +++++++++++++++----
> 1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
> index 762c783156b4..96b4316f570e 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
> @@ -18,6 +18,7 @@ struct mlx5_sd {
> u8 host_buses;
> struct mlx5_devcom_comp_dev *devcom;
> struct dentry *dfs;
> + u8 state;
> bool primary;
> union {
> struct { /* primary */
> @@ -31,6 +32,11 @@ struct mlx5_sd {
> };
> };
>
> +enum mlx5_sd_state {
> + MLX5_SD_STATE_DOWN = 0,
> + MLX5_SD_STATE_UP,
> +};
> +
> static int mlx5_sd_get_host_buses(struct mlx5_core_dev *dev)
> {
> struct mlx5_sd *sd = mlx5_get_sd(dev);
> @@ -270,9 +276,6 @@ static void sd_unregister(struct mlx5_core_dev *dev)
> {
> struct mlx5_sd *sd = mlx5_get_sd(dev);
>
> - mlx5_devcom_comp_lock(sd->devcom);
> - mlx5_devcom_comp_set_ready(sd->devcom, false);
> - mlx5_devcom_comp_unlock(sd->devcom);
> mlx5_devcom_unregister_component(sd->devcom);
> }
>
> @@ -426,6 +429,7 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
> struct mlx5_core_dev *primary, *pos, *to;
> struct mlx5_sd *sd = mlx5_get_sd(dev);
> u8 alias_key[ACCESS_KEY_LEN];
> + struct mlx5_sd *primary_sd;
> int err, i;
>
> err = sd_init(dev);
> @@ -440,10 +444,15 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
> if (err)
> goto err_sd_cleanup;
>
> + mlx5_devcom_comp_lock(sd->devcom);
> if (!mlx5_devcom_comp_is_ready(sd->devcom))
> - return 0;
> + goto out;
Sashiko:
"Can primary be NULL here?
In sd_register(), the devcom ready state is published under the devcom
lock, but the lock is then released before the peer_sd->primary_dev
pointers are initialized.
If a concurrent thread executing mlx5_sd_init() or mlx5_sd_cleanup()
acquires the lock and observes the ready state, could it read an
uninitialized primary_dev before the loop in sd_register() completes?"
No, this is impossible. A concurrent init will always set primary before
accessing it, and cleanup always runs after a successful init, so again,
primary is set.
The next comment also describes an impossible sequence.
>
> primary = mlx5_sd_get_primary(dev);
> + primary_sd = mlx5_get_sd(primary);
> +
> + if (primary_sd->state != MLX5_SD_STATE_DOWN)
> + goto out;
>
> for (i = 0; i < ACCESS_KEY_LEN; i++)
> alias_key[i] = get_random_u8();
> @@ -472,6 +481,9 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
> sd->group_id, mlx5_devcom_comp_get_size(sd->devcom));
> sd_print_group(primary);
>
> + primary_sd->state = MLX5_SD_STATE_UP;
> +out:
> + mlx5_devcom_comp_unlock(sd->devcom);
> return 0;
>
> err_unset_secondaries:
> @@ -481,6 +493,8 @@ int mlx5_sd_init(struct mlx5_core_dev *dev)
> sd_cmd_unset_primary(primary);
> debugfs_remove_recursive(sd->dfs);
> err_sd_unregister:
> + mlx5_devcom_comp_set_ready(sd->devcom, false);
> + mlx5_devcom_comp_unlock(sd->devcom);
> sd_unregister(dev);
> err_sd_cleanup:
> sd_cleanup(dev);
> @@ -491,22 +505,28 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
> {
> struct mlx5_sd *sd = mlx5_get_sd(dev);
> struct mlx5_core_dev *primary, *pos;
> + struct mlx5_sd *primary_sd;
> int i;
>
> if (!sd)
> return;
>
> + mlx5_devcom_comp_lock(sd->devcom);
> if (!mlx5_devcom_comp_is_ready(sd->devcom))
> - goto out;
> + goto out_unlock;
>
> primary = mlx5_sd_get_primary(dev);
> + primary_sd = mlx5_get_sd(primary);
> mlx5_sd_for_each_secondary(i, primary, pos)
> sd_cmd_unset_secondary(pos);
> sd_cmd_unset_primary(primary);
> debugfs_remove_recursive(sd->dfs);
>
> sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
> -out:
> + primary_sd->state = MLX5_SD_STATE_DOWN;
> + mlx5_devcom_comp_set_ready(sd->devcom, false);
> +out_unlock:
> + mlx5_devcom_comp_unlock(sd->devcom);
> sd_unregister(dev);
> sd_cleanup(dev);
> }
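
For readers following the thread, the serialization pattern in this patch can be sketched in userspace C. This is only an illustration under stated assumptions, not the driver code: a pthread mutex stands in for the devcom component lock, a plain bool stands in for the devcom ready flag, and every name below (sd_group, sd_group_init(), sd_group_cleanup(), sd_demo()) is hypothetical. It mirrors the diff as posted: init bails out if the group is not ready or is already UP, and cleanup checks only the ready flag, then clears it inside the locked section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in mlx5. */
enum sd_state { SD_STATE_DOWN = 0, SD_STATE_UP };

struct sd_group {
	pthread_mutex_t lock;	/* plays the role of the devcom lock */
	bool ready;		/* plays the role of the devcom ready flag */
	enum sd_state state;	/* group state, kept on the primary in the patch */
	int bringups;		/* how many times the bring-up body ran */
	int teardowns;		/* how many times the tear-down body ran */
};

/* Init path: run the bring-up body only if ready and currently DOWN. */
static void sd_group_init(struct sd_group *g)
{
	pthread_mutex_lock(&g->lock);
	if (!g->ready)
		goto out;
	if (g->state != SD_STATE_DOWN)
		goto out;	/* another PF already brought the group up */

	g->bringups++;		/* placeholder for set primary/secondaries */
	g->state = SD_STATE_UP;
out:
	pthread_mutex_unlock(&g->lock);
}

/* Cleanup path: tear down once, then clear ready under the same lock. */
static void sd_group_cleanup(struct sd_group *g)
{
	pthread_mutex_lock(&g->lock);
	if (!g->ready)
		goto out_unlock;

	g->teardowns++;		/* placeholder for unset secondaries/primary */
	g->state = SD_STATE_DOWN;
	g->ready = false;	/* a later init sees this and bails out */
out_unlock:
	pthread_mutex_unlock(&g->lock);
}

/* Two PFs call init, then both call cleanup, then one retries init. */
static int sd_demo(void)
{
	struct sd_group g = { .lock = PTHREAD_MUTEX_INITIALIZER, .ready = true };

	sd_group_init(&g);	/* PF0: performs the bring-up */
	sd_group_init(&g);	/* PF1: skipped, state is already UP */
	sd_group_cleanup(&g);	/* PF0: performs the tear-down */
	sd_group_cleanup(&g);	/* PF1: skipped, group no longer ready */
	sd_group_init(&g);	/* late init: skipped, group not ready */

	return g.bringups == 1 && g.teardowns == 1 &&
	       g.state == SD_STATE_DOWN && !g.ready;
}
```

The calls in sd_demo() are sequential on purpose, so the result is deterministic. Because each path does its check and its update under the same lock, the same counts should hold if the callers run on separate threads.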