From: Jakub Kicinski <kuba@kernel.org>
To: tariqt@nvidia.com
Cc: Jakub Kicinski <kuba@kernel.org>,
edumazet@google.com, pabeni@redhat.com, andrew+netdev@lunn.ch,
davem@davemloft.net, saeedm@nvidia.com, mbloch@nvidia.com,
leon@kernel.org, shayd@nvidia.com, horms@kernel.org,
phaddad@nvidia.com, kees@kernel.org, parav@nvidia.com,
gal@nvidia.com, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
dtatulea@nvidia.com
Subject: Re: [PATCH net V4 1/4] net/mlx5: SD: Serialize init/cleanup
Date: Wed, 29 Apr 2026 18:42:07 -0700 [thread overview]
Message-ID: <20260430014207.2375703-1-kuba@kernel.org> (raw)
In-Reply-To: <20260428060111.221086-2-tariqt@nvidia.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: SD: Serialize init/cleanup
This commit protects the Socket-Direct (SD) init/cleanup paths with the
devcom component lock and tracks the SD group state on the primary device.
It skips init if the primary is already UP, and skips cleanup unless the
primary is UP, serializing the bring-up/tear-down sequence across PFs.
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
> index 762c783156b4..d42c283cbb38 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c
[ ... ]
> @@ -491,22 +507,34 @@ void mlx5_sd_cleanup(struct mlx5_core_dev *dev)
> {
> struct mlx5_sd *sd = mlx5_get_sd(dev);
> struct mlx5_core_dev *primary, *pos;
> + struct mlx5_sd *primary_sd;
> int i;
>
> if (!sd)
> return;
>
> + mlx5_devcom_comp_lock(sd->devcom);
> if (!mlx5_devcom_comp_is_ready(sd->devcom))
> - goto out;
> + goto out_unlock;
>
> primary = mlx5_sd_get_primary(dev);
> + if (!primary)
> + goto out_unlock;
> +
> + primary_sd = mlx5_get_sd(primary);
> + if (primary_sd->state != MLX5_SD_STATE_UP)
> + goto out_unlock;
If the device is unbound during the race window where the devcom component
is ready but the state is still MLX5_SD_STATE_DOWN (for example, just after
sd_register() finishes but before mlx5_sd_init() sets the state to UP), does
this early exit leave the component marked as ready?
Since mlx5_devcom_comp_set_ready(sd->devcom, false) was moved out of
sd_unregister() and into this function, skipping it here means the device will
unregister and free its struct mlx5_sd while the group is still ready.
Could a concurrent mlx5_sd_init() on another peer acquire the devcom lock,
observe that it is ready, and proceed? If the unregistered device was the
primary, mlx5_get_sd(primary) would return NULL, causing a NULL pointer
dereference at primary_sd->state. If it was a secondary, mlx5_sd_init()
would iterate over primary_sd->secondaries and access the freed device
structure, resulting in a use-after-free.
> +
> mlx5_sd_for_each_secondary(i, primary, pos)
> sd_cmd_unset_secondary(pos);
> sd_cmd_unset_primary(primary);
> debugfs_remove_recursive(sd->dfs);
This isn't a regression introduced in this patch, but does storing sd->dfs
locally instead of on the primary device cause debugfs entries to leak?
If a different peer is unbound first, its sd->dfs is NULL, making the call
to debugfs_remove_recursive() a no-op. The cleanup then finishes and marks
the group as unready.
When the original device that created the debugfs directory is later unbound,
it sees the group is no longer ready and skips this block entirely, never
freeing its sd->dfs. Since these files hold pointers to the mlx5_core_dev
structs, accessing them from userspace after the devices are freed could
trigger a use-after-free.
I note this was fixed later in the series by commit "net/mlx5: SD, Keep
multi-pf debugfs entries on primary", but wanted to highlight it here.
>
> sd_info(primary, "group id %#x, uncombined\n", sd->group_id);
> -out:
> + primary_sd->state = MLX5_SD_STATE_DOWN;
> + mlx5_devcom_comp_set_ready(sd->devcom, false);
> +out_unlock:
> + mlx5_devcom_comp_unlock(sd->devcom);
> sd_unregister(dev);
> sd_cleanup(dev);
> }
--
pw-bot: cr
next prev parent reply other threads:[~2026-04-30 1:42 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 6:01 [PATCH net V4 0/4] net/mlx5: Fixes for Socket-Direct Tariq Toukan
2026-04-28 6:01 ` [PATCH net V4 1/4] net/mlx5: SD: Serialize init/cleanup Tariq Toukan
2026-04-30 1:42 ` Jakub Kicinski [this message]
2026-04-30 9:38 ` Shay Drori
2026-04-28 6:01 ` [PATCH net V4 2/4] net/mlx5: SD, Keep multi-pf debugfs entries on primary Tariq Toukan
2026-04-28 6:01 ` [PATCH net V4 3/4] net/mlx5e: SD, Fix missing cleanup on probe error Tariq Toukan
2026-04-30 1:42 ` Jakub Kicinski
2026-04-30 12:52 ` Shay Drori
2026-04-28 6:01 ` [PATCH net V4 4/4] net/mlx5e: SD, Fix race condition in secondary device probe/remove Tariq Toukan
2026-04-30 1:42 ` Jakub Kicinski
2026-04-30 13:03 ` Shay Drori
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260430014207.2375703-1-kuba@kernel.org \
--to=kuba@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=dtatulea@nvidia.com \
--cc=edumazet@google.com \
--cc=gal@nvidia.com \
--cc=horms@kernel.org \
--cc=kees@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mbloch@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=parav@nvidia.com \
--cc=phaddad@nvidia.com \
--cc=saeedm@nvidia.com \
--cc=shayd@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox