From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>,
"Mark Bloch" <mbloch@nvidia.com>,
Leon Romanovsky <leon@kernel.org>, Shay Drory <shayd@nvidia.com>,
Simon Horman <horms@kernel.org>,
Patrisious Haddad <phaddad@nvidia.com>,
Parav Pandit <parav@nvidia.com>, Kees Cook <kees@kernel.org>,
Gal Pressman <gal@nvidia.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
Dragos Tatulea <dtatulea@nvidia.com>
Subject: [PATCH net V5 0/4] net/mlx5: Fixes for Socket-Direct
Date: Mon, 4 May 2026 21:02:02 +0300
Message-ID: <20260504180206.268568-1-tariqt@nvidia.com>
Hi,
This series fixes several race conditions and bugs in the mlx5
Socket-Direct (SD) single netdev flow.
Patch 1 serializes mlx5_sd_init()/mlx5_sd_cleanup() with
mlx5_devcom_comp_lock() and tracks the SD group state on the primary
device, preventing concurrent or duplicate bring-up/tear-down.
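Below is a user-space sketch of the idea behind patch 1. It is not the driver code. A pthread mutex stands in for mlx5_devcom_comp_lock(), and the names sd_group, sd_state, sd_init() and sd_cleanup() are hypothetical. Because the state check and the state change happen under one lock, a second bring-up or tear-down becomes a no-op instead of running twice.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: the mutex plays the role of the devcom component
 * lock, and 'state' is the SD group state tracked on the primary. */
enum sd_state { SD_STATE_DOWN, SD_STATE_UP };

struct sd_group {
	pthread_mutex_t comp_lock; /* stand-in for mlx5_devcom_comp_lock() */
	enum sd_state state;       /* tracked on the primary only */
	int init_calls;            /* real bring-ups performed */
	int cleanup_calls;         /* real tear-downs performed */
};

/* Bring the group up once; a concurrent or repeated caller sees UP and exits. */
static void sd_init(struct sd_group *g)
{
	pthread_mutex_lock(&g->comp_lock);
	if (g->state == SD_STATE_UP)
		goto out;          /* duplicate bring-up: nothing to do */
	g->init_calls++;           /* FW resources would be created here */
	g->state = SD_STATE_UP;
out:
	pthread_mutex_unlock(&g->comp_lock);
}

/* Tear the group down once; a second caller sees state != UP and exits. */
static void sd_cleanup(struct sd_group *g)
{
	pthread_mutex_lock(&g->comp_lock);
	if (g->state != SD_STATE_UP)
		goto out;          /* already down, or never brought up */
	g->cleanup_calls++;        /* FW resources would be destroyed here */
	g->state = SD_STATE_DOWN;
out:
	pthread_mutex_unlock(&g->comp_lock);
}
```

Calling sd_init() twice and then sd_cleanup() twice performs exactly one bring-up and one tear-down.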
Patch 2 stores the debugfs "multi-pf" directory on the primary's sd
struct instead of the calling device's. Keeping it on the caller caused
memory leaks and re-creation errors when cleanup ran from a different PF.
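The following sketch models the idea behind patch 2 under stated assumptions. Every PF resolves to the primary's sd struct before it touches the debugfs handle. The struct layout, sd_primary() and the opaque 'dir' pointer are hypothetical stand-ins, not the mlx5 API. Cleanup from a secondary therefore clears the same field that creation set, and a later re-create succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: secondaries point at the primary's sd struct. */
struct sd {
	struct sd *primary; /* NULL when this PF is the primary */
	void *dfs;          /* "multi-pf" dir; only ever set on the primary */
};

/* Resolve to the primary's sd, whichever PF is calling. */
static struct sd *sd_primary(struct sd *sd)
{
	return sd->primary ? sd->primary : sd;
}

static int sd_debugfs_create(struct sd *caller, void *dir)
{
	struct sd *p = sd_primary(caller);

	if (p->dfs)
		return -EEXIST; /* overwriting would leak the existing dir */
	p->dfs = dir;
	return 0;
}

static void *sd_debugfs_remove(struct sd *caller)
{
	struct sd *p = sd_primary(caller);
	void *dir = p->dfs;

	p->dfs = NULL; /* cleared on the primary, so a later create succeeds */
	return dir;    /* the driver would remove this directory recursively */
}
```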
Patch 3 fixes missing cleanup on ETH probe errors. The analogous gap on
the resume path requires introducing sd_suspend/resume APIs that only
destroy FW resources, and is left for a follow-up series.
Patch 4 fixes a race where a secondary PF could access the primary's
auxiliary device after it had been unbound. It does so by taking a
reference on the primary's auxiliary device and holding its device lock
while operating on it.
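The missing-cleanup fix for ETH probe errors follows the usual kernel goto-unwind pattern. Each error path undoes, in reverse order, every step that already succeeded. The sketch below is hypothetical. The step_*() functions and the fail_at knob are illustrative only, and the assumption is that SD cleanup was among the undone steps.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical probe context: flags record which resources are held. */
struct probe_ctx {
	int sd_ready;     /* set by the SD init step */
	int netdev_ready; /* set by the netdev creation step */
	int fail_at;      /* 0 = succeed, 1 = fail create, 2 = fail register */
};

static int step_sd_init(struct probe_ctx *c) { c->sd_ready = 1; return 0; }
static void step_sd_cleanup(struct probe_ctx *c) { c->sd_ready = 0; }

static int step_netdev_create(struct probe_ctx *c)
{
	if (c->fail_at == 1)
		return -ENOMEM;
	c->netdev_ready = 1;
	return 0;
}

static void step_netdev_destroy(struct probe_ctx *c) { c->netdev_ready = 0; }

static int step_register(struct probe_ctx *c)
{
	return c->fail_at == 2 ? -EINVAL : 0;
}

/* Unwind in reverse order so no failure leaves SD state set up. */
static int probe(struct probe_ctx *c)
{
	int err;

	err = step_sd_init(c);
	if (err)
		return err;
	err = step_netdev_create(c);
	if (err)
		goto err_sd_cleanup;
	err = step_register(c);
	if (err)
		goto err_netdev_destroy;
	return 0;

err_netdev_destroy:
	step_netdev_destroy(c);
err_sd_cleanup:
	step_sd_cleanup(c); /* assumed: the kind of call missing before the fix */
	return err;
}
```

Each label falls through to the next one. A later failure therefore releases everything an earlier failure would, plus its own step.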
Regards,
Tariq
---
V5:
- Adjust "net/mlx5: SD: Serialize init/cleanup" to clear each peer's
primary_dev pointer and the primary's secondaries[] under the comp
lock, and to set devcom not-ready in the !primary and state != UP
early-exit paths so the device cannot unregister while devcom is
still marked ready.
- Adjust "net/mlx5e: SD, Fix race condition in secondary device
probe/remove" to also take get_device()/put_device() on the primary's
adev, since device_lock() alone does not pin the kobject.
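The V5 note that device_lock() alone does not pin the kobject can be illustrated in user space. A reference count keeps the object's memory alive, and the lock only serializes against unbind. In this sketch a pthread mutex stands in for device_lock(), an atomic counter stands in for get_device()/put_device(), and all names are hypothetical. It assumes the caller already holds a valid pointer at the moment it takes the reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the primary's auxiliary device. */
struct adev {
	pthread_mutex_t lock; /* stand-in for device_lock()/device_unlock() */
	atomic_int refcount;  /* stand-in for the kobject reference count */
	bool bound;           /* cleared when the primary's driver unbinds */
	int uses;             /* times a secondary operated on the device */
};

static struct adev *adev_new(void)
{
	struct adev *a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	pthread_mutex_init(&a->lock, NULL);
	atomic_init(&a->refcount, 1); /* the owner's reference */
	a->bound = true;
	return a;
}

static void adev_get(struct adev *a) /* like get_device() */
{
	atomic_fetch_add(&a->refcount, 1);
}

static void adev_put(struct adev *a) /* like put_device() */
{
	if (atomic_fetch_sub(&a->refcount, 1) == 1) { /* last reference */
		pthread_mutex_destroy(&a->lock);
		free(a);
	}
}

static void primary_unbind(struct adev *a)
{
	pthread_mutex_lock(&a->lock);
	a->bound = false;
	pthread_mutex_unlock(&a->lock);
}

/* Pin first so the memory outlives the critical section, then lock so
 * unbind cannot run concurrently, then re-check that it is still bound. */
static bool secondary_use_primary(struct adev *a)
{
	bool done = false;

	adev_get(a);
	pthread_mutex_lock(&a->lock);
	if (a->bound) {
		a->uses++; /* operate on the primary's device here */
		done = true;
	}
	pthread_mutex_unlock(&a->lock);
	adev_put(a);
	return done;
}
```

Without the reference, the final put from the owner could free the object while a secondary is still waiting on, or holding, its lock.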
V4: https://lore.kernel.org/all/20260428060111.221086-1-tariqt@nvidia.com/
V3: https://lore.kernel.org/all/20260423123104.201552-1-tariqt@nvidia.com/
Shay Drory (4):
net/mlx5: SD: Serialize init/cleanup
net/mlx5: SD, Keep multi-pf debugfs entries on primary
net/mlx5e: SD, Fix missing cleanup on probe error
net/mlx5e: SD, Fix race condition in secondary device probe/remove
.../net/ethernet/mellanox/mlx5/core/en_main.c | 26 +++-
.../net/ethernet/mellanox/mlx5/core/lib/sd.c | 114 +++++++++++++++---
.../net/ethernet/mellanox/mlx5/core/lib/sd.h | 2 +
3 files changed, 122 insertions(+), 20 deletions(-)
base-commit: bd3a4795d5744f59a1f485379f1303e5e606f377
--
2.44.0