* [PATCH net v3] net/mana: Fix auxiliary device double-delete race
@ 2026-05-04 14:27 Konstantin Taranov
2026-05-04 20:13 ` Long Li
2026-05-06 1:28 ` Jakub Kicinski
0 siblings, 2 replies; 3+ messages in thread
From: Konstantin Taranov @ 2026-05-04 14:27 UTC (permalink / raw)
To: shirazsaleem, kotaranov, pabeni, haiyangz, kys, edumazet, kuba,
davem, decui, wei.liu, longli, jgg, leon
Cc: linux-rdma, linux-kernel, netdev
From: Shiraz Saleem <shirazsaleem@microsoft.com>
Make remove_adev() safe to call concurrently from the service reset
and PCI eject paths by using xchg() to atomically claim the adev
pointer. This prevents double auxiliary_device_delete/uninit when
hv_eject_device_work races with the service reset workqueue.
Fixes: 505cc26bcae0 ("net: mana: Add support for auxiliary device servicing events")
Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
---
v3: Separate the xchg() call from the variable declaration in remove_adev()
to avoid calling functions with side effects as variable initializers
v2: rebased on the latest net
drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a654b3699..dd4f4215a 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -3465,14 +3465,19 @@ static void adev_release(struct device *dev)
static void remove_adev(struct gdma_dev *gd)
{
- struct auxiliary_device *adev = gd->adev;
- int id = adev->id;
+ struct auxiliary_device *adev;
+ int id;
+
+ adev = xchg(&gd->adev, NULL);
+ if (!adev)
+ return;
+
+ id = adev->id;
auxiliary_device_delete(adev);
auxiliary_device_uninit(adev);
mana_adev_idx_free(id);
- gd->adev = NULL;
}
static int add_adev(struct gdma_dev *gd, const char *name)
@@ -3538,7 +3543,7 @@ static void mana_rdma_service_handle(struct work_struct *work)
switch (serv_work->event) {
case GDMA_SERVICE_TYPE_RDMA_SUSPEND:
- if (!gd->adev || gd->is_suspended)
+ if (gd->is_suspended)
break;
remove_adev(gd);
@@ -3753,8 +3758,7 @@ void mana_remove(struct gdma_dev *gd, bool suspending)
cancel_delayed_work_sync(&ac->gf_stats_work);
/* adev currently doesn't support suspending, always remove it */
- if (gd->adev)
- remove_adev(gd);
+ remove_adev(gd);
for (i = 0; i < ac->num_ports; i++) {
ndev = ac->ports[i];
@@ -3843,8 +3847,7 @@ void mana_rdma_remove(struct gdma_dev *gd)
if (gc->service_wq)
flush_workqueue(gc->service_wq);
- if (gd->adev)
- remove_adev(gd);
+ remove_adev(gd);
mana_gd_deregister_device(gd);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* RE: [PATCH net v3] net/mana: Fix auxiliary device double-delete race
2026-05-04 14:27 [PATCH net v3] net/mana: Fix auxiliary device double-delete race Konstantin Taranov
@ 2026-05-04 20:13 ` Long Li
2026-05-06 1:28 ` Jakub Kicinski
1 sibling, 0 replies; 3+ messages in thread
From: Long Li @ 2026-05-04 20:13 UTC (permalink / raw)
To: Konstantin Taranov, Shiraz Saleem, Konstantin Taranov,
pabeni@redhat.com, Haiyang Zhang, KY Srinivasan,
edumazet@google.com, kuba@kernel.org, davem@davemloft.net,
Dexuan Cui, wei.liu@kernel.org, jgg@ziepe.ca, leon@kernel.org
Cc: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org
> From: Shiraz Saleem <shirazsaleem@microsoft.com>
>
> Make remove_adev() safe to call concurrently from the service reset and PCI
> eject paths by using xchg() to atomically claim the adev pointer. This prevents
> double auxiliary_device_delete/uninit when hv_eject_device_work races with
> the service reset workqueue.
>
> Fixes: 505cc26bcae0 ("net: mana: Add support for auxiliary device servicing
> events")
> Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
> Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
> ---
> v3: Separate the xchg() call from the variable declaration in remove_adev() to
> avoid calling functions with side effects as variable initializers
> v2: rebased on the latest net
> drivers/net/ethernet/microsoft/mana/mana_en.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index a654b3699..dd4f4215a 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -3465,14 +3465,19 @@ static void adev_release(struct device *dev)
>
> static void remove_adev(struct gdma_dev *gd) {
> - struct auxiliary_device *adev = gd->adev;
> - int id = adev->id;
> + struct auxiliary_device *adev;
> + int id;
> +
> + adev = xchg(&gd->adev, NULL);
> + if (!adev)
> + return;
> +
> + id = adev->id;
>
> auxiliary_device_delete(adev);
> auxiliary_device_uninit(adev);
>
> mana_adev_idx_free(id);
> - gd->adev = NULL;
> }
>
> static int add_adev(struct gdma_dev *gd, const char *name) @@ -3538,7
> +3543,7 @@ static void mana_rdma_service_handle(struct work_struct *work)
>
> switch (serv_work->event) {
> case GDMA_SERVICE_TYPE_RDMA_SUSPEND:
> - if (!gd->adev || gd->is_suspended)
> + if (gd->is_suspended)
> break;
>
> remove_adev(gd);
> @@ -3753,8 +3758,7 @@ void mana_remove(struct gdma_dev *gd, bool
> suspending)
> cancel_delayed_work_sync(&ac->gf_stats_work);
>
> /* adev currently doesn't support suspending, always remove it */
> - if (gd->adev)
> - remove_adev(gd);
> + remove_adev(gd);
>
> for (i = 0; i < ac->num_ports; i++) {
> ndev = ac->ports[i];
> @@ -3843,8 +3847,7 @@ void mana_rdma_remove(struct gdma_dev *gd)
> if (gc->service_wq)
> flush_workqueue(gc->service_wq);
>
> - if (gd->adev)
> - remove_adev(gd);
> + remove_adev(gd);
>
> mana_gd_deregister_device(gd);
> }
> --
> 2.43.0
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net v3] net/mana: Fix auxiliary device double-delete race
2026-05-04 14:27 [PATCH net v3] net/mana: Fix auxiliary device double-delete race Konstantin Taranov
2026-05-04 20:13 ` Long Li
@ 2026-05-06 1:28 ` Jakub Kicinski
1 sibling, 0 replies; 3+ messages in thread
From: Jakub Kicinski @ 2026-05-06 1:28 UTC (permalink / raw)
To: kotaranov
Cc: Jakub Kicinski, shirazsaleem, kotaranov, pabeni, haiyangz, kys,
edumazet, davem, decui, wei.liu, longli, jgg, leon, linux-rdma,
linux-kernel, netdev
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mana: Fix auxiliary device double-delete race
This commit modifies remove_adev() to use xchg() to atomically claim the
auxiliary device pointer. This change is intended to prevent a double-free
race between the PCI eject path and the service reset workqueue.
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index a654b3699c4c5..dd4f4215ab434 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -3465,14 +3465,19 @@ static void adev_release(struct device *dev)
>
> static void remove_adev(struct gdma_dev *gd)
> {
> - struct auxiliary_device *adev = gd->adev;
> - int id = adev->id;
> + struct auxiliary_device *adev;
> + int id;
> +
> + adev = xchg(&gd->adev, NULL);
> + if (!adev)
> + return;
Does this xchg() operation act as an ad-hoc synchronization gate rather
than properly synchronizing the concurrent teardown paths?
While using xchg() ensures the adev pointer itself is only freed once, the
rest of the teardown sequence appears to remain unprotected. If the PCI
remove path and the service reset workqueue run concurrently, one thread
might successfully claim and process the adev, causing the other thread to
return early from remove_adev().
However, the thread returning early will then continue its execution into
the rest of mana_remove() or mana_rdma_remove(). This could allow the first
thread to proceed and free critical structures:
mana_remove() {
...
remove_adev(gd);
...
free_netdev(ndev);
...
kfree(ac);
}
while the second thread concurrently iterates and accesses those same
structures.
Could this concurrent execution lead to a use-after-free on ndev or ac?
Should the teardown sequence be synchronized using a proper lock, such as
the device lock, or by synchronously canceling the service work in the
remove path, instead of relying on an atomic operation?
[ ... ]
--
pw-bot: cr
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-05-06 1:28 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-04 14:27 [PATCH net v3] net/mana: Fix auxiliary device double-delete race Konstantin Taranov
2026-05-04 20:13 ` Long Li
2026-05-06 1:28 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox