[PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get

Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed

* [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg()
@ 2026-06-14 13:04 Michael Bommarito
  2026-06-15  1:28 ` Zhu Yanjun
  0 siblings, 1 reply; 2+ messages in thread
From: Michael Bommarito @ 2026-06-14 13:04 UTC (permalink / raw)
  To: Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky
  Cc: Bob Pearson, linux-rdma, linux-kernel

rxe_get_mcg() inserts the new mcg into rxe->mcg_tree and takes the tree
reference before calling rxe_mcast_add() outside mcg_lock. On failure
the error path frees the mcg with a bare kfree() without erasing the
tree node or dropping the tree reference, so the freed mcg stays linked
in mcg_tree and the next __rxe_lookup_mcg() on the same mgid uses it
after free. rxe_mcast_add() fails reachably from an unprivileged caller:
-ENODEV when the backing netdev is removed, or a propagated dev_mc_add()
error.

Tear the mcg down with __rxe_destroy_mcg() on the failure path, as
rxe_attach_mcast() already does.

Reproduced under KASAN on QEMU by forcing the rxe_mcast_add() failure;
the use-after-free in __rxe_lookup_mcg() is gone after this change.

Fixes: a926a903b7dc ("RDMA/rxe: Do not call  dev_mc_add/del() under a spinlock")
Cc: stable@vger.kernel.org # v5.18+
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Reproduction (v7.1-rc4, x86_64 QEMU/KVM, KASAN, Soft-RoCE):

Forcing rxe_mcast_add() to return -ENODEV, an unprivileged ATTACH_MCAST
on a UD QP leaves the freed mcg linked in mcg_tree. On the stock kernel
the next lookup reports

  BUG: KASAN: slab-use-after-free in __rxe_lookup_mcg

and the subsequent rb_erase() panics. Patched, the forced failure
returns cleanly. Control: with injection disabled, re-attach and detach
of the same MGID and a two-QP join/leave are KASAN-clean on both trees.

tools/testing/selftests/rdma has no rxe_mcast coverage; harness off-list
on request.

 drivers/infiniband/sw/rxe/rxe_mcast.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 5cad720..7f148d4 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -196,6 +196,8 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
 	__rxe_insert_mcg(mcg);
 }
 
+static void __rxe_destroy_mcg(struct rxe_mcg *mcg);
+
 /**
  * rxe_get_mcg - lookup or allocate a mcg
  * @rxe: rxe device object
@@ -247,7 +249,13 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
 	if (!err)
 		return mcg;
 
-	kfree(mcg);
+	/* mcg was made visible in mcg_tree; unwind the insert before freeing. */
+	spin_lock_bh(&rxe->mcg_lock);
+	__rxe_destroy_mcg(mcg);
+	spin_unlock_bh(&rxe->mcg_lock);
+	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+	return ERR_PTR(err);
+
 err_dec:
 	atomic_dec(&rxe->mcg_num);
 	return ERR_PTR(err);
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg()
  2026-06-14 13:04 [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg() Michael Bommarito
@ 2026-06-15  1:28 ` Zhu Yanjun
  0 siblings, 0 replies; 2+ messages in thread
From: Zhu Yanjun @ 2026-06-15  1:28 UTC (permalink / raw)
  To: Michael Bommarito, Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky,
	yanjun.zhu@linux.dev
  Cc: Bob Pearson, linux-rdma, linux-kernel

在 2026/6/14 6:04, Michael Bommarito 写道:
> rxe_get_mcg() inserts the new mcg into rxe->mcg_tree and takes the tree
> reference before calling rxe_mcast_add() outside mcg_lock. On failure
> the error path frees the mcg with a bare kfree() without erasing the
> tree node or dropping the tree reference, so the freed mcg stays linked
> in mcg_tree and the next __rxe_lookup_mcg() on the same mgid uses it
> after free. rxe_mcast_add() fails reachably from an unprivileged caller:
> -ENODEV when the backing netdev is removed, or a propagated dev_mc_add()
> error.
> 
> Tear the mcg down with __rxe_destroy_mcg() on the failure path, as
> rxe_attach_mcast() already does.
> 
> Reproduced under KASAN on QEMU by forcing the rxe_mcast_add() failure;
> the use-after-free in __rxe_lookup_mcg() is gone after this change.
> 
> Fixes: a926a903b7dc ("RDMA/rxe: Do not call  dev_mc_add/del() under a spinlock")
> Cc: stable@vger.kernel.org # v5.18+
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Reproduction (v7.1-rc4, x86_64 QEMU/KVM, KASAN, Soft-RoCE):
> 
> Forcing rxe_mcast_add() to return -ENODEV, an unprivileged ATTACH_MCAST
> on a UD QP leaves the freed mcg linked in mcg_tree. On the stock kernel
> the next lookup reports
> 
>    BUG: KASAN: slab-use-after-free in __rxe_lookup_mcg
> 
> and the subsequent rb_erase() panics. Patched, the forced failure
> returns cleanly. Control: with injection disabled, re-attach and detach
> of the same MGID and a two-QP join/leave are KASAN-clean on both trees.
> 
> tools/testing/selftests/rdma has no rxe_mcast coverage; harness off-list
> on request.
> 
>   drivers/infiniband/sw/rxe/rxe_mcast.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 5cad720..7f148d4 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mcast.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -196,6 +196,8 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
>   	__rxe_insert_mcg(mcg);
>   }
>   
> +static void __rxe_destroy_mcg(struct rxe_mcg *mcg);
> +
>   /**
>    * rxe_get_mcg - lookup or allocate a mcg
>    * @rxe: rxe device object
> @@ -247,7 +249,13 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
>   	if (!err)
>   		return mcg;
>   
> -	kfree(mcg);
> +	/* mcg was made visible in mcg_tree; unwind the insert before freeing. */
> +	spin_lock_bh(&rxe->mcg_lock);
> +	__rxe_destroy_mcg(mcg);
> +	spin_unlock_bh(&rxe->mcg_lock);
> +	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
> +	return ERR_PTR(err);
> +

Thanks for fixing the UAF. While this patch resolves the single-threaded 
issue, it introduces a severe race condition in concurrent environments.

Because rxe_mcast_add() runs outside the mcg_lock, a concurrent thread 
can find this mcg in the tree and successfully attach its own QPs during 
this window.

If the creator thread unconditionally erases the mcg from the tree on 
failure, those concurrent QPs become "orphaned." Future 
rxe_detach_mcast() calls will fail to find the erased mcg, causing these 
QPs and the mcg memory to leak permanently.

Attempting to simplify the rollback by unconditionally destroying the 
node or merging unlock paths can easily lead to executing kfree or a 
nested lock acquisition while still holding the mcg_lock spinlock, 
triggering a kernel deadlock or a double rb_erase panic.

The error path must conditionally destroy the mcg. After re-acquiring 
rxe->mcg_lock, check if mcg->qp_list is empty:

If empty: Safe to dismantle. Call __rxe_destroy_mcg(), drop the lock, 
and put the final reference.

If NOT empty: Concurrent threads have adopted it. Do not erase the tree 
node; simply release the lock and drop the creator's reference.

Please consider submitting a v2 addressing this concurrency gap.

Zhu Yanjun

>   err_dec:
>   	atomic_dec(&rxe->mcg_num);
>   	return ERR_PTR(err);
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-06-15  1:28 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-14 13:04 [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg() Michael Bommarito
2026-06-15  1:28 ` Zhu Yanjun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox