Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Zhu Yanjun <yanjun.zhu@linux.dev>
To: Michael Bommarito <michael.bommarito@gmail.com>,
	Zhu Yanjun <zyjzyj2000@gmail.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	Leon Romanovsky <leon@kernel.org>,
	"yanjun.zhu@linux.dev" <yanjun.zhu@linux.dev>
Cc: Bob Pearson <rpearsonhpe@gmail.com>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg()
Date: Sun, 14 Jun 2026 18:28:11 -0700	[thread overview]
Message-ID: <1158f1de-4469-463c-91de-c5e24d2add4f@linux.dev> (raw)
In-Reply-To: <20260614130443.2517578-1-michael.bommarito@gmail.com>

在 2026/6/14 6:04, Michael Bommarito 写道:
> rxe_get_mcg() inserts the new mcg into rxe->mcg_tree and takes the tree
> reference before calling rxe_mcast_add() outside mcg_lock. On failure
> the error path frees the mcg with a bare kfree() without erasing the
> tree node or dropping the tree reference, so the freed mcg stays linked
> in mcg_tree and the next __rxe_lookup_mcg() on the same mgid uses it
> after free. rxe_mcast_add() fails reachably from an unprivileged caller:
> -ENODEV when the backing netdev is removed, or a propagated dev_mc_add()
> error.
> 
> Tear the mcg down with __rxe_destroy_mcg() on the failure path, as
> rxe_attach_mcast() already does.
> 
> Reproduced under KASAN on QEMU by forcing the rxe_mcast_add() failure;
> the use-after-free in __rxe_lookup_mcg() is gone after this change.
> 
> Fixes: a926a903b7dc ("RDMA/rxe: Do not call  dev_mc_add/del() under a spinlock")
> Cc: stable@vger.kernel.org # v5.18+
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Reproduction (v7.1-rc4, x86_64 QEMU/KVM, KASAN, Soft-RoCE):
> 
> Forcing rxe_mcast_add() to return -ENODEV, an unprivileged ATTACH_MCAST
> on a UD QP leaves the freed mcg linked in mcg_tree. On the stock kernel
> the next lookup reports
> 
>    BUG: KASAN: slab-use-after-free in __rxe_lookup_mcg
> 
> and the subsequent rb_erase() panics. Patched, the forced failure
> returns cleanly. Control: with injection disabled, re-attach and detach
> of the same MGID and a two-QP join/leave are KASAN-clean on both trees.
> 
> tools/testing/selftests/rdma has no rxe_mcast coverage; harness off-list
> on request.
> 
>   drivers/infiniband/sw/rxe/rxe_mcast.c | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 5cad720..7f148d4 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mcast.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -196,6 +196,8 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
>   	__rxe_insert_mcg(mcg);
>   }
>   
> +static void __rxe_destroy_mcg(struct rxe_mcg *mcg);
> +
>   /**
>    * rxe_get_mcg - lookup or allocate a mcg
>    * @rxe: rxe device object
> @@ -247,7 +249,13 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
>   	if (!err)
>   		return mcg;
>   
> -	kfree(mcg);
> +	/* mcg was made visible in mcg_tree; unwind the insert before freeing. */
> +	spin_lock_bh(&rxe->mcg_lock);
> +	__rxe_destroy_mcg(mcg);
> +	spin_unlock_bh(&rxe->mcg_lock);
> +	kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
> +	return ERR_PTR(err);
> +

Thanks for fixing the UAF. While this patch resolves the single-threaded 
issue, it introduces a severe race condition in concurrent environments.

Because rxe_mcast_add() runs outside the mcg_lock, a concurrent thread 
can find this mcg in the tree and successfully attach its own QPs during 
this window.

If the creator thread unconditionally erases the mcg from the tree on 
failure, those concurrent QPs become "orphaned." Future 
rxe_detach_mcast() calls will fail to find the erased mcg, causing these 
QPs and the mcg memory to leak permanently.

Attempting to simplify the rollback by unconditionally destroying the 
node or merging unlock paths can easily lead to executing kfree or a 
nested lock acquisition while still holding the mcg_lock spinlock, 
triggering a kernel deadlock or a double rb_erase panic.

The error path must conditionally destroy the mcg. After re-acquiring 
rxe->mcg_lock, check if mcg->qp_list is empty:

If empty: Safe to dismantle. Call __rxe_destroy_mcg(), drop the lock, 
and put the final reference.

If NOT empty: Concurrent threads have adopted it. Do not erase the tree 
node; simply release the lock and drop the creator's reference.

Please consider submitting a v2 addressing this concurrency gap.

Zhu Yanjun

>   err_dec:
>   	atomic_dec(&rxe->mcg_num);
>   	return ERR_PTR(err);
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8


      reply	other threads:[~2026-06-15  1:28 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-14 13:04 [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg() Michael Bommarito
2026-06-15  1:28 ` Zhu Yanjun [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1158f1de-4469-463c-91de-c5e24d2add4f@linux.dev \
    --to=yanjun.zhu@linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=michael.bommarito@gmail.com \
    --cc=rpearsonhpe@gmail.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox