* [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg()
From: Michael Bommarito @ 2026-06-14 13:04 UTC (permalink / raw)
To: Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky
Cc: Bob Pearson, linux-rdma, linux-kernel
rxe_get_mcg() inserts the new mcg into rxe->mcg_tree and takes the tree
reference before calling rxe_mcast_add() outside mcg_lock. On failure
the error path frees the mcg with a bare kfree() without erasing the
tree node or dropping the tree reference, so the freed mcg stays linked
in mcg_tree and the next __rxe_lookup_mcg() on the same mgid uses it
after free. rxe_mcast_add() fails reachably from an unprivileged caller:
-ENODEV when the backing netdev is removed, or a propagated dev_mc_add()
error.

Tear the mcg down with __rxe_destroy_mcg() on the failure path, as
rxe_attach_mcast() already does.

Reproduced under KASAN on QEMU by forcing the rxe_mcast_add() failure;
the use-after-free in __rxe_lookup_mcg() is gone after this change.

Fixes: a926a903b7dc ("RDMA/rxe: Do not call dev_mc_add/del() under a spinlock")
Cc: stable@vger.kernel.org # v5.18+
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
Reproduction (v7.1-rc4, x86_64 QEMU/KVM, KASAN, Soft-RoCE):
Forcing rxe_mcast_add() to return -ENODEV, an unprivileged ATTACH_MCAST
on a UD QP leaves the freed mcg linked in mcg_tree. On the stock kernel
the next lookup reports
BUG: KASAN: slab-use-after-free in __rxe_lookup_mcg
and the subsequent rb_erase() panics. Patched, the forced failure
returns cleanly. Control: with injection disabled, re-attach and detach
of the same MGID and a two-QP join/leave are KASAN-clean on both trees.
tools/testing/selftests/rdma has no rxe_mcast coverage; harness off-list
on request.
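
For readers who want the failure mode in miniature, the following is a
userspace model only, not the off-list harness and not kernel code. All
model_* names are hypothetical, the tree is reduced to a single slot, and
a "freed" flag stands in for kfree() so the model itself never touches
freed memory. It contrasts the stock error path (free without unlinking)
with the unlink-then-free ordering this patch applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rxe_mcg. */
struct model_mcg {
	int mgid;
	bool freed;	/* set instead of calling free(), keeps the model defined */
};

/* Hypothetical one-entry stand-in for rxe->mcg_tree. */
struct model_tree {
	struct model_mcg *slot;
};

/* Stock error path: the object is freed but stays linked in the tree. */
static void model_fail_stock(struct model_tree *t, struct model_mcg *m)
{
	(void)t;		/* tree is left untouched: this is the bug */
	m->freed = true;	/* models the bare kfree(mcg) */
}

/* v1 error path: unlink from the tree first, then free. */
static void model_fail_v1(struct model_tree *t, struct model_mcg *m)
{
	if (t->slot == m)
		t->slot = NULL;	/* models the erase done by __rxe_destroy_mcg() */
	m->freed = true;
}

/* True if a later lookup would walk a node that has already been freed. */
static bool model_lookup_hits_freed(const struct model_tree *t)
{
	return t->slot != NULL && t->slot->freed;
}
```

In this model, model_lookup_hits_freed() is true after model_fail_stock()
and false after model_fail_v1(). The model is single-threaded and says
nothing about concurrent attachers.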
drivers/infiniband/sw/rxe/rxe_mcast.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 5cad720..7f148d4 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -196,6 +196,8 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
__rxe_insert_mcg(mcg);
}
+static void __rxe_destroy_mcg(struct rxe_mcg *mcg);
+
/**
* rxe_get_mcg - lookup or allocate a mcg
* @rxe: rxe device object
@@ -247,7 +249,13 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
if (!err)
return mcg;
- kfree(mcg);
+ /* mcg was made visible in mcg_tree; unwind the insert before freeing. */
+ spin_lock_bh(&rxe->mcg_lock);
+ __rxe_destroy_mcg(mcg);
+ spin_unlock_bh(&rxe->mcg_lock);
+ kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
+ return ERR_PTR(err);
+
err_dec:
atomic_dec(&rxe->mcg_num);
return ERR_PTR(err);
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
--
2.53.0
* Re: [PATCH] RDMA/rxe: destroy the mcg when rxe_mcast_add() fails in rxe_get_mcg()
From: Zhu Yanjun @ 2026-06-15 1:28 UTC (permalink / raw)
To: Michael Bommarito, Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky,
yanjun.zhu@linux.dev
Cc: Bob Pearson, linux-rdma, linux-kernel
On 2026/6/14 6:04, Michael Bommarito wrote:
> rxe_get_mcg() inserts the new mcg into rxe->mcg_tree and takes the tree
> reference before calling rxe_mcast_add() outside mcg_lock. On failure
> the error path frees the mcg with a bare kfree() without erasing the
> tree node or dropping the tree reference, so the freed mcg stays linked
> in mcg_tree and the next __rxe_lookup_mcg() on the same mgid uses it
> after free. rxe_mcast_add() fails reachably from an unprivileged caller:
> -ENODEV when the backing netdev is removed, or a propagated dev_mc_add()
> error.
>
> Tear the mcg down with __rxe_destroy_mcg() on the failure path, as
> rxe_attach_mcast() already does.
>
> Reproduced under KASAN on QEMU by forcing the rxe_mcast_add() failure;
> the use-after-free in __rxe_lookup_mcg() is gone after this change.
>
> Fixes: a926a903b7dc ("RDMA/rxe: Do not call dev_mc_add/del() under a spinlock")
> Cc: stable@vger.kernel.org # v5.18+
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> Reproduction (v7.1-rc4, x86_64 QEMU/KVM, KASAN, Soft-RoCE):
>
> Forcing rxe_mcast_add() to return -ENODEV, an unprivileged ATTACH_MCAST
> on a UD QP leaves the freed mcg linked in mcg_tree. On the stock kernel
> the next lookup reports
>
> BUG: KASAN: slab-use-after-free in __rxe_lookup_mcg
>
> and the subsequent rb_erase() panics. Patched, the forced failure
> returns cleanly. Control: with injection disabled, re-attach and detach
> of the same MGID and a two-QP join/leave are KASAN-clean on both trees.
>
> tools/testing/selftests/rdma has no rxe_mcast coverage; harness off-list
> on request.
>
> drivers/infiniband/sw/rxe/rxe_mcast.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 5cad720..7f148d4 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mcast.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -196,6 +196,8 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
> __rxe_insert_mcg(mcg);
> }
>
> +static void __rxe_destroy_mcg(struct rxe_mcg *mcg);
> +
> /**
> * rxe_get_mcg - lookup or allocate a mcg
> * @rxe: rxe device object
> @@ -247,7 +249,13 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
> if (!err)
> return mcg;
>
> - kfree(mcg);
> + /* mcg was made visible in mcg_tree; unwind the insert before freeing. */
> + spin_lock_bh(&rxe->mcg_lock);
> + __rxe_destroy_mcg(mcg);
> + spin_unlock_bh(&rxe->mcg_lock);
> + kref_put(&mcg->ref_cnt, rxe_cleanup_mcg);
> + return ERR_PTR(err);
> +
Thanks for fixing the UAF. While this patch resolves the single-threaded
issue, it introduces a severe race condition in concurrent environments.
Because rxe_mcast_add() runs outside the mcg_lock, a concurrent thread
can find this mcg in the tree and successfully attach its own QPs during
this window.

If the creator thread unconditionally erases the mcg from the tree on
failure, those concurrent QPs become "orphaned." Future
rxe_detach_mcast() calls will fail to find the erased mcg, causing these
QPs and the mcg memory to leak permanently.

Attempting to simplify the rollback by unconditionally destroying the
node or merging unlock paths can easily lead to calling kfree() or
acquiring a nested lock while still holding the mcg_lock spinlock,
triggering a kernel deadlock or a double rb_erase() panic.

The error path must conditionally destroy the mcg. After re-acquiring
rxe->mcg_lock, check whether mcg->qp_list is empty:

 - If empty: it is safe to dismantle. Call __rxe_destroy_mcg(), drop the
   lock, and put the final reference.
 - If NOT empty: concurrent threads have adopted it. Do not erase the
   tree node; simply release the lock and drop the creator's reference.

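
The two cases above can be sketched as a small userspace model. This is
an illustration under stated assumptions, not a proposed kernel patch:
the model_* names are hypothetical, the reference count is a plain int
rather than a kref, and the places where rxe->mcg_lock would be held are
marked only by comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct rxe_mcg. */
struct model_mcg {
	int refs;	/* models the kref: creator ref + tree ref */
	int qp_count;	/* models the number of entries on mcg->qp_list */
	bool in_tree;	/* models membership in rxe->mcg_tree */
	bool freed;	/* set instead of calling free() so callers can inspect it */
};

/* Models kref_put(): the object is released when the last ref is dropped. */
static void model_put(struct model_mcg *m)
{
	assert(m->refs > 0);
	if (--m->refs == 0)
		m->freed = true;
}

/* Models __rxe_destroy_mcg(): erase the tree node and drop the tree's ref. */
static void model_destroy(struct model_mcg *m)
{
	m->in_tree = false;
	model_put(m);
}

/*
 * Conditional unwind after a failed rxe_mcast_add().
 * Returns true if the mcg was removed from the tree.
 */
static bool model_unwind_failed_add(struct model_mcg *m)
{
	bool destroyed = false;

	/* the kernel code would take rxe->mcg_lock here */
	if (m->in_tree && m->qp_count == 0) {
		model_destroy(m);
		destroyed = true;
	}
	/* ... and release rxe->mcg_lock here, before the final put */

	model_put(m);	/* drop the creator's reference outside the lock */
	return destroyed;
}
```

With refs == 2 and an empty QP list the model unlinks and releases the
object; with one QP attached it stays in the tree holding only the tree
reference.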
Please consider submitting a v2 addressing this concurrency gap.
Zhu Yanjun
> err_dec:
> atomic_dec(&rxe->mcg_num);
> return ERR_PTR(err);
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
* [PATCH v2] RDMA/rxe: insert mcg into mcg_tree only after rxe_mcast_add() succeeds
From: Michael Bommarito @ 2026-06-17 2:27 UTC (permalink / raw)
To: Zhu Yanjun, Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky
Cc: Bob Pearson, linux-rdma, linux-kernel
rxe_get_mcg() publishes a newly allocated multicast group in
rxe->mcg_tree before programming the backing Ethernet multicast address
with rxe_mcast_add(), which runs outside mcg_lock. A local userspace
RDMA client reaches this path with ATTACH_MCAST on a UD QP; if
rxe_mcast_add() then returns an error (for example -ENODEV when the
backing netdev has been removed, or a propagated dev_mc_add() error),
the unwind frees the published group without removing it from the tree.
A later lookup of the same MGID dereferences the freed struct rxe_mcg
from __rxe_lookup_mcg().

Fix this by keeping the new mcg private until rxe_mcast_add() succeeds.
Split the tree publication into __rxe_publish_mcg(), call rxe_mcast_add()
before taking the tree reference, and free the still-private mcg on
failure. Because the group is never visible in mcg_tree until the
multicast address is programmed, no concurrent caller can look it up or
attach a QP to a group that is about to be torn down, so the error path
needs no conditional unwind. If another caller publishes the same MGID
while the address is being programmed, the post-add re-check under
mcg_lock finds the winner; this caller then drops its private object and
balances its own rxe_mcast_add() with rxe_mcast_del() before returning
the winner.

Reproduced by forcing the rxe_mcast_add() error return under KASAN:
without the change the next attach to the same MGID reports a
slab-use-after-free in __rxe_lookup_mcg(); with it the forced failure
returns cleanly. A no-injection attach/detach regression, including a
two-QP shared join/leave and re-attach, stays KASAN- and leak-clean.

Fixes: a926a903b7dc ("RDMA/rxe: Do not call dev_mc_add/del() under a spinlock")
Cc: stable@vger.kernel.org # v5.18+
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
v2: switch approach in response to Zhu Yanjun's review of v1
(https://lore.kernel.org/all/1158f1de-4469-463c-91de-c5e24d2add4f@linux.dev/). v1 unwound the
already-published mcg on the failure path; Zhu noted that because
rxe_mcast_add() runs outside mcg_lock, a concurrent caller can find
the published mcg and attach a QP in that window, so an unconditional
teardown would orphan those QPs and leak the mcg. Rather than make the
teardown conditional, v2 does not publish the mcg into mcg_tree until
after rxe_mcast_add() has succeeded, which removes the window entirely
(the safety of the two-CPU same-MGID race and the loser-side
rxe_mcast_del() balance is analysed in the commit log above). No
Fixes/stable change; this is still the same UAF.
v1: https://lore.kernel.org/all/20260614130443.2517578-1-michael.bommarito@gmail.com/
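
As a reading aid, the publish-after-add ordering and the loser-side
balance can be modelled in userspace roughly as follows. This is a
sketch under assumptions, not the patched kernel code: every model_*
name is hypothetical, the tree is a small array, addr_refs is a single
counter standing in for the netdev's refcounted multicast address, and
the concurrent winner is simulated by an optional "racer" argument
rather than by a second thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_SLOTS 8

struct model_mcg {
	int mgid;
};

struct model_dev {
	struct model_mcg *tree[MODEL_SLOTS];	/* stands in for rxe->mcg_tree */
	int addr_refs;				/* programmed-address references */
	bool fail_add;				/* fault injection for the add step */
};

static struct model_mcg *model_lookup(struct model_dev *d, int mgid)
{
	for (size_t i = 0; i < MODEL_SLOTS; i++)
		if (d->tree[i] && d->tree[i]->mgid == mgid)
			return d->tree[i];
	return NULL;
}

static bool model_publish(struct model_dev *d, struct model_mcg *m)
{
	for (size_t i = 0; i < MODEL_SLOTS; i++) {
		if (!d->tree[i]) {
			d->tree[i] = m;
			return true;
		}
	}
	return false;	/* model tree is full */
}

static int model_mcast_add(struct model_dev *d)
{
	if (d->fail_add)
		return -1;
	d->addr_refs++;
	return 0;
}

static void model_mcast_del(struct model_dev *d)
{
	d->addr_refs--;
}

/* racer, if non-NULL, must carry the same mgid and models a concurrent winner. */
static struct model_mcg *model_get_mcg(struct model_dev *d, int mgid,
				       struct model_mcg *racer)
{
	struct model_mcg *mcg, *tmp;

	tmp = model_lookup(d, mgid);
	if (tmp)
		return tmp;

	mcg = malloc(sizeof(*mcg));
	if (!mcg)
		return NULL;
	mcg->mgid = mgid;

	/* Program the address while mcg is still private. */
	if (model_mcast_add(d)) {
		free(mcg);	/* never published, so a plain free is enough */
		return NULL;
	}

	/* Simulated concurrent winner: programs the address, then publishes. */
	if (racer && model_mcast_add(d) == 0 && !model_publish(d, racer))
		model_mcast_del(d);

	/* Re-check (under mcg_lock in the kernel) before publishing. */
	tmp = model_lookup(d, mgid);
	if (tmp || !model_publish(d, mcg)) {
		model_mcast_del(d);	/* balance our own add */
		free(mcg);
		return tmp;
	}
	return mcg;
}
```

In the model a failed add leaves both the tree and addr_refs untouched,
and a lost race returns the winner with exactly one address reference
per published group.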
Reproduction
============
Tested on v7.1-rc4 (5200f5f493f7) on x86_64 QEMU/KVM with KASAN,
rxe, rdma_rxe, and userspace verbs against a Soft-RoCE device.
Conditions: the caller needs access to the rxe uverbs device and a UD
QP. The demonstrated error path requires rxe_mcast_add() to fail after
the mcg allocation; natural failures include netdev removal returning
-ENODEV and dev_mc_add() errors such as -ENOMEM. The test made that
return deterministic by forcing rxe_mcast_add() to return -ENODEV.
The private harness creates a UD QP, attaches an MGID, forces the
rxe_mcast_add() failure, and repeats attach on the same MGID so the
lookup walks the stale rb-node.
Stock: KASAN reports a slab-use-after-free in __rxe_lookup_mcg() while
comparing mcg->mgid on the next attach.
Patched: the forced-failure path returns without a KASAN report.
Regression: no-injection attach/detach, two-QP shared join/leave, and
re-attach complete without KASAN, WARN, or leak reports.
Mitigations: restrict access to the rxe uverbs device or avoid loading
the rxe driver where untrusted local users can create RDMA objects.
The harness is available off-list on request. The RDMA selftest gate was
checked; tools/testing/selftests/rdma does not contain a matching
rxe_mcast or ATTACH_MCAST coverage test.
drivers/infiniband/sw/rxe/rxe_mcast.c | 50 +++++++++++++++++++--------
1 file changed, 36 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
index 5cad72073eca1..eaa259cc39ea9 100644
--- a/drivers/infiniband/sw/rxe/rxe_mcast.c
+++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
@@ -175,7 +175,9 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
* @mgid: multicast address as a gid
* @mcg: new mcg object
*
- * Context: caller should hold rxe->mcg lock
+ * Initializes the mcg fields. The mcg is private and not yet visible in
+ * mcg_tree, so this may run without rxe->mcg_lock; __rxe_publish_mcg()
+ * makes it visible under the lock once it is ready.
*/
static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
struct rxe_mcg *mcg)
@@ -184,13 +186,22 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
INIT_LIST_HEAD(&mcg->qp_list);
mcg->rxe = rxe;
+}
- /* caller holds a ref on mcg but that will be
- * dropped when mcg goes out of scope. We need to take a ref
- * on the pointer that will be saved in the red-black tree
- * by __rxe_insert_mcg and used to lookup mcg from mgid later.
- * Inserting mcg makes it visible to outside so this should
- * be done last after the object is ready.
+/**
+ * __rxe_publish_mcg - make a fully initialized mcg visible in mcg_tree
+ * @mcg: the mcg object
+ *
+ * Context: caller must hold rxe->mcg_lock and a reference on mcg
+ */
+static void __rxe_publish_mcg(struct rxe_mcg *mcg)
+{
+ /* caller holds a ref on mcg but that will be dropped when mcg goes
+ * out of scope. We need to take a ref on the pointer that will be
+ * saved in the red-black tree by __rxe_insert_mcg and used to lookup
+ * mcg from mgid later. Inserting mcg makes it visible to outside so
+ * this is done last after the object is ready and the multicast
+ * address has been programmed.
*/
kref_get(&mcg->ref_cnt);
__rxe_insert_mcg(mcg);
@@ -228,26 +239,37 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
err = -ENOMEM;
goto err_dec;
}
+ __rxe_init_mcg(rxe, mgid, mcg);
+
+ /* program the multicast address while mcg is still private, before
+ * it is inserted into mcg_tree. dev_mc_add() may sleep so this must
+ * run outside mcg_lock. On failure mcg was never published, so a
+ * plain free is correct and the tree is untouched.
+ */
+ err = rxe_mcast_add(rxe, mgid);
+ if (err) {
+ kfree(mcg);
+ goto err_dec;
+ }
spin_lock_bh(&rxe->mcg_lock);
- /* re-check to see if someone else just added it */
+ /* re-check to see if someone else just added it while we were adding
+ * the multicast address; if so use theirs and drop ours
+ */
tmp = __rxe_lookup_mcg(rxe, mgid);
if (tmp) {
spin_unlock_bh(&rxe->mcg_lock);
+ rxe_mcast_del(rxe, mgid);
atomic_dec(&rxe->mcg_num);
kfree(mcg);
return tmp;
}
- __rxe_init_mcg(rxe, mgid, mcg);
+ __rxe_publish_mcg(mcg);
spin_unlock_bh(&rxe->mcg_lock);
- /* add mcast address outside of lock */
- err = rxe_mcast_add(rxe, mgid);
- if (!err)
- return mcg;
+ return mcg;
- kfree(mcg);
err_dec:
atomic_dec(&rxe->mcg_num);
return ERR_PTR(err);
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
--
2.53.0
* Re: [PATCH v2] RDMA/rxe: insert mcg into mcg_tree only after rxe_mcast_add() succeeds
From: Zhu Yanjun @ 2026-06-17 15:50 UTC (permalink / raw)
To: Michael Bommarito, Zhu Yanjun, Jason Gunthorpe, Leon Romanovsky,
yanjun.zhu@linux.dev
Cc: Bob Pearson, linux-rdma, linux-kernel
On 2026/6/16 19:27, Michael Bommarito wrote:
>
> rxe_get_mcg() publishes a newly allocated multicast group in
> rxe->mcg_tree before programming the backing Ethernet multicast address
> with rxe_mcast_add(), which runs outside mcg_lock. A local userspace
> RDMA client reaches this path with ATTACH_MCAST on a UD QP; if
> rxe_mcast_add() then returns an error (for example -ENODEV when the
> backing netdev has been removed, or a propagated dev_mc_add() error),
> the unwind frees the published group without removing it from the tree.
> A later lookup of the same MGID dereferences the freed struct rxe_mcg
> from __rxe_lookup_mcg().
>
> Fix this by keeping the new mcg private until rxe_mcast_add() succeeds.
> Split the tree publication into __rxe_publish_mcg(), call rxe_mcast_add()
> before taking the tree reference, and free the still-private mcg on
> failure. Because the group is never visible in mcg_tree until the
> multicast address is programmed, no concurrent caller can look it up or
> attach a QP to a group that is about to be torn down, so the error path
> needs no conditional unwind. If another caller publishes the same MGID
> while the address is being programmed, the post-add re-check under
> mcg_lock finds the winner; this caller then drops its private object and
> balances its own rxe_mcast_add() with rxe_mcast_del() before returning
> the winner.
>
> Reproduced by forcing the rxe_mcast_add() error return under KASAN:
> without the change the next attach to the same MGID reports a
> slab-use-after-free in __rxe_lookup_mcg(); with it the forced failure
> returns cleanly. A no-injection attach/detach regression, including a
> two-QP shared join/leave and re-attach, stays KASAN- and leak-clean.
>
> Fixes: a926a903b7dc ("RDMA/rxe: Do not call dev_mc_add/del() under a spinlock")
> Cc: stable@vger.kernel.org # v5.18+
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
> ---
> v2: switch approach in response to Zhu Yanjun's review of v1
> (https://lore.kernel.org/all/1158f1de-4469-463c-91de-c5e24d2add4f@linux.dev/). v1 unwound the
> already-published mcg on the failure path; Zhu noted that because
> rxe_mcast_add() runs outside mcg_lock, a concurrent caller can find
> the published mcg and attach a QP in that window, so an unconditional
> teardown would orphan those QPs and leak the mcg. Rather than make the
> teardown conditional, v2 does not publish the mcg into mcg_tree until
> after rxe_mcast_add() has succeeded, which removes the window entirely
> (the safety of the two-CPU same-MGID race and the loser-side
> rxe_mcast_del() balance is analysed in the commit log above). No
> Fixes/stable change; this is still the same UAF.
> v1: https://lore.kernel.org/all/20260614130443.2517578-1-michael.bommarito@gmail.com/
>
> Reproduction
> ============
>
> Tested on v7.1-rc4 (5200f5f493f7) on x86_64 QEMU/KVM with KASAN,
> rxe, rdma_rxe, and userspace verbs against a Soft-RoCE device.
>
> Conditions: the caller needs access to the rxe uverbs device and a UD
> QP. The demonstrated error path requires rxe_mcast_add() to fail after
> the mcg allocation; natural failures include netdev removal returning
> -ENODEV and dev_mc_add() errors such as -ENOMEM. The test made that
> return deterministic by forcing rxe_mcast_add() to return -ENODEV.
>
> The private harness creates a UD QP, attaches an MGID, forces the
> rxe_mcast_add() failure, and repeats attach on the same MGID so the
> lookup walks the stale rb-node.
>
> Stock: KASAN reports a slab-use-after-free in __rxe_lookup_mcg() while
> comparing mcg->mgid on the next attach.
> Patched: the forced-failure path returns without a KASAN report.
> Regression: no-injection attach/detach, two-QP shared join/leave, and
> re-attach complete without KASAN, WARN, or leak reports.
>
> Mitigations: restrict access to the rxe uverbs device or avoid loading
> the rxe driver where untrusted local users can create RDMA objects.
>
> The harness is available off-list on request. The RDMA selftest gate was
> checked; tools/testing/selftests/rdma does not contain a matching
> rxe_mcast or ATTACH_MCAST coverage test.
The move from the v1 rollback on the error path to the v2 design, which
makes sure the underlying resource is ready before the object is
published, is an elegant and robust choice.

By keeping the newly allocated mcg private until rxe_mcast_add()
completes successfully outside the lock, this commit closes the race
window in which a concurrent caller could look up a half-initialized or
failing multicast group.

Thanks a lot. I am fine with this commit. Jason and Leon, please comment
on it.

Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
>
> drivers/infiniband/sw/rxe/rxe_mcast.c | 50 +++++++++++++++++++--------
> 1 file changed, 36 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_mcast.c b/drivers/infiniband/sw/rxe/rxe_mcast.c
> index 5cad72073eca1..eaa259cc39ea9 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mcast.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mcast.c
> @@ -175,7 +175,9 @@ struct rxe_mcg *rxe_lookup_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
> * @mgid: multicast address as a gid
> * @mcg: new mcg object
> *
> - * Context: caller should hold rxe->mcg lock
> + * Initializes the mcg fields. The mcg is private and not yet visible in
> + * mcg_tree, so this may run without rxe->mcg_lock; __rxe_publish_mcg()
> + * makes it visible under the lock once it is ready.
> */
> static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
> struct rxe_mcg *mcg)
> @@ -184,13 +186,22 @@ static void __rxe_init_mcg(struct rxe_dev *rxe, union ib_gid *mgid,
> memcpy(&mcg->mgid, mgid, sizeof(mcg->mgid));
> INIT_LIST_HEAD(&mcg->qp_list);
> mcg->rxe = rxe;
> +}
>
> - /* caller holds a ref on mcg but that will be
> - * dropped when mcg goes out of scope. We need to take a ref
> - * on the pointer that will be saved in the red-black tree
> - * by __rxe_insert_mcg and used to lookup mcg from mgid later.
> - * Inserting mcg makes it visible to outside so this should
> - * be done last after the object is ready.
> +/**
> + * __rxe_publish_mcg - make a fully initialized mcg visible in mcg_tree
> + * @mcg: the mcg object
> + *
> + * Context: caller must hold rxe->mcg_lock and a reference on mcg
> + */
> +static void __rxe_publish_mcg(struct rxe_mcg *mcg)
> +{
> + /* caller holds a ref on mcg but that will be dropped when mcg goes
> + * out of scope. We need to take a ref on the pointer that will be
> + * saved in the red-black tree by __rxe_insert_mcg and used to lookup
> + * mcg from mgid later. Inserting mcg makes it visible to outside so
> + * this is done last after the object is ready and the multicast
> + * address has been programmed.
> */
> kref_get(&mcg->ref_cnt);
> __rxe_insert_mcg(mcg);
> @@ -228,26 +239,37 @@ static struct rxe_mcg *rxe_get_mcg(struct rxe_dev *rxe, union ib_gid *mgid)
> err = -ENOMEM;
> goto err_dec;
> }
> + __rxe_init_mcg(rxe, mgid, mcg);
> +
> + /* program the multicast address while mcg is still private, before
> + * it is inserted into mcg_tree. dev_mc_add() may sleep so this must
> + * run outside mcg_lock. On failure mcg was never published, so a
> + * plain free is correct and the tree is untouched.
> + */
> + err = rxe_mcast_add(rxe, mgid);
> + if (err) {
> + kfree(mcg);
> + goto err_dec;
> + }
>
> spin_lock_bh(&rxe->mcg_lock);
> - /* re-check to see if someone else just added it */
> + /* re-check to see if someone else just added it while we were adding
> + * the multicast address; if so use theirs and drop ours
> + */
> tmp = __rxe_lookup_mcg(rxe, mgid);
> if (tmp) {
> spin_unlock_bh(&rxe->mcg_lock);
> + rxe_mcast_del(rxe, mgid);
> atomic_dec(&rxe->mcg_num);
> kfree(mcg);
> return tmp;
> }
>
> - __rxe_init_mcg(rxe, mgid, mcg);
> + __rxe_publish_mcg(mcg);
> spin_unlock_bh(&rxe->mcg_lock);
>
> - /* add mcast address outside of lock */
> - err = rxe_mcast_add(rxe, mgid);
> - if (!err)
> - return mcg;
> + return mcg;
>
> - kfree(mcg);
> err_dec:
> atomic_dec(&rxe->mcg_num);
> return ERR_PTR(err);
>
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8