* [PATCH rdma-next v2 01/11] RDMA/mlx5: Remove DCT restrack tracking
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 02/11] RDMA/mlx5: Remove raw RSS QP " Edward Srouji
` (9 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji
From: Patrisious Haddad <phaddad@nvidia.com>
DCT restrack tracking wasn't working to begin with: only the first DCT
added was ever tracked, since at creation time the DCT number isn't yet
initialized (the DCT FW object is only created during modify). All
subsequent DCT additions failed silently.
Remove DCT tracking so that a later patch in the series, which WARNs
about restrack addition failures, doesn't WARN about it.
Fixes: fd3af5e21866 ("RDMA/mlx5: Track DCT, DCI and REG_UMR QPs as diver_detail resources.")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/hw/mlx5/qp.c | 1 +
drivers/infiniband/hw/mlx5/restrack.c | 3 ---
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 59f9ddb35d4620737980b2bc2179e0a11e6be29f..c54e7655763844b10943e12a70431da291c58b8a 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3110,6 +3110,7 @@ static int create_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
switch (qp->type) {
case MLX5_IB_QPT_DCT:
+ rdma_restrack_no_track(&qp->ibqp.res);
err = create_dct(dev, pd, qp, params);
break;
case MLX5_IB_QPT_DCI:
diff --git a/drivers/infiniband/hw/mlx5/restrack.c b/drivers/infiniband/hw/mlx5/restrack.c
index 67841922c7b8770c86fb5a47588e09560d0004f5..00a9bcb2603f0b094bcef8a4ffe6564699a85769 100644
--- a/drivers/infiniband/hw/mlx5/restrack.c
+++ b/drivers/infiniband/hw/mlx5/restrack.c
@@ -178,9 +178,6 @@ static int fill_res_qp_entry(struct sk_buff *msg, struct ib_qp *ibqp)
ret = nla_put_string(msg, RDMA_NLDEV_ATTR_RES_SUBTYPE,
"REG_UMR");
break;
- case MLX5_IB_QPT_DCT:
- ret = nla_put_string(msg, RDMA_NLDEV_ATTR_RES_SUBTYPE, "DCT");
- break;
case MLX5_IB_QPT_DCI:
ret = nla_put_string(msg, RDMA_NLDEV_ATTR_RES_SUBTYPE, "DCI");
break;
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread

* [PATCH rdma-next v2 02/11] RDMA/mlx5: Remove raw RSS QP restrack tracking
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 01/11] RDMA/mlx5: Remove DCT restrack tracking Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 03/11] RDMA/core: Preserve restrack resource ID on reinsertion Edward Srouji
` (8 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji
From: Patrisious Haddad <phaddad@nvidia.com>
Raw RSS QP restrack tracking wasn't working to begin with: only the
first raw RSS QP added was ever tracked, since at creation the raw RSS
QP number is reserved, so the QP number for this QP type was always
zero.
All subsequent raw RSS QP additions failed silently.
Hence remove the tracking for the raw RSS QP; support can be added
later, if needed, by using the tirn for tracking.
Fixes: 968f0b6f9c01 ("RDMA/mlx5: Consolidate into special function all create QP calls")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Chiara Meiohas <cmeiohas@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/hw/mlx5/qp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index c54e7655763844b10943e12a70431da291c58b8a..69914406156c448e9f1cafbc8165d04e120e36bd 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3104,6 +3104,7 @@ static int create_qp(struct mlx5_ib_dev *dev, struct ib_pd *pd,
int err;
if (params->is_rss_raw) {
+ rdma_restrack_no_track(&qp->ibqp.res);
err = create_rss_raw_qp_tir(dev, pd, qp, params);
goto out;
}
--
2.49.0
* [PATCH rdma-next v2 03/11] RDMA/core: Preserve restrack resource ID on reinsertion
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 01/11] RDMA/mlx5: Remove DCT restrack tracking Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 02/11] RDMA/mlx5: Remove raw RSS QP " Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 22:23 ` Jason Gunthorpe
2026-04-06 9:11 ` [PATCH rdma-next v2 04/11] RDMA/core: Fix use after free in ib_query_qp() Edward Srouji
` (7 subsequent siblings)
10 siblings, 1 reply; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji
From: Patrisious Haddad <phaddad@nvidia.com>
rdma_restrack_add() currently always allocates a new ID via
xa_alloc_cyclic(), regardless of whether res->id is already set.
This change makes sure that the object’s ID remains the same across
removal and reinsertion into restrack.
This is a preparatory change for subsequent patches in the series
which perform restrack removal and reinsertion.
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/restrack.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index ac3688952cabbff1ebb899bacb78421f2515231b..485e7357c90a5ff9660feac38a0ec01c0deb0000 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -32,7 +32,7 @@ int rdma_restrack_init(struct ib_device *dev)
rt = dev->res;
for (i = 0; i < RDMA_RESTRACK_MAX; i++)
- xa_init_flags(&rt[i].xa, XA_FLAGS_ALLOC);
+ xa_init_flags(&rt[i].xa, XA_FLAGS_ALLOC1);
return 0;
}
@@ -71,6 +71,8 @@ int rdma_restrack_count(struct ib_device *dev, enum rdma_restrack_type type,
xa_lock(&rt->xa);
xas_for_each(&xas, e, U32_MAX) {
+ if (xa_is_zero(e))
+ continue;
if (xa_get_mark(&rt->xa, e->id, RESTRACK_DD) && !show_details)
continue;
cnt++;
@@ -216,14 +218,24 @@ void rdma_restrack_add(struct rdma_restrack_entry *res)
ret = xa_insert(&rt->xa, counter->id, res, GFP_KERNEL);
res->id = ret ? 0 : counter->id;
} else {
- ret = xa_alloc_cyclic(&rt->xa, &res->id, res, xa_limit_32b,
- &rt->next_id, GFP_KERNEL);
- ret = (ret < 0) ? ret : 0;
+ /* If res->id is valid, try to reinsert at res->id index in
+ * order to maintain the same id in case of a reinsertion.
+ */
+ if (res->id) {
+ ret = xa_insert(&rt->xa, res->id, res, GFP_KERNEL);
+ } else {
+ ret = xa_alloc_cyclic(&rt->xa, &res->id, res,
+ xa_limit_32b, &rt->next_id,
+ GFP_KERNEL);
+ ret = (ret < 0) ? ret : 0;
+ }
}
out:
if (!ret)
res->valid = true;
+ else
+ WARN_ONCE(true, "Failed to insert restrack entry at res->id %u", res->id);
}
EXPORT_SYMBOL(rdma_restrack_add);
--
2.49.0
* Re: [PATCH rdma-next v2 03/11] RDMA/core: Preserve restrack resource ID on reinsertion
2026-04-06 9:11 ` [PATCH rdma-next v2 03/11] RDMA/core: Preserve restrack resource ID on reinsertion Edward Srouji
@ 2026-04-06 22:23 ` Jason Gunthorpe
2026-04-07 9:18 ` Patrisious Haddad
0 siblings, 1 reply; 15+ messages in thread
From: Jason Gunthorpe @ 2026-04-06 22:23 UTC (permalink / raw)
To: Edward Srouji
Cc: Leon Romanovsky, Chiara Meiohas, Dennis Dalessandro, Gal Pressman,
Mark Bloch, Steve Wise, Mark Zhang, Neta Ostrovsky,
Patrisious Haddad, Doug Ledford, Matan Barak, majd, Maor Gottlieb,
linux-rdma, linux-kernel
On Mon, Apr 06, 2026 at 12:11:14PM +0300, Edward Srouji wrote:
> From: Patrisious Haddad <phaddad@nvidia.com>
>
> rdma_restrack_add() currently always allocates a new ID via
> xa_alloc_cyclic(), regardless of whether res->id is already set.
> This change makes sure that the object’s ID remains the same across
> removal and reinsertion to restrack.
It would be better to somehow pre-delete it so it is still in the
xarray but somehow blocked and then allow un pre-deleting. del/add
pairs are not a good design.
Jason
* Re: [PATCH rdma-next v2 03/11] RDMA/core: Preserve restrack resource ID on reinsertion
2026-04-06 22:23 ` Jason Gunthorpe
@ 2026-04-07 9:18 ` Patrisious Haddad
2026-04-07 14:29 ` Jason Gunthorpe
0 siblings, 1 reply; 15+ messages in thread
From: Patrisious Haddad @ 2026-04-07 9:18 UTC (permalink / raw)
To: Jason Gunthorpe, Edward Srouji
Cc: Leon Romanovsky, Chiara Meiohas, Dennis Dalessandro, Gal Pressman,
Mark Bloch, Steve Wise, Mark Zhang, Neta Ostrovsky, Doug Ledford,
Matan Barak, majd, Maor Gottlieb, linux-rdma, linux-kernel
On 4/7/2026 1:23 AM, Jason Gunthorpe wrote:
> On Mon, Apr 06, 2026 at 12:11:14PM +0300, Edward Srouji wrote:
>> From: Patrisious Haddad <phaddad@nvidia.com>
>>
>> rdma_restrack_add() currently always allocates a new ID via
>> xa_alloc_cyclic(), regardless of whether res->id is already set.
>> This change makes sure that the object’s ID remains the same across
>> removal and reinsertion to restrack.
> It would be better to somehow pre-delete it so it is still in the
> xarray but somehow blocked and then allow un pre-deleting. del/add
> pairs are not a good design.
Usually del/add pairs are not good because the re-addition can fail;
here that can't happen ... so is there any reason why it is still considered bad ?
The problem with marking it as deleted here is that the delete
operation involves more than just the xarray (there are restrack_put()
and wait_for_completion() calls inside the restrack del to sync with
other ongoing threads).
I don't really see how to pre-delete it correctly without actually
deleting it in this case.
- Patrisious
>
> Jason
* Re: [PATCH rdma-next v2 03/11] RDMA/core: Preserve restrack resource ID on reinsertion
2026-04-07 9:18 ` Patrisious Haddad
@ 2026-04-07 14:29 ` Jason Gunthorpe
0 siblings, 0 replies; 15+ messages in thread
From: Jason Gunthorpe @ 2026-04-07 14:29 UTC (permalink / raw)
To: Patrisious Haddad
Cc: Edward Srouji, Leon Romanovsky, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Doug Ledford, Matan Barak, majd,
Maor Gottlieb, linux-rdma, linux-kernel
On Tue, Apr 07, 2026 at 12:18:07PM +0300, Patrisious Haddad wrote:
>
> On 4/7/2026 1:23 AM, Jason Gunthorpe wrote:
> > On Mon, Apr 06, 2026 at 12:11:14PM +0300, Edward Srouji wrote:
> > > From: Patrisious Haddad <phaddad@nvidia.com>
> > >
> > > rdma_restrack_add() currently always allocates a new ID via
> > > xa_alloc_cyclic(), regardless of whether res->id is already set.
> > > This change makes sure that the object’s ID remains the same across
> > > removal and reinsertion to restrack.
> > It would be better to somehow pre-delete it so it is still in the
> > xarray but somehow blocked and then allow un pre-deleting. del/add
> > pairs are not a good design.
> Usually del/add pairs not good due to re-addition possibility of failure ,
> here that cant happen ... so any reason why it is still considered bad ?
xa_insert can fail, so it's still a bad idea.
I do not want to see random calls to restrack_add ignoring the return
code. Some kind of restrack_abort_delete() with a void return and no
possibility for failure is required.
> The problem with marking as deletion here is that it is not only the xarray
> that is being done at the delete operation (there is restrack_put and
> wait_for_completion inside the restrack del to sync with other threads that
> are ongoing).
I think the main point of pre-delete is to fence the concurrency.
So what you probably want is to leave the entry in the xarray, or
perhaps set it to XA_ZERO and drive the refcount to zero so that none
of the xa_load patterns can return it. This is enough to fence the
concurrency while allowing abort to not require any memory allocation.
I remember looking at this once and it was complex to unravel all the
things that rdma_restrack_del with valid and no_track so I gave up..
Jason
* [PATCH rdma-next v2 04/11] RDMA/core: Fix use after free in ib_query_qp()
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (2 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 03/11] RDMA/core: Preserve restrack resource ID on reinsertion Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 05/11] RDMA/core: Fix potential use after free in ib_destroy_cq_user() Edward Srouji
` (6 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik
From: Patrisious Haddad <phaddad@nvidia.com>
When querying a QP via the netlink flow, the only synchronization
mechanism for that QP is rdma_restrack_get(). Meanwhile, on the QP
destroy path, rdma_restrack_del() is called at the end of
ib_destroy_qp_user(), which is too late: by then the vendor-specific
resources of the QP have already been destroyed, yet until
rdma_restrack_del() is called the QP can still be accessed, which can
cause the use after free below.
Fix this by moving the rdma_restrack_del() to the start of the
ib_destroy_qp_user(), which in turn waits for all usages of the QP to be
done, then removes it from the database to prevent access to it while it
is being destroyed.
RIP: 0010:ib_query_qp+0x15/0x50 [ib_core]
Code: 48 83 05 5d 8e b9 ff 01 eb b5 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 c7 46 40 00 00 00 00 48 c7 46 78 00 00 00 00 <48> 8b 07 48 8b 80 88 01 00 00 48 85 c0 74 1a 48 83 05 54 91 b9 ff
RSP: 0018:ff11000108a8f2f0 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ff11000108a8f370 RCX: ff11000108a8f370
RDX: 0000000000000000 RSI: ff11000108a8f3d8 RDI: 0000000000000000
RBP: ff1100010de5a000 R08: 0000000000000e80 R09: 0000000000000004
R10: ff110001057a604c R11: 0000000000000000 R12: ff11000108a8f370
R13: ff110001090e8000 R14: 0000000000000000 R15: ff110001057a602c
FS: 00007f2ffd8db6c0(0000) GS:ff110008dc90b000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000010b9a7004 CR4: 0000000000373eb0
Call Trace:
<TASK>
mlx5_ib_gsi_query_qp+0x21/0x50 [mlx5_ib]
mlx5_ib_query_qp+0x689/0x9d0 [mlx5_ib]
ib_query_qp+0x35/0x50 [ib_core]
fill_res_qp_entry_query.isra.0+0x47/0x280 [ib_core]
? __wake_up+0x40/0x50
? netlink_broadcast_filtered+0x15a/0x550
? kobject_uevent_env+0x562/0x710
? ep_poll_callback+0x242/0x270
? __nla_put+0xc/0x20
? nla_put+0x28/0x40
? nla_put_string+0x2e/0x40 [ib_core]
fill_res_qp_entry+0x138/0x190 [ib_core]
res_get_common_dumpit+0x4a5/0x800 [ib_core]
? fill_res_qp_entry_query.isra.0+0x280/0x280 [ib_core]
nldev_res_get_qp_dumpit+0x1e/0x30 [ib_core]
netlink_dump+0x16f/0x450
__netlink_dump_start+0x1ce/0x2e0
rdma_nl_rcv_msg+0x1d3/0x330 [ib_core]
? nldev_res_get_qp_raw_dumpit+0x30/0x30 [ib_core]
rdma_nl_rcv_skb.constprop.0.isra.0+0x108/0x180 [ib_core]
rdma_nl_rcv+0x12/0x20 [ib_core]
netlink_unicast+0x255/0x380
? __alloc_skb+0xfa/0x1e0
netlink_sendmsg+0x1f3/0x420
__sock_sendmsg+0x38/0x60
____sys_sendmsg+0x1e8/0x230
? copy_msghdr_from_user+0xea/0x170
___sys_sendmsg+0x7c/0xb0
? __futex_wait+0x95/0xf0
? __futex_wake_mark+0x40/0x40
? futex_wait+0x67/0x100
? futex_wake+0xac/0x1b0
__sys_sendmsg+0x5f/0xb0
do_syscall_64+0x55/0xb90
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Fixes: 514aee660df4 ("RDMA: Globally allocate and release QP memory")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/verbs.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac87de9cc6735c5d25420a7fac8facdd77d5f09..f1438d5802a3e97e22cdb607cf90a097d041a162 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2157,6 +2157,8 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
if (qp->real_qp != qp)
return __ib_destroy_shared_qp(qp);
+ rdma_restrack_del(&qp->res);
+
sec = qp->qp_sec;
if (sec)
ib_destroy_qp_security_begin(sec);
@@ -2169,6 +2171,8 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
if (ret) {
if (sec)
ib_destroy_qp_security_abort(sec);
+ rdma_restrack_new(&qp->res, RDMA_RESTRACK_QP);
+ rdma_restrack_add(&qp->res);
return ret;
}
@@ -2181,7 +2185,6 @@ int ib_destroy_qp_user(struct ib_qp *qp, struct ib_udata *udata)
if (sec)
ib_destroy_qp_security_end(sec);
- rdma_restrack_del(&qp->res);
kfree(qp);
return ret;
}
--
2.49.0
* [PATCH rdma-next v2 05/11] RDMA/core: Fix potential use after free in ib_destroy_cq_user()
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (3 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 04/11] RDMA/core: Fix use after free in ib_query_qp() Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 06/11] RDMA/core: Fix potential use after free in ib_destroy_srq_user() Edward Srouji
` (5 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik
From: Patrisious Haddad <phaddad@nvidia.com>
When accessing a CQ via the netlink path, the only synchronization
mechanism for that CQ is rdma_restrack_get().
Currently, rdma_restrack_del() is invoked at the end of
ib_destroy_cq_user(), which is too late, since by that point
vendor-specific resources associated with the CQ might already be
freed. This can leave a short window where the CQ remains accessible
through restrack, leading to a potential use-after-free.
Fix this by moving the rdma_restrack_del() call to the start of
ib_destroy_cq_user(), ensuring that the CQ is removed from restrack
before its internal resources are released. This guarantees that no new
users hold references to a CQ that is in the process of destruction.
In addition, this change preserves the intended symmetric behavior
between the create and destroy routines: resources are added to
restrack at the end of a successful creation, and are hence removed
from restrack first thing during the destruction flow, which keeps
the lifecycle management consistent and predictable.
Fixes: 08f294a1524b ("RDMA/core: Add resource tracking for create and destroy CQs")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/verbs.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f1438d5802a3e97e22cdb607cf90a097d041a162..0e8f99807c7c0ce063ed0c1561f4ba42b485b69d 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2256,12 +2256,16 @@ int ib_destroy_cq_user(struct ib_cq *cq, struct ib_udata *udata)
if (atomic_read(&cq->usecnt))
return -EBUSY;
+ rdma_restrack_del(&cq->res);
+
ret = cq->device->ops.destroy_cq(cq, udata);
- if (ret)
+ if (ret) {
+ rdma_restrack_new(&cq->res, RDMA_RESTRACK_CQ);
+ rdma_restrack_add(&cq->res);
return ret;
+ }
ib_umem_release(cq->umem);
- rdma_restrack_del(&cq->res);
kfree(cq);
return ret;
}
--
2.49.0
* [PATCH rdma-next v2 06/11] RDMA/core: Fix potential use after free in ib_destroy_srq_user()
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (4 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 05/11] RDMA/core: Fix potential use after free in ib_destroy_cq_user() Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 07/11] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create Edward Srouji
` (4 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik
From: Patrisious Haddad <phaddad@nvidia.com>
When accessing an SRQ via the netlink path, the only synchronization
mechanism for that SRQ is rdma_restrack_get().
Currently, rdma_restrack_del() is invoked at the end of
ib_destroy_srq_user(), which is too late, since by that point
vendor-specific resources associated with the SRQ might already be
freed. This can leave a short window where the SRQ remains accessible
through restrack, leading to a potential use-after-free.
Fix this by moving the rdma_restrack_del() call to the start of
ib_destroy_srq_user(), ensuring that the SRQ is removed from restrack
before its internal resources are released. This guarantees that no new
users hold references to an SRQ that is in the process of destruction.
In addition, this change preserves the intended symmetric behavior
between the create and destroy routines: resources are added to
restrack at the end of a successful creation, and are hence removed
from restrack first thing during the destruction flow, which keeps
the lifecycle management consistent and predictable.
Fixes: 48f8a70e899f ("RDMA/restrack: Add support to get resource tracking for SRQ")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/verbs.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 0e8f99807c7c0ce063ed0c1561f4ba42b485b69d..5921c6d008bb10bcce5f3b9bcc99de72193941db 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1139,16 +1139,20 @@ int ib_destroy_srq_user(struct ib_srq *srq, struct ib_udata *udata)
if (atomic_read(&srq->usecnt))
return -EBUSY;
+ rdma_restrack_del(&srq->res);
+
ret = srq->device->ops.destroy_srq(srq, udata);
- if (ret)
+ if (ret) {
+ rdma_restrack_new(&srq->res, RDMA_RESTRACK_SRQ);
+ rdma_restrack_add(&srq->res);
return ret;
+ }
atomic_dec(&srq->pd->usecnt);
if (srq->srq_type == IB_SRQT_XRC && srq->ext.xrc.xrcd)
atomic_dec(&srq->ext.xrc.xrcd->usecnt);
if (ib_srq_has_cq(srq->srq_type))
atomic_dec(&srq->ext.cq->usecnt);
- rdma_restrack_del(&srq->res);
kfree(srq);
return ret;
--
2.49.0
* [PATCH rdma-next v2 07/11] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (5 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 06/11] RDMA/core: Fix potential use after free in ib_destroy_srq_user() Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 08/11] RDMA/mlx5: Fix UAF in DCT " Edward Srouji
` (3 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik
A race condition exists between mlx5_cmd_destroy_srq() and
mlx5_cmd_create_srq() that can lead to a use-after-free (UAF) [1].
After destroy_srq_split() releases the SRQ to firmware, the SRQN can be
immediately reallocated for a new SRQ being created concurrently. If the
create path stores the new SRQ in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new SRQ's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created SRQ.
[1] RIP: 0010:mlx5_cmd_destroy_srq+0xd8/0x110 [mlx5_ib]
Code: 89 e1 ba 06 04 00 00 4c 89 f6 48 89 ef e8 80 19 70 e1 c6 83 a0 0f 00 00 00 fb 5b 44 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc <0f> 0b 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 05 c0 ff ff 77 08
RSP: 0018:ff110001037b7d08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ff1100010bb9c000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff110001037b7c90
RBP: ff1100010bb9cfa0 R08: 0000000000000000 R09: 0000000000000000
R10: ff110001037b7da0 R11: ff11000104f29580 R12: ff1100010e2ac090
R13: 000000000000000d R14: 0000000000000001 R15: ff11000105336300
FS: 00007fa24787c740(0000) GS:ff1100046eb8d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa247984e90 CR3: 0000000109d59005 CR4: 0000000000373eb0
Call Trace:
<TASK>
mlx5_ib_destroy_srq+0x25/0xa0 [mlx5_ib]
ib_destroy_srq_user+0x21/0x90 [ib_core]
uverbs_free_srq+0x1b/0x50 [ib_uverbs]
destroy_hw_idr_uobject+0x1e/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x35/0x180 [ib_uverbs]
__uverbs_cleanup_ufile+0xdd/0x140 [ib_uverbs]
uverbs_destroy_ufile_hw+0x38/0xf0 [ib_uverbs]
ib_uverbs_close+0x17/0xa0 [ib_uverbs]
__fput+0xe0/0x2a0
__x64_sys_close+0x3a/0x80
do_syscall_64+0x55/0xac0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa247984ea4
Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d a5 51 0e 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d
RSP: 002b:00007ffecfa79498 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000200000000080 RCX: 00007fa247984ea4
RDX: 0000000000000040 RSI: 0000200000000200 RDI: 0000000000000003
RBP: 00007ffecfa794e0 R08: 00007ffecfa794e0 R09: 00007ffecfa794e0
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 0000200000000000 R15: 0000200000000009
</TASK>
---[ end trace 0000000000000000 ]---
Fixes: fd89099d635e ("RDMA/mlx5: Issue FW command to destroy SRQ on reentry")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
---
drivers/infiniband/hw/mlx5/srq_cmd.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
index 8b338539659933aef94a3e2c056e9400c3fb9bb0..c1a088120915c5741f37ed44fd2e8139bcb6802e 100644
--- a/drivers/infiniband/hw/mlx5/srq_cmd.c
+++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
@@ -683,7 +683,14 @@ int mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq)
xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, srq, 0);
return err;
}
- xa_erase_irq(&table->array, srq->srqn);
+
+ /*
+ * A race can occur where a concurrent create gets the same srqn
+ * (after hardware released it) and overwrites XA_ZERO_ENTRY with
+ * its new SRQ before we reach here. In that case, we must not erase
+ * the entry as it now belongs to the new SRQ.
+ */
+ xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, NULL, 0);
mlx5_core_res_put(&srq->common);
wait_for_completion(&srq->common.free);
--
2.49.0
* [PATCH rdma-next v2 08/11] RDMA/mlx5: Fix UAF in DCT destroy due to race with create
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (6 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 07/11] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 09/11] IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg() Edward Srouji
` (2 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik
A potential race condition exists between mlx5_core_destroy_dct() and
mlx5_core_create_dct() that can lead to a use-after-free.
After _mlx5_core_destroy_dct() releases the DCT to firmware, the DCTN
can be immediately reallocated for a new DCT being created concurrently.
If the create path stores the new DCT in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new DCT's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created DCT.
Fixes: afff24899846 ("RDMA/mlx5: Handle DCT QP logic separately from low level QP interface")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
---
drivers/infiniband/hw/mlx5/qpc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index 146d03ae40bd9fd9650530fba77eb7e942d5fe79..a7a4f9420271a228e161aaac1ffa432d304ce431 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -314,7 +314,14 @@ int mlx5_core_destroy_dct(struct mlx5_ib_dev *dev,
xa_cmpxchg_irq(&table->dct_xa, dct->mqp.qpn, XA_ZERO_ENTRY, dct, 0);
return err;
}
- xa_erase_irq(&table->dct_xa, dct->mqp.qpn);
+
+ /*
+ * A race can occur where a concurrent create gets the same dctn
+ * (after hardware released it) and overwrites XA_ZERO_ENTRY with
+ * its new DCT before we reach here. In that case, we must not erase
+ * the entry as it now belongs to the new DCT.
+ */
+ xa_cmpxchg_irq(&table->dct_xa, dct->mqp.qpn, XA_ZERO_ENTRY, NULL, 0);
return 0;
}
--
2.49.0
* [PATCH rdma-next v2 09/11] IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (7 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 08/11] RDMA/mlx5: Fix UAF in DCT " Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 10/11] RDMA/core: Fix rereg_mr use-after-free race Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 11/11] RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation Edward Srouji
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a
netlink request to the userspace daemon to perform IP-to-GID
resolution in certain cases. The function allocates the netlink message
buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the
size of size_t) instead of 16 bytes (the size of an IPv6 address).
This results in an 8-byte under-allocation.
The bug is currently masked by nlmsg_new(), which internally
over-allocates the skb, but the size calculation remains incorrect.
Fix the issue by supplying the proper IPv6 address length to
nla_total_size().
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/addr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 866746695712aeae425100eefb231e44d52d52d4..01c8e8806eebe511b405d17604cca28e3ed92571 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -150,7 +150,7 @@ static int ib_nl_ip_send_msg(struct rdma_dev_addr *dev_addr,
attrtype = RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_IPV6;
}
- len = nla_total_size(sizeof(size));
+ len = nla_total_size(size);
len += NLMSG_ALIGN(sizeof(*header));
skb = nlmsg_new(len, GFP_KERNEL);
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* [PATCH rdma-next v2 10/11] RDMA/core: Fix rereg_mr use-after-free race
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (8 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 09/11] IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg() Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
2026-04-06 9:11 ` [PATCH rdma-next v2 11/11] RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation Edward Srouji
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik,
Maher Sanalla
From: Michael Guralnik <michaelgur@nvidia.com>
When a driver creates a new MR during rereg_user_mr, a race window
exists between rdma_alloc_commit_uobject() for the new MR and the point
where the code reads that MR to populate the response keys.
A concurrent rereg_mr or destroy_mr can destroy the MR in this window
and cause a use-after-free in the first thread.
Racing flow between two rereg_mr calls:
CPU0                                      CPU1
----                                      ----
rereg_user_mr(mr_handle)
  uobj_get_write(mr_handle) -> mr0
  mr1 = driver->rereg()
  rdma_alloc_commit_uobject(mr1)
  // mr1 replaced mr0 and is unlocked
  uobj_put_destroy(mr0)
                                          rereg_user_mr(mr_handle)
                                            uobj_get_write(mr_handle) -> mr1
                                            mr2 = driver->rereg()
                                            rdma_alloc_commit_uobject(mr2)
                                            // mr2 replaced mr1 and is unlocked
                                            uobj_put_destroy(mr1)
                                            // Destroys mr1!
resp.lkey = mr1->lkey; // UAF - mr1 was freed!
resp.rkey = mr1->rkey; // UAF - mr1 was freed!
Fix by storing lkey/rkey in local variables before the new MR is
unlocked and using the local variables to set the user response.
Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/uverbs_cmd.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a768436ba46805a81ab5a0b8acd4d64b4f2b1b51..91a62d2ade4dd0ce402604ec283f8cdc70d2ef06 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -778,6 +778,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
struct ib_pd *orig_pd;
struct ib_pd *new_pd;
struct ib_mr *new_mr;
+ u32 lkey, rkey;
ret = uverbs_request(attrs, &cmd, sizeof(cmd));
if (ret)
@@ -846,6 +847,8 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
new_mr->uobject = uobj;
atomic_inc(&new_pd->usecnt);
new_uobj->object = new_mr;
+ lkey = new_mr->lkey;
+ rkey = new_mr->rkey;
rdma_restrack_new(&new_mr->res, RDMA_RESTRACK_MR);
rdma_restrack_set_name(&new_mr->res, NULL);
@@ -871,11 +874,13 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
mr->iova = cmd.hca_va;
mr->length = cmd.length;
}
+ lkey = mr->lkey;
+ rkey = mr->rkey;
}
memset(&resp, 0, sizeof(resp));
- resp.lkey = mr->lkey;
- resp.rkey = mr->rkey;
+ resp.lkey = lkey;
+ resp.rkey = rkey;
ret = uverbs_response(attrs, &resp, sizeof(resp));
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* [PATCH rdma-next v2 11/11] RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
2026-04-06 9:11 [PATCH rdma-next v2 00/11] RDMA: Stability and race condition fixes Edward Srouji
` (9 preceding siblings ...)
2026-04-06 9:11 ` [PATCH rdma-next v2 10/11] RDMA/core: Fix rereg_mr use-after-free race Edward Srouji
@ 2026-04-06 9:11 ` Edward Srouji
10 siblings, 0 replies; 15+ messages in thread
From: Edward Srouji @ 2026-04-06 9:11 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik,
Maher Sanalla
From: Michael Guralnik <michaelgur@nvidia.com>
Raw Packet QPs are unique in that they support separate send and receive
queues, using two different user-provided buffers.
They can also be created with one of the queues having size 0, allowing
a send-only or receive-only QP.
The Raw Packet RQ umem is created in the common user QP creation path,
which allows zero-length queues. Add a validation of the RQ umem later,
in the Raw Packet QP creation path, when an RQ was requested.
This prevents possible null-ptr dereference crashes, as seen in the
below trace:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 6 UID: 0 PID: 3539 Comm: raw_packet_umem Not tainted 6.19.0-rc1+ #166 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib]
Code: ff df 41 57 49 89 ff 41 56 41 55 41 89 d5 41 54 4d 89 cc 4c 8d 4f 30 55 4c 89 ca 48 89 f5 53 48 c1 ea 03 48 89 cb 48 83 ec 18 <80> 3c 02 00 44 89 04 24 0f 85 01 02 00 00 48 ba 00 00 00 00 00 fc
RSP: 0018:ff1100013966f4e0 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: 00000000ffffffc0 RCX: 00000000ffffffc0
RDX: 0000000000000006 RSI: 00000ffffffff000 RDI: 0000000000000000
RBP: 00000ffffffff000 R08: 0000000000000040 R09: 0000000000000030
R10: 0000000000000000 R11: 0000000000000000 R12: ff1100013966f648
R13: 0000000000000005 R14: ff1100013966f980 R15: 0000000000000000
FS: 00007fae6c82f740(0000) GS:ff11000898ba1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000000 CR3: 000000010f96c005 CR4: 0000000000373eb0
Call Trace:
<TASK>
create_qp+0x747d/0xc740 [mlx5_ib]
? is_module_address+0x18/0x110
? _create_user_qp.constprop.0+0x18e0/0x18e0 [mlx5_ib]
? __module_address+0x49/0x210
? is_module_address+0x68/0x110
? static_obj+0x67/0x90
? lockdep_init_map_type+0x58/0x200
mlx5_ib_create_qp+0xc85/0x2620 [mlx5_ib]
? find_held_lock+0x2b/0x80
? create_qp+0xc740/0xc740 [mlx5_ib]
? lock_release+0xcb/0x260
? lockdep_init_map_type+0x58/0x200
? __init_swait_queue_head+0xcb/0x150
create_qp.part.0+0x558/0x7c0 [ib_core]
ib_create_qp_user+0xa0/0x4f0 [ib_core]
? rdma_lookup_get_uobject+0x1e4/0x400 [ib_uverbs]
create_qp+0xe4f/0x1d10 [ib_uverbs]
? ib_uverbs_rereg_mr+0xd40/0xd40 [ib_uverbs]
? ib_uverbs_cq_event_handler+0x120/0x120 [ib_uverbs]
? __might_fault+0x81/0x100
? lock_release+0xcb/0x260
? _copy_from_user+0x3e/0x90
ib_uverbs_create_qp+0x10a/0x150 [ib_uverbs]
? ib_uverbs_ex_create_qp+0xe0/0xe0 [ib_uverbs]
? __might_fault+0x81/0x100
? lock_release+0xcb/0x260
ib_uverbs_write+0x7e5/0xc90 [ib_uverbs]
? uverbs_devnode+0xc0/0xc0 [ib_uverbs]
? lock_acquire+0xfa/0x2b0
? find_held_lock+0x2b/0x80
? finish_task_switch.isra.0+0x189/0x6c0
vfs_write+0x1c0/0xf70
? lockdep_hardirqs_on_prepare+0xde/0x170
? kernel_write+0x5a0/0x5a0
? __switch_to+0x527/0xe60
? __schedule+0x10a3/0x3950
? io_schedule_timeout+0x110/0x110
ksys_write+0x170/0x1c0
? __x64_sys_read+0xb0/0xb0
? trace_hardirqs_off.part.0+0x4e/0xe0
do_syscall_64+0x70/0x1360
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fae6ca3118d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5b cc 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe678ca308 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ffe678ca448 RCX: 00007fae6ca3118d
RDX: 0000000000000070 RSI: 0000200000000280 RDI: 0000000000000003
RBP: 00007ffe678ca320 R08: 00000000ffffffff R09: 00007fae6c8ec5b8
R10: 0000000000000064 R11: 0000000000000213 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fae6cb71000 R15: 0000000000404df0
</TASK>
Modules linked in: mlx5_ib mlx5_fwctl mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_ucm ib_uverbs rdma_cm iw_cm ib_ipoib ib_cm ib_umad ib_core rpcsec_gss_krb5 auth_rpcgss oid_registry overlay nfnetlink zram zsmalloc fuse scsi_transport_iscsi [last unloaded: mlx5_core]
---[ end trace 0000000000000000 ]---
RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib]
Fixes: 0fb2ed66a14c ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/hw/mlx5/qp.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 69914406156c448e9f1cafbc8165d04e120e36bd..95229fd3627447510dafcc798c36158ed6991233 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1603,6 +1603,11 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
}
if (qp->rq.wqe_cnt) {
+ if (!rq->base.ubuffer.umem) {
+ err = -EINVAL;
+ goto err_destroy_sq;
+ }
+
rq->base.container_mibqp = qp;
if (qp->flags & IB_QP_CREATE_CVLAN_STRIPPING)
--
2.49.0