* [PATCH rdma-next v3 0/5] RDMA: Stability and race condition fixes
@ 2026-04-27 11:02 Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 1/5] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create Edward Srouji
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Edward Srouji @ 2026-04-27 11:02 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik,
Maher Sanalla
This series addresses several stability issues in RDMA core and the
mlx5 driver.
Patches 1-2 fix xarray race conditions in the mlx5 SRQ and DCT destroy
paths where a concurrent create can reuse the same firmware object
number right after firmware releases it, causing the destroy path to
incorrectly erase the newly created entry.
The remaining patches are independent fixes.
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
Changes in v3:
- Dropped restrack destroy-ordering fixes (patches 1-6 in v2) to
rework them in a dedicated series based on code-review feedback
- Rebased onto latest for-next
- Link to v2: https://lore.kernel.org/r/20260406-security-bug-fixes-v2-0-ee8815fa81b7@nvidia.com
Changes in v2:
- Added patch "RDMA/mlx5: Remove raw RSS QP restrack tracking" to
also suppress broken tracking for raw RSS QPs, which suffer from
the same silent failures as DCTs
- Link to v1: https://lore.kernel.org/r/20260325-security-bug-fixes-v1-0-c8332981ad26@nvidia.com
---
Edward Srouji (2):
RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
RDMA/mlx5: Fix UAF in DCT destroy due to race with create
Maher Sanalla (1):
IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
Michael Guralnik (2):
RDMA/core: Fix rereg_mr use-after-free race
RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
drivers/infiniband/core/addr.c | 2 +-
drivers/infiniband/core/uverbs_cmd.c | 9 +++++++--
drivers/infiniband/hw/mlx5/qp.c | 5 +++++
drivers/infiniband/hw/mlx5/qpc.c | 9 ++++++++-
drivers/infiniband/hw/mlx5/srq_cmd.c | 9 ++++++++-
5 files changed, 29 insertions(+), 5 deletions(-)
---
base-commit: 9091e3b59f2bef11c0a841096327565ae0ca220b
change-id: 20260325-security-bug-fixes-6fdef22d9412
Best regards,
--
Edward Srouji <edwards@nvidia.com>
* [PATCH rdma-next v3 1/5] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
2026-04-27 11:02 [PATCH rdma-next v3 0/5] RDMA: Stability and race condition fixes Edward Srouji
@ 2026-04-27 11:02 ` Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 2/5] RDMA/mlx5: Fix UAF in DCT " Edward Srouji
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Edward Srouji @ 2026-04-27 11:02 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik
A race condition exists between mlx5_cmd_destroy_srq() and
mlx5_cmd_create_srq() that can lead to a use-after-free (UAF) [1].
After destroy_srq_split() releases the SRQ to firmware, the SRQN can be
immediately reallocated for a new SRQ being created concurrently. If the
create path stores the new SRQ in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new SRQ's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq(),
which erases the entry only if it has not already been replaced (i.e. it
still contains XA_ZERO_ENTRY), preserving any newly created SRQ.
[1] RIP: 0010:mlx5_cmd_destroy_srq+0xd8/0x110 [mlx5_ib]
Code: 89 e1 ba 06 04 00 00 4c 89 f6 48 89 ef e8 80 19 70 e1 c6 83 a0 0f 00 00 00 fb 5b 44 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc <0f> 0b 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 05 c0 ff ff 77 08
RSP: 0018:ff110001037b7d08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ff1100010bb9c000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff110001037b7c90
RBP: ff1100010bb9cfa0 R08: 0000000000000000 R09: 0000000000000000
R10: ff110001037b7da0 R11: ff11000104f29580 R12: ff1100010e2ac090
R13: 000000000000000d R14: 0000000000000001 R15: ff11000105336300
FS: 00007fa24787c740(0000) GS:ff1100046eb8d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa247984e90 CR3: 0000000109d59005 CR4: 0000000000373eb0
Call Trace:
<TASK>
mlx5_ib_destroy_srq+0x25/0xa0 [mlx5_ib]
ib_destroy_srq_user+0x21/0x90 [ib_core]
uverbs_free_srq+0x1b/0x50 [ib_uverbs]
destroy_hw_idr_uobject+0x1e/0x50 [ib_uverbs]
uverbs_destroy_uobject+0x35/0x180 [ib_uverbs]
__uverbs_cleanup_ufile+0xdd/0x140 [ib_uverbs]
uverbs_destroy_ufile_hw+0x38/0xf0 [ib_uverbs]
ib_uverbs_close+0x17/0xa0 [ib_uverbs]
__fput+0xe0/0x2a0
__x64_sys_close+0x3a/0x80
do_syscall_64+0x55/0xac0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa247984ea4
Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d a5 51 0e 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d
RSP: 002b:00007ffecfa79498 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000200000000080 RCX: 00007fa247984ea4
RDX: 0000000000000040 RSI: 0000200000000200 RDI: 0000000000000003
RBP: 00007ffecfa794e0 R08: 00007ffecfa794e0 R09: 00007ffecfa794e0
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 0000200000000000 R15: 0000200000000009
</TASK>
---[ end trace 0000000000000000 ]---
Fixes: fd89099d635e ("RDMA/mlx5: Issue FW command to destroy SRQ on reentry")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
---
drivers/infiniband/hw/mlx5/srq_cmd.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
index 8b338539659933aef94a3e2c056e9400c3fb9bb0..c1a088120915c5741f37ed44fd2e8139bcb6802e 100644
--- a/drivers/infiniband/hw/mlx5/srq_cmd.c
+++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
@@ -683,7 +683,14 @@ int mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq)
xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, srq, 0);
return err;
}
- xa_erase_irq(&table->array, srq->srqn);
+
+ /*
+ * A race can occur where a concurrent create gets the same srqn
+ * (after hardware released it) and overwrites XA_ZERO_ENTRY with
+ * its new SRQ before we reach here. In that case, we must not erase
+ * the entry as it now belongs to the new SRQ.
+ */
+ xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, NULL, 0);
mlx5_core_res_put(&srq->common);
wait_for_completion(&srq->common.free);
--
2.49.0
* [PATCH rdma-next v3 2/5] RDMA/mlx5: Fix UAF in DCT destroy due to race with create
2026-04-27 11:02 [PATCH rdma-next v3 0/5] RDMA: Stability and race condition fixes Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 1/5] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create Edward Srouji
@ 2026-04-27 11:02 ` Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 3/5] IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg() Edward Srouji
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Edward Srouji @ 2026-04-27 11:02 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik
A potential race condition exists between mlx5_core_destroy_dct() and
mlx5_core_create_dct() that can lead to a use-after-free.
After _mlx5_core_destroy_dct() releases the DCT to firmware, the DCTN
can be immediately reallocated for a new DCT being created concurrently.
If the create path stores the new DCT in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new DCT's entry.
Later accesses then hit freed memory.
Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq(),
which erases the entry only if it has not already been replaced (i.e. it
still contains XA_ZERO_ENTRY), preserving any newly created DCT.
Fixes: afff24899846 ("RDMA/mlx5: Handle DCT QP logic separately from low level QP interface")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
---
drivers/infiniband/hw/mlx5/qpc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/qpc.c b/drivers/infiniband/hw/mlx5/qpc.c
index 146d03ae40bd9fd9650530fba77eb7e942d5fe79..a7a4f9420271a228e161aaac1ffa432d304ce431 100644
--- a/drivers/infiniband/hw/mlx5/qpc.c
+++ b/drivers/infiniband/hw/mlx5/qpc.c
@@ -314,7 +314,14 @@ int mlx5_core_destroy_dct(struct mlx5_ib_dev *dev,
xa_cmpxchg_irq(&table->dct_xa, dct->mqp.qpn, XA_ZERO_ENTRY, dct, 0);
return err;
}
- xa_erase_irq(&table->dct_xa, dct->mqp.qpn);
+
+ /*
+ * A race can occur where a concurrent create gets the same dctn
+ * (after hardware released it) and overwrites XA_ZERO_ENTRY with
+ * its new DCT before we reach here. In that case, we must not erase
+ * the entry as it now belongs to the new DCT.
+ */
+ xa_cmpxchg_irq(&table->dct_xa, dct->mqp.qpn, XA_ZERO_ENTRY, NULL, 0);
return 0;
}
--
2.49.0
* [PATCH rdma-next v3 3/5] IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()
2026-04-27 11:02 [PATCH rdma-next v3 0/5] RDMA: Stability and race condition fixes Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 1/5] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 2/5] RDMA/mlx5: Fix UAF in DCT " Edward Srouji
@ 2026-04-27 11:02 ` Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 4/5] RDMA/core: Fix rereg_mr use-after-free race Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 5/5] RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation Edward Srouji
4 siblings, 0 replies; 6+ messages in thread
From: Edward Srouji @ 2026-04-27 11:02 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Maher Sanalla
From: Maher Sanalla <msanalla@nvidia.com>
When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a
netlink request to the userspace daemon to perform IP-to-GID
resolution in certain cases. The function allocates the netlink message
buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the
size of size_t) instead of 16 bytes (the size of an IPv6 address).
This results in an 8-byte under-allocation.
This is currently masked because nlmsg_new() internally over-allocates
the skb. However, the code remains incorrect.
Fix the issue by supplying the proper IPv6 address length to
nla_total_size().
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/addr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 6526fda8f9c0bbbcddcf54f5c953d3f7a9785d66..5cd930d47eae52d35db8657ab3fc5993c5cd7770 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -150,7 +150,7 @@ static int ib_nl_ip_send_msg(struct rdma_dev_addr *dev_addr,
attrtype = RDMA_NLA_F_MANDATORY | LS_NLA_TYPE_IPV6;
}
- len = nla_total_size(sizeof(size));
+ len = nla_total_size(size);
len += NLMSG_ALIGN(sizeof(*header));
skb = nlmsg_new(len, GFP_KERNEL);
--
2.49.0
* [PATCH rdma-next v3 4/5] RDMA/core: Fix rereg_mr use-after-free race
2026-04-27 11:02 [PATCH rdma-next v3 0/5] RDMA: Stability and race condition fixes Edward Srouji
` (2 preceding siblings ...)
2026-04-27 11:02 ` [PATCH rdma-next v3 3/5] IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg() Edward Srouji
@ 2026-04-27 11:02 ` Edward Srouji
2026-04-27 11:02 ` [PATCH rdma-next v3 5/5] RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation Edward Srouji
4 siblings, 0 replies; 6+ messages in thread
From: Edward Srouji @ 2026-04-27 11:02 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik,
Maher Sanalla
From: Michael Guralnik <michaelgur@nvidia.com>
When a driver creates a new MR during rereg_user_mr, a race window
exists between rdma_alloc_commit_uobject() for the new MR and the point
where the code reads that MR to populate the response keys.
A concurrent rereg_mr or destroy_mr can destroy the MR in this window,
causing a use-after-free in the first thread.
Racing flow between two rereg_mr calls:
CPU0                                    CPU1
----                                    ----
rereg_user_mr(mr_handle)
  uobj_get_write(mr_handle) -> mr0
  mr1 = driver->rereg()
  rdma_alloc_commit_uobject(mr1)
  // mr1 replaced mr0 and is unlocked
  uobj_put_destroy(mr0)
                                        rereg_user_mr(mr_handle)
                                          uobj_get_write(mr_handle) -> mr1
                                          mr2 = driver->rereg()
                                          rdma_alloc_commit_uobject(mr2)
                                          // mr2 replaced mr1 and is unlocked
                                          uobj_put_destroy(mr1)
                                          // Destroys mr1!
resp.lkey = mr1->lkey; // UAF - mr1 was freed!
resp.rkey = mr1->rkey; // UAF - mr1 was freed!
Fix by storing lkey/rkey in local variables before the new MR is
unlocked and using the local variables to set the user response.
Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/uverbs_cmd.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a768436ba46805a81ab5a0b8acd4d64b4f2b1b51..91a62d2ade4dd0ce402604ec283f8cdc70d2ef06 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -778,6 +778,7 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
struct ib_pd *orig_pd;
struct ib_pd *new_pd;
struct ib_mr *new_mr;
+ u32 lkey, rkey;
ret = uverbs_request(attrs, &cmd, sizeof(cmd));
if (ret)
@@ -846,6 +847,8 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
new_mr->uobject = uobj;
atomic_inc(&new_pd->usecnt);
new_uobj->object = new_mr;
+ lkey = new_mr->lkey;
+ rkey = new_mr->rkey;
rdma_restrack_new(&new_mr->res, RDMA_RESTRACK_MR);
rdma_restrack_set_name(&new_mr->res, NULL);
@@ -871,11 +874,13 @@ static int ib_uverbs_rereg_mr(struct uverbs_attr_bundle *attrs)
mr->iova = cmd.hca_va;
mr->length = cmd.length;
}
+ lkey = mr->lkey;
+ rkey = mr->rkey;
}
memset(&resp, 0, sizeof(resp));
- resp.lkey = mr->lkey;
- resp.rkey = mr->rkey;
+ resp.lkey = lkey;
+ resp.rkey = rkey;
ret = uverbs_response(attrs, &resp, sizeof(resp));
--
2.49.0
* [PATCH rdma-next v3 5/5] RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation
2026-04-27 11:02 [PATCH rdma-next v3 0/5] RDMA: Stability and race condition fixes Edward Srouji
` (3 preceding siblings ...)
2026-04-27 11:02 ` [PATCH rdma-next v3 4/5] RDMA/core: Fix rereg_mr use-after-free race Edward Srouji
@ 2026-04-27 11:02 ` Edward Srouji
4 siblings, 0 replies; 6+ messages in thread
From: Edward Srouji @ 2026-04-27 11:02 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe, Chiara Meiohas,
Dennis Dalessandro, Gal Pressman, Mark Bloch, Steve Wise,
Mark Zhang, Neta Ostrovsky, Patrisious Haddad, Doug Ledford,
Matan Barak, majd, Maor Gottlieb
Cc: linux-rdma, linux-kernel, Edward Srouji, Michael Guralnik,
Maher Sanalla
From: Michael Guralnik <michaelgur@nvidia.com>
Raw Packet QPs are unique in that they support separate send and receive
queues, backed by two different user-provided buffers.
They can also be created with one of the queues having size 0, allowing
a send-only or receive-only QP.
The Raw Packet RQ umem is created in the common user QP creation path,
which allows zero-length queues. Add a later validation of the RQ umem
in the Raw Packet QP creation path when an RQ was requested.
This prevents possible null-ptr-deref crashes, as seen in the trace
below:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 6 UID: 0 PID: 3539 Comm: raw_packet_umem Not tainted 6.19.0-rc1+ #166 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib]
Code: ff df 41 57 49 89 ff 41 56 41 55 41 89 d5 41 54 4d 89 cc 4c 8d 4f 30 55 4c 89 ca 48 89 f5 53 48 c1 ea 03 48 89 cb 48 83 ec 18 <80> 3c 02 00 44 89 04 24 0f 85 01 02 00 00 48 ba 00 00 00 00 00 fc
RSP: 0018:ff1100013966f4e0 EFLAGS: 00010282
RAX: dffffc0000000000 RBX: 00000000ffffffc0 RCX: 00000000ffffffc0
RDX: 0000000000000006 RSI: 00000ffffffff000 RDI: 0000000000000000
RBP: 00000ffffffff000 R08: 0000000000000040 R09: 0000000000000030
R10: 0000000000000000 R11: 0000000000000000 R12: ff1100013966f648
R13: 0000000000000005 R14: ff1100013966f980 R15: 0000000000000000
FS: 00007fae6c82f740(0000) GS:ff11000898ba1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000200000000000 CR3: 000000010f96c005 CR4: 0000000000373eb0
Call Trace:
<TASK>
create_qp+0x747d/0xc740 [mlx5_ib]
? is_module_address+0x18/0x110
? _create_user_qp.constprop.0+0x18e0/0x18e0 [mlx5_ib]
? __module_address+0x49/0x210
? is_module_address+0x68/0x110
? static_obj+0x67/0x90
? lockdep_init_map_type+0x58/0x200
mlx5_ib_create_qp+0xc85/0x2620 [mlx5_ib]
? find_held_lock+0x2b/0x80
? create_qp+0xc740/0xc740 [mlx5_ib]
? lock_release+0xcb/0x260
? lockdep_init_map_type+0x58/0x200
? __init_swait_queue_head+0xcb/0x150
create_qp.part.0+0x558/0x7c0 [ib_core]
ib_create_qp_user+0xa0/0x4f0 [ib_core]
? rdma_lookup_get_uobject+0x1e4/0x400 [ib_uverbs]
create_qp+0xe4f/0x1d10 [ib_uverbs]
? ib_uverbs_rereg_mr+0xd40/0xd40 [ib_uverbs]
? ib_uverbs_cq_event_handler+0x120/0x120 [ib_uverbs]
? __might_fault+0x81/0x100
? lock_release+0xcb/0x260
? _copy_from_user+0x3e/0x90
ib_uverbs_create_qp+0x10a/0x150 [ib_uverbs]
? ib_uverbs_ex_create_qp+0xe0/0xe0 [ib_uverbs]
? __might_fault+0x81/0x100
? lock_release+0xcb/0x260
ib_uverbs_write+0x7e5/0xc90 [ib_uverbs]
? uverbs_devnode+0xc0/0xc0 [ib_uverbs]
? lock_acquire+0xfa/0x2b0
? find_held_lock+0x2b/0x80
? finish_task_switch.isra.0+0x189/0x6c0
vfs_write+0x1c0/0xf70
? lockdep_hardirqs_on_prepare+0xde/0x170
? kernel_write+0x5a0/0x5a0
? __switch_to+0x527/0xe60
? __schedule+0x10a3/0x3950
? io_schedule_timeout+0x110/0x110
ksys_write+0x170/0x1c0
? __x64_sys_read+0xb0/0xb0
? trace_hardirqs_off.part.0+0x4e/0xe0
do_syscall_64+0x70/0x1360
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fae6ca3118d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5b cc 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe678ca308 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ffe678ca448 RCX: 00007fae6ca3118d
RDX: 0000000000000070 RSI: 0000200000000280 RDI: 0000000000000003
RBP: 00007ffe678ca320 R08: 00000000ffffffff R09: 00007fae6c8ec5b8
R10: 0000000000000064 R11: 0000000000000213 R12: 0000000000000001
R13: 0000000000000000 R14: 00007fae6cb71000 R15: 0000000000404df0
</TASK>
Modules linked in: mlx5_ib mlx5_fwctl mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_ucm ib_uverbs rdma_cm iw_cm ib_ipoib ib_cm ib_umad ib_core rpcsec_gss_krb5 auth_rpcgss oid_registry overlay nfnetlink zram zsmalloc fuse scsi_transport_iscsi [last unloaded: mlx5_core]
---[ end trace 0000000000000000 ]---
RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib]
Fixes: 0fb2ed66a14c ("IB/mlx5: Add create and destroy functionality for Raw Packet QP")
Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/hw/mlx5/qp.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 8f50e7342a76949f7caf24f9ea32243ea60e2b83..1611a704c1b33231063f5ff0f155ce7d5148e508 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1603,6 +1603,11 @@ static int create_raw_packet_qp(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp,
}
if (qp->rq.wqe_cnt) {
+ if (!rq->base.ubuffer.umem) {
+ err = -EINVAL;
+ goto err_destroy_sq;
+ }
+
rq->base.container_mibqp = qp;
if (qp->flags & IB_QP_CREATE_CVLAN_STRIPPING)
--
2.49.0