public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Edward Srouji <edwards@nvidia.com>
To: Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>,
	"Chiara Meiohas" <cmeiohas@nvidia.com>,
	Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
	Gal Pressman <galpress@amazon.com>,
	Mark Bloch <markb@mellanox.com>,
	Steve Wise <larrystevenwise@gmail.com>,
	Mark Zhang <markzhang@nvidia.com>,
	"Neta Ostrovsky" <netao@nvidia.com>,
	Patrisious Haddad <phaddad@nvidia.com>,
	"Doug Ledford" <dledford@redhat.com>,
	Matan Barak <matanb@mellanox.com>, <majd@mellanox.com>
Cc: <linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	"Edward Srouji" <edwards@nvidia.com>,
	Michael Guralnik <michaelgur@nvidia.com>
Subject: [PATCH rdma-next 06/10] RDMA/mlx5: Fix UAF in SRQ destroy due to race with create
Date: Wed, 25 Mar 2026 21:00:06 +0200	[thread overview]
Message-ID: <20260325-security-bug-fixes-v1-6-c8332981ad26@nvidia.com> (raw)
In-Reply-To: <20260325-security-bug-fixes-v1-0-c8332981ad26@nvidia.com>

A race condition exists between mlx5_cmd_destroy_srq() and
mlx5_cmd_create_srq() that can lead to a use-after-free (UAF) [1].

After destroy_srq_split() releases the SRQ to firmware, the SRQN can be
immediately reallocated for a new SRQ being created concurrently. If the
create path stores the new SRQ in the xarray before the destroy path
erases it, the destroy will incorrectly delete the new SRQ's entry.
Later accesses then hit freed memory.

Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq()
that only erases the entry if it hasn't already been replaced (still
contains XA_ZERO_ENTRY), preserving any newly created SRQ.

[1] RIP: 0010:mlx5_cmd_destroy_srq+0xd8/0x110 [mlx5_ib]
Code: 89 e1 ba 06 04 00 00 4c 89 f6 48 89 ef e8 80 19 70 e1 c6 83 a0 0f 00 00 00 fb 5b 44 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc <0f> 0b 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 05 c0 ff ff 77 08
RSP: 0018:ff110001037b7d08 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ff1100010bb9c000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff110001037b7c90
RBP: ff1100010bb9cfa0 R08: 0000000000000000 R09: 0000000000000000
R10: ff110001037b7da0 R11: ff11000104f29580 R12: ff1100010e2ac090
R13: 000000000000000d R14: 0000000000000001 R15: ff11000105336300
FS:  00007fa24787c740(0000) GS:ff1100046eb8d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa247984e90 CR3: 0000000109d59005 CR4: 0000000000373eb0
Call Trace:
 <TASK>
 mlx5_ib_destroy_srq+0x25/0xa0 [mlx5_ib]
 ib_destroy_srq_user+0x21/0x90 [ib_core]
 uverbs_free_srq+0x1b/0x50 [ib_uverbs]
 destroy_hw_idr_uobject+0x1e/0x50 [ib_uverbs]
 uverbs_destroy_uobject+0x35/0x180 [ib_uverbs]
 __uverbs_cleanup_ufile+0xdd/0x140 [ib_uverbs]
 uverbs_destroy_ufile_hw+0x38/0xf0 [ib_uverbs]
 ib_uverbs_close+0x17/0xa0 [ib_uverbs]
 __fput+0xe0/0x2a0
 __x64_sys_close+0x3a/0x80
 do_syscall_64+0x55/0xac0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa247984ea4
Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d a5 51 0e 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d
RSP: 002b:00007ffecfa79498 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000200000000080 RCX: 00007fa247984ea4
RDX: 0000000000000040 RSI: 0000200000000200 RDI: 0000000000000003
RBP: 00007ffecfa794e0 R08: 00007ffecfa794e0 R09: 00007ffecfa794e0
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000000 R14: 0000200000000000 R15: 0000200000000009
 </TASK>
 ---[ end trace 0000000000000000 ]---

Fixes: fd89099d635e ("RDMA/mlx5: Issue FW command to destroy SRQ on reentry")
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
---
 drivers/infiniband/hw/mlx5/srq_cmd.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
index 8b338539659933aef94a3e2c056e9400c3fb9bb0..c1a088120915c5741f37ed44fd2e8139bcb6802e 100644
--- a/drivers/infiniband/hw/mlx5/srq_cmd.c
+++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
@@ -683,7 +683,14 @@ int mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq)
 		xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, srq, 0);
 		return err;
 	}
-	xa_erase_irq(&table->array, srq->srqn);
+
+	/*
+	 * A race can occur where a concurrent create gets the same srqn
+	 * (after hardware released it) and overwrites XA_ZERO_ENTRY with
+	 * its new SRQ before we reach here. In that case, we must not erase
+	 * the entry as it now belongs to the new SRQ.
+	 */
+	xa_cmpxchg_irq(&table->array, srq->srqn, XA_ZERO_ENTRY, NULL, 0);
 
 	mlx5_core_res_put(&srq->common);
 	wait_for_completion(&srq->common.free);

-- 
2.49.0


  parent reply	other threads:[~2026-03-25 19:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-25 19:00 [PATCH rdma-next 00/10] RDMA: Stability and race condition fixes Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 01/10] RDMA/mlx5: Remove DCT restrack tracking Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 02/10] RDMA/core: Preserve restrack resource ID on reinsertion Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 03/10] RDMA/core: Fix use after free in ib_query_qp() Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 04/10] RDMA/core: Fix potential use after free in ib_destroy_cq_user() Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 05/10] RDMA/core: Fix potential use after free in ib_destroy_srq_user() Edward Srouji
2026-03-25 19:00 ` Edward Srouji [this message]
2026-03-25 19:00 ` [PATCH rdma-next 07/10] RDMA/mlx5: Fix UAF in DCT destroy due to race with create Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 08/10] IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg() Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 09/10] RDMA/core: Fix rereg_mr use-after-free race Edward Srouji
2026-03-25 19:00 ` [PATCH rdma-next 10/10] RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creation Edward Srouji

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260325-security-bug-fixes-v1-6-c8332981ad26@nvidia.com \
    --to=edwards@nvidia.com \
    --cc=cmeiohas@nvidia.com \
    --cc=dennis.dalessandro@cornelisnetworks.com \
    --cc=dledford@redhat.com \
    --cc=galpress@amazon.com \
    --cc=jgg@ziepe.ca \
    --cc=larrystevenwise@gmail.com \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=majd@mellanox.com \
    --cc=markb@mellanox.com \
    --cc=markzhang@nvidia.com \
    --cc=matanb@mellanox.com \
    --cc=michaelgur@nvidia.com \
    --cc=netao@nvidia.com \
    --cc=phaddad@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox