From: Jason Gunthorpe <jgg@nvidia.com>
To: Edward Srouji <edwards@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
Chiara Meiohas <cmeiohas@nvidia.com>,
Maor Gottlieb <maorg@mellanox.com>,
Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
Gal Pressman <galpress@amazon.com>,
Steve Wise <larrystevenwise@gmail.com>,
Mark Bloch <markb@mellanox.com>,
Mark Zhang <markzhang@nvidia.com>,
Neta Ostrovsky <netao@nvidia.com>,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
Patrisious Haddad <phaddad@nvidia.com>,
Michael Guralnik <michaelgur@nvidia.com>
Subject: Re: [PATCH rdma-next 0/6] RDMA: Fix restrack UAF in QP/CQ/SRQ destroy
Date: Thu, 11 Jun 2026 16:11:04 -0300
Message-ID: <20260611191104.GA1501742@nvidia.com>
In-Reply-To: <20260607-restrack-uaf-fix-v1-0-d72e45eb76c2@nvidia.com>
On Sun, Jun 07, 2026 at 09:18:07PM +0300, Edward Srouji wrote:
> The resource-tracking (restrack) database is the back-end for the netlink
> "rdma resource show" interface which pins objects with
> rdma_restrack_get().
> The QP/CQ/SRQ destroy flows call rdma_restrack_del() at the end of
> ib_destroy_*_user(), after device->ops.destroy_*() had already freed the
> vendor object. Therefore, a concurrent netlink dump could look the
> object up and touch freed memory, causing a use-after-free via
> ib_query_qp() for instance.
>
> Fix this by splitting the delete into a begin/commit/abort sequence:
> begin_del() parks the entry as XA_ZERO_ENTRY (so lookups return NULL),
> drops the birth reference and waits for in-flight readers to drain,
> while keeping the index reserved. The destroy paths run begin_del()
> first, then commit_del() on success or abort_del() on error.
> abort_del() re-inserts into the reserved slot, so it needs no allocation
> and cannot fail.
>
> The first two patches remove DCT and raw RSS QP restrack tracking as
> they have never worked (their ID is unset/reserved at create time).
>
> Signed-off-by: Edward Srouji <edwards@nvidia.com>
> ---
> Patrisious Haddad (6):
>       RDMA/mlx5: Remove DCT restrack tracking
>       RDMA/mlx5: Remove raw RSS QP restrack tracking
>       RDMA/core: Add rdma_restrack_begin/abort/commit_del() operations
>       RDMA/core: Fix use after free in ib_query_qp()
>       RDMA/core: Fix potential use after free in ib_destroy_cq_user()
>       RDMA/core: Fix potential use after free in ib_destroy_srq_user()
The pre-existing issues sashiko flagged look real too, can you fix them as well:
https://sashiko.dev/#/patchset/20260607-restrack-uaf-fix-v1-0-d72e45eb76c2%40nvidia.com
The sashiko notes about XA_ZERO_ENTRY seem to be really obviously
wrong:
  void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
                     void *old, void *entry, gfp_t gfp)
  {
          return xa_zero_to_null(__xa_cmpxchg_raw(xa, index, old, entry, gfp));
  }
  EXPORT_SYMBOL(__xa_cmpxchg);
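To make the point concrete, here is a minimal userspace model (not kernel
code) of why a caller of the filtered cmpxchg never sees the reserved
marker. Everything prefixed with model_ is a hypothetical stand-in invented
for illustration, and the marker value is arbitrary; only the filtering step
mirrors the xa_zero_to_null() call quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the reserved "zero" entry; the value is arbitrary. */
static char model_zero_slot;
#define MODEL_ZERO_ENTRY ((void *)&model_zero_slot)

/* Plays the role of xa_zero_to_null(): hide the reserved marker from callers. */
static void *model_zero_to_null(void *entry)
{
	return entry == MODEL_ZERO_ENTRY ? NULL : entry;
}

/* Toy one-slot "array": the raw cmpxchg returns the previous raw contents. */
static void *model_cmpxchg_raw(void **slot, void *old, void *entry)
{
	void *cur = *slot;

	if (cur == old)
		*slot = entry;
	return cur;
}

/* Filtered cmpxchg: same store, but the return value goes through the filter. */
static void *model_cmpxchg(void **slot, void *old, void *entry)
{
	return model_zero_to_null(model_cmpxchg_raw(slot, old, entry));
}

/* Re-insert an object into a parked slot, as an abort path might do. */
int model_demo(void)
{
	int obj = 0;
	void *slot = MODEL_ZERO_ENTRY;
	void *prev = model_cmpxchg(&slot, MODEL_ZERO_ENTRY, &obj);

	/* The store happened and the caller saw NULL, not the marker. */
	return prev == NULL && slot == (void *)&obj;
}
```

In this model the re-insert into a parked slot succeeds and reports NULL as
the previous value, which is the behaviour the snippet above implies for the
real helper.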
This looks legit:

  For instance, in drivers/infiniband/core/cq.c:ib_free_cq():

          ret = cq->device->ops.destroy_cq(cq, NULL);
          WARN_ONCE(ret, "Destroy of kernel CQ shouldn't fail");
          rdma_restrack_del(&cq->res);

and so on.
Please send a series switching more/all places to commit/abort;
there should probably be very few, if any, calls to a naked del left.
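As a rough illustration of the shape such a conversion could take, here is a
userspace model (not kernel code). The model_* names, the two-flag entry
state and the callback are all assumptions made up for this sketch; the real
rdma_restrack_begin/commit/abort_del() signatures come from the series
itself, and the reference drop and reader drain that begin_del() performs are
left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a restrack entry; not the kernel structure. */
struct model_res {
	bool visible;   /* lookups can find the entry */
	bool reserved;  /* index is parked for a possible abort */
};

struct model_cq {
	struct model_res res;
	int (*destroy)(struct model_cq *cq); /* plays the role of ops.destroy_cq() */
	bool freed;
};

/* Hide the entry from lookups but keep its index parked. */
void model_begin_del(struct model_res *res)
{
	res->visible = false;
	res->reserved = true;
}

/* Destroy succeeded: release the parked index for good. */
void model_commit_del(struct model_res *res)
{
	res->reserved = false;
}

/* Destroy failed: put the entry back into the parked index. */
void model_abort_del(struct model_res *res)
{
	res->visible = true;
	res->reserved = false;
}

int model_destroy_ok(struct model_cq *cq)
{
	assert(!cq->res.visible); /* nothing can look the object up while it is freed */
	cq->freed = true;
	return 0;
}

int model_destroy_fail(struct model_cq *cq)
{
	(void)cq;
	return -1;
}

/* Destroy path: begin first, then commit on success or abort on error. */
int model_free_cq(struct model_cq *cq)
{
	int ret;

	model_begin_del(&cq->res);
	ret = cq->destroy(cq);
	if (ret) {
		model_abort_del(&cq->res);
		return ret;
	}
	model_commit_del(&cq->res);
	return 0;
}
```

The ordering is the point: the entry is hidden before the driver callback
runs, so the callback can never free an object that is still visible. How
kernel-only paths that merely WARN on a failed destroy should handle the
error branch is a decision for the series, not something this sketch settles.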
This doesn't apply on top of the restrack_sync addition, please rebase
it.
You should probably be refactoring rdma_restrack_sync() and using its
parts in this implementation since it does the same things.
I don't think this should NULL the task on abort either, it doesn't
seem necessary.
Jason