Linux RDMA and InfiniBand development
From: Jason Gunthorpe <jgg@nvidia.com>
To: Edward Srouji <edwards@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	Chiara Meiohas <cmeiohas@nvidia.com>,
	Maor Gottlieb <maorg@mellanox.com>,
	Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
	Gal Pressman <galpress@amazon.com>,
	Steve Wise <larrystevenwise@gmail.com>,
	Mark Bloch <markb@mellanox.com>,
	Mark Zhang <markzhang@nvidia.com>,
	Neta Ostrovsky <netao@nvidia.com>,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	Patrisious Haddad <phaddad@nvidia.com>,
	Michael Guralnik <michaelgur@nvidia.com>
Subject: Re: [PATCH rdma-next 0/6] RDMA: Fix restrack UAF in QP/CQ/SRQ destroy
Date: Thu, 11 Jun 2026 16:11:04 -0300
Message-ID: <20260611191104.GA1501742@nvidia.com>
In-Reply-To: <20260607-restrack-uaf-fix-v1-0-d72e45eb76c2@nvidia.com>

On Sun, Jun 07, 2026 at 09:18:07PM +0300, Edward Srouji wrote:
> The resource-tracking (restrack) database is the back-end for the netlink
> "rdma resource show" interface, which pins objects with
> rdma_restrack_get().
> The QP/CQ/SRQ destroy flows call rdma_restrack_del() at the end of
> ib_destroy_*_user(), after device->ops.destroy_*() has already freed the
> vendor object. Therefore, a concurrent netlink dump can look the
> object up and touch freed memory, causing a use-after-free via
> ib_query_qp(), for instance.
> 
> Fix this by splitting the delete into a begin/commit/abort sequence:
> begin_del() parks the entry as XA_ZERO_ENTRY (so lookups return NULL),
> drops the birth reference and waits for in-flight readers to drain,
> while keeping the index reserved. The destroy paths run begin_del()
> first, then commit_del() on success or abort_del() on error.
> abort_del() re-inserts into the reserved slot, so it needs no allocation
> and cannot fail.
> 
> The first two patches remove DCT and raw RSS QP restrack tracking as
> they have never worked (their ID is unset/reserved at create time).
> 
> Signed-off-by: Edward Srouji <edwards@nvidia.com>
> ---
> Patrisious Haddad (6):
>       RDMA/mlx5: Remove DCT restrack tracking
>       RDMA/mlx5: Remove raw RSS QP restrack tracking
>       RDMA/core: Add rdma_restrack_begin/abort/commit_del() operations
>       RDMA/core: Fix use after free in ib_query_qp()
>       RDMA/core: Fix potential use after free in ib_destroy_cq_user()
>       RDMA/core: Fix potential use after free in ib_destroy_srq_user()

The pre-existing sashiko issues look real too. Can you fix them as well:

https://sashiko.dev/#/patchset/20260607-restrack-uaf-fix-v1-0-d72e45eb76c2%40nvidia.com

The sashiko notes about XA_ZERO_ENTRY seem to be really obviously
wrong:

void *__xa_cmpxchg(struct xarray *xa, unsigned long index,
			void *old, void *entry, gfp_t gfp)
{
	return xa_zero_to_null(__xa_cmpxchg_raw(xa, index, old, entry, gfp));
}
EXPORT_SYMBOL(__xa_cmpxchg);
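
The wrapper above passes the raw result through xa_zero_to_null(), so
a caller of __xa_cmpxchg() gets NULL back when the slot held a zero
entry. A minimal userspace sketch of that behaviour follows. It is a
model only: every model_* name is hypothetical, a single pointer
stands in for an xarray slot, and there is no locking or allocation.

```c
#include <stddef.h>

/* Hypothetical stand-in for XA_ZERO_ENTRY: a unique non-NULL sentinel. */
static char model_zero_sentinel;
#define MODEL_ZERO_ENTRY ((void *)&model_zero_sentinel)

/* Same idea as xa_zero_to_null(): hide the sentinel from the caller. */
static void *model_zero_to_null(void *entry)
{
	return entry == MODEL_ZERO_ENTRY ? NULL : entry;
}

/* Raw compare-and-exchange on one slot; returns the previous raw value. */
static void *model_cmpxchg_raw(void **slot, void *old, void *entry)
{
	void *cur = *slot;

	if (cur == old)
		*slot = entry;
	return cur;
}

/* Wrapper shaped like the one quoted above: raw op, then zero -> NULL. */
static void *model_cmpxchg(void **slot, void *old, void *entry)
{
	return model_zero_to_null(model_cmpxchg_raw(slot, old, entry));
}
```

In this model, exchanging a parked slot for a real pointer returns
NULL to the caller even though the slot itself was not empty.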

This looks legit:

For instance, in drivers/infiniband/core/cq.c:ib_free_cq():
    ret = cq->device->ops.destroy_cq(cq, NULL);
    WARN_ONCE(ret, "Destroy of kernel CQ shouldn't fail");
    rdma_restrack_del(&cq->res);

and so on

Please send a series switching more, or all, places to commit/abort.
There should probably be very few, if any, calls to a naked del left.
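
As a rough illustration of that ordering, here is a userspace model of
the begin/commit/abort flow as the cover letter describes it. It is a
sketch under assumptions, not the patch: all model_* names are
hypothetical, a fixed array stands in for the xarray, and the reference
drop and reader drain done by begin_del() are not modelled.

```c
#include <stddef.h>

#define MODEL_SLOTS 8

/* Hypothetical stand-in for XA_ZERO_ENTRY: slot reserved, object hidden. */
static char model_parked;
#define MODEL_PARKED ((void *)&model_parked)

struct model_table {
	void *slot[MODEL_SLOTS];
};

/* Lookup as a dump would do it: parked or out-of-range reads as NULL. */
static void *model_lookup(struct model_table *t, unsigned int id)
{
	void *e = id < MODEL_SLOTS ? t->slot[id] : NULL;

	return e == MODEL_PARKED ? NULL : e;
}

/* The three helpers assume id < MODEL_SLOTS. */
static void model_begin_del(struct model_table *t, unsigned int id)
{
	t->slot[id] = MODEL_PARKED;	/* hide the object, keep the index */
}

static void model_commit_del(struct model_table *t, unsigned int id)
{
	t->slot[id] = NULL;		/* release the index */
}

static void model_abort_del(struct model_table *t, unsigned int id, void *obj)
{
	t->slot[id] = obj;		/* reuse reserved slot, cannot fail */
}

/* Example driver callbacks: one succeeds, one fails. */
static int model_driver_ok(void *obj)   { (void)obj; return 0; }
static int model_driver_fail(void *obj) { (void)obj; return -1; }

/* Destroy flow: hide first, call the driver, then commit or abort. */
static int model_destroy(struct model_table *t, unsigned int id, void *obj,
			 int (*driver_destroy)(void *obj))
{
	int ret;

	model_begin_del(t, id);
	ret = driver_destroy(obj);
	if (ret)
		model_abort_del(t, id, obj);
	else
		model_commit_del(t, id);
	return ret;
}
```

The point of the ordering is that a lookup can never return the object
while the driver destroy callback is running, and a failed destroy
leaves the entry visible again at the same index.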

This doesn't apply on top of the restrack_sync addition. Please rebase
it.

You should probably be refactoring rdma_restrack_sync() and using its
parts in this implementation since it does the same things.

I don't think this should NULL the task on abort either, it doesn't
seem necessary.

Jason
