* [RFC] RDMA/irdma: Add support for revocable dmabuf import
@ 2026-02-17 18:21 Jacob Moroni
2026-02-17 18:45 ` Jason Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Jacob Moroni @ 2026-02-17 18:21 UTC (permalink / raw)
To: tatyana.e.nikolova, krzysztof.czurylo, jgg, leon; +Cc: linux-rdma, Jacob Moroni
In order to import a dmabuf from VFIO, the importer must support
revocation. This is achieved by providing a move_notify callback
that will cause the region to be invalidated in hardware prior to
calling ib_umem_dmabuf_unmap_pages. The mkey and data structures
are not freed until the user explicitly deregisters the region,
but the HW will no longer access the memory (any attempt would
result in an AE for the QP just like a normal dereg mr).
Tested with VFIO by triggering a VFIO_DEVICE_RESET while the
region is registered to ensure the callback is executed.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
---
drivers/infiniband/hw/irdma/verbs.c | 78 ++++++++++++++++++++++++++---
1 file changed, 71 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/hw/irdma/verbs.c b/drivers/infiniband/hw/irdma/verbs.c
index cf8d19150..157e1413d 100644
--- a/drivers/infiniband/hw/irdma/verbs.c
+++ b/drivers/infiniband/hw/irdma/verbs.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/* Copyright (c) 2015 - 2021 Intel Corporation */
#include "main.h"
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
/**
* irdma_query_device - get device attributes
@@ -3590,6 +3592,44 @@ static struct ib_mr *irdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 len,
return ERR_PTR(err);
}
+static int irdma_hwdereg_mr(struct ib_mr *ib_mr);
+
+static void irdma_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
+{
+ struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
+ struct irdma_mr *iwmr = umem_dmabuf->private;
+ int err;
+
+ dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
+
+ if (!iwmr)
+ return;
+
+ /* Invalidate the region in hardware, but do not release the key yet.
+ * This will either invalidate the region or issue a reset. Either way,
+ * the HW will no longer touch the region after. If successful, the
+ * region is marked as invalidated so that the real dereg MR later ends
+ * up skipping the HW request.
+ */
+ err = irdma_hwdereg_mr(&iwmr->ibmr);
+ if (err) {
+ struct irdma_device *iwdev = to_iwdev(iwmr->ibmr.device);
+
+ ibdev_err(&iwdev->ibdev, "dmabuf mr invalidate failed %d", err);
+ if (!iwdev->rf->reset) {
+ iwdev->rf->reset = true;
+ iwdev->rf->gen_ops.request_reset(iwdev->rf);
+ }
+ }
+
+ ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+}
+
+static struct dma_buf_attach_ops irdma_dmabuf_attach_ops = {
+ .allow_peer2peer = 1,
+ .move_notify = irdma_dmabuf_invalidate_cb,
+};
+
static struct ib_mr *irdma_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
u64 len, u64 virt,
int fd, int access,
@@ -3599,7 +3639,7 @@ static struct ib_mr *irdma_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
struct irdma_device *iwdev = to_iwdev(pd->device);
struct ib_umem_dmabuf *umem_dmabuf;
struct irdma_mr *iwmr;
- int err;
+ int err = -1;
if (dmah)
return ERR_PTR(-EOPNOTSUPP);
@@ -3607,31 +3647,43 @@ static struct ib_mr *irdma_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
if (len > iwdev->rf->sc_dev.hw_attrs.max_mr_size)
return ERR_PTR(-EINVAL);
- umem_dmabuf = ib_umem_dmabuf_get_pinned(pd->device, start, len, fd, access);
+ umem_dmabuf = ib_umem_dmabuf_get(pd->device, start, len, fd, access,
+ &irdma_dmabuf_attach_ops);
if (IS_ERR(umem_dmabuf)) {
ibdev_dbg(&iwdev->ibdev, "Failed to get dmabuf umem[%pe]\n",
umem_dmabuf);
return ERR_CAST(umem_dmabuf);
}
+ dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
+
+ err = ib_umem_dmabuf_map_pages(umem_dmabuf);
+ if (err)
+ goto err_map;
+
iwmr = irdma_alloc_iwmr(&umem_dmabuf->umem, pd, virt, IRDMA_MEMREG_TYPE_MEM);
if (IS_ERR(iwmr)) {
err = PTR_ERR(iwmr);
- goto err_release;
+ goto err_alloc;
}
err = irdma_reg_user_mr_type_mem(iwmr, access, true);
if (err)
goto err_iwmr;
+ umem_dmabuf->private = iwmr;
+
+ dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
+
return &iwmr->ibmr;
err_iwmr:
irdma_free_iwmr(iwmr);
-
-err_release:
+err_alloc:
+ ib_umem_dmabuf_unmap_pages(umem_dmabuf);
+err_map:
+ dma_resv_unlock(umem_dmabuf->attach->dmabuf->resv);
ib_umem_release(&umem_dmabuf->umem);
-
return ERR_PTR(err);
}
@@ -3923,7 +3975,19 @@ static int irdma_dereg_mr(struct ib_mr *ib_mr, struct ib_udata *udata)
goto done;
}
- ret = irdma_hwdereg_mr(ib_mr);
+ if (iwmr->region && iwmr->region->is_dmabuf) {
+ struct ib_umem_dmabuf *udb = to_ib_umem_dmabuf(iwmr->region);
+
+ dma_resv_lock(udb->attach->dmabuf->resv, NULL);
+ /* Could have already been invalidated, but it's okay. */
+ ret = irdma_hwdereg_mr(ib_mr);
+ ib_umem_dmabuf_unmap_pages(udb);
+ udb->private = NULL;
+ dma_resv_unlock(udb->attach->dmabuf->resv);
+ } else {
+ ret = irdma_hwdereg_mr(ib_mr);
+ }
+
if (ret)
return ret;
--
2.53.0.310.g728cabbaf7-goog
* Re: [RFC] RDMA/irdma: Add support for revocable dmabuf import
2026-02-17 18:21 [RFC] RDMA/irdma: Add support for revocable dmabuf import Jacob Moroni
@ 2026-02-17 18:45 ` Jason Gunthorpe
2026-02-17 23:08 ` Jacob Moroni
0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2026-02-17 18:45 UTC (permalink / raw)
To: Jacob Moroni; +Cc: tatyana.e.nikolova, krzysztof.czurylo, leon, linux-rdma
On Tue, Feb 17, 2026 at 06:21:15PM +0000, Jacob Moroni wrote:
> +static void irdma_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> +{
> + struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> + struct irdma_mr *iwmr = umem_dmabuf->private;
> + int err;
> +
> + dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> +
> + if (!iwmr)
> + return;
> +
> + /* Invalidate the region in hardware, but do not release the key yet.
> + * This will either invalidate the region or issue a reset. Either way,
> + * the HW will no longer touch the region after. If successful, the
> + * region is marked as invalidated so that the real dereg MR later ends
> + * up skipping the HW request.
> + */
> + err = irdma_hwdereg_mr(&iwmr->ibmr);
Er this command issues:
cqp_info->cqp_cmd = IRDMA_OP_DEALLOC_STAG;
Really need to explain this better, I forget how iwarp works - but you
can't release the rkey/stag in a way that something else can get it
reallocated.
Generally the way to do this is with the IBA defined reregister MR
verb and change some property of it to do revoke, eg change the PD or
make it 0 length or something like that.
> @@ -3599,7 +3639,7 @@ static struct ib_mr *irdma_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
> struct irdma_device *iwdev = to_iwdev(pd->device);
> struct ib_umem_dmabuf *umem_dmabuf;
> struct irdma_mr *iwmr;
> - int err;
> + int err = -1;
>
> if (dmah)
> return ERR_PTR(-EOPNOTSUPP);
> @@ -3607,31 +3647,43 @@ static struct ib_mr *irdma_reg_user_mr_dmabuf(struct ib_pd *pd, u64 start,
> if (len > iwdev->rf->sc_dev.hw_attrs.max_mr_size)
> return ERR_PTR(-EINVAL);
>
> - umem_dmabuf = ib_umem_dmabuf_get_pinned(pd->device, start, len, fd, access);
> + umem_dmabuf = ib_umem_dmabuf_get(pd->device, start, len, fd, access,
> + &irdma_dmabuf_attach_ops);
> if (IS_ERR(umem_dmabuf)) {
> ibdev_dbg(&iwdev->ibdev, "Failed to get dmabuf umem[%pe]\n",
> umem_dmabuf);
> return ERR_CAST(umem_dmabuf);
> }
>
> + dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
> +
> + err = ib_umem_dmabuf_map_pages(umem_dmabuf);
> + if (err)
> + goto err_map;
> +
> iwmr = irdma_alloc_iwmr(&umem_dmabuf->umem, pd, virt, IRDMA_MEMREG_TYPE_MEM);
> if (IS_ERR(iwmr)) {
> err = PTR_ERR(iwmr);
> - goto err_release;
> + goto err_alloc;
> }
You also need to be careful of races here because it could have been
revoked already. Also notice if this happens private is NULL and it
will crash.
Finally, we don't actually support revocable mappings at the core code
level. We either have fully pinned or fully movable, so this is not
right to just change to ib_umem_dmabuf_get(), that assumes the HW is
fault capable.
Probably what you want to do is add a revoke callback to the pinned
importer?
Jason
* Re: [RFC] RDMA/irdma: Add support for revocable dmabuf import
2026-02-17 18:45 ` Jason Gunthorpe
@ 2026-02-17 23:08 ` Jacob Moroni
2026-02-17 23:21 ` Jason Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Jacob Moroni @ 2026-02-17 23:08 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: tatyana.e.nikolova, krzysztof.czurylo, leon, linux-rdma
Hi,
Thanks for taking a look.
> Really need to explain this better, I forget how iwarp works - but you
> can't release the rkey/stag in a way that something else can get it
> reallocated.
I think the HW command names are a little confusing, but for irdma, the key
allocation is actually handled by the driver. The key can't be reused until
the region is fully deregistered (which calls irdma_free_stag), so a new
registration can't grab the same key even if the dmabuf revocation occurs.
That said, I am testing with the NIC in RoCEv2 mode, but I don't think it
changes the driver behavior in this area.
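
To make the key lifetime concrete, here is a small userspace model of what is described above. It is only a sketch, not the irdma code: every model_* name and the table size are made up for illustration. It assumes the driver keeps its own "key in use" table and that invalidation only changes what the HW honours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_KEYS 4	/* hypothetical table size for this model */

/* Hypothetical stand-in for the driver-side key allocation table. */
struct model_key_table {
	bool in_use[MODEL_MAX_KEYS];
};

struct model_mr {
	int key;	/* index reserved from the table, -1 once released */
	bool hw_valid;	/* does the modelled HW still honour this key? */
	int hw_cmds;	/* number of modelled HW invalidate commands issued */
};

/* Register: reserve the lowest free key and mark it valid in "hardware". */
static int model_reg_mr(struct model_key_table *t, struct model_mr *mr)
{
	for (int i = 0; i < MODEL_MAX_KEYS; i++) {
		if (!t->in_use[i]) {
			t->in_use[i] = true;
			mr->key = i;
			mr->hw_valid = true;
			mr->hw_cmds = 0;
			return 0;
		}
	}
	return -1;	/* no free key */
}

/* Revocation path: stop HW access, but keep the key reserved. */
static void model_invalidate_mr(struct model_mr *mr)
{
	if (!mr->hw_valid)
		return;	/* already invalidated, skip the HW request */
	mr->hw_valid = false;
	mr->hw_cmds++;
}

/* Full dereg: invalidate if still needed, then release the key. */
static void model_dereg_mr(struct model_key_table *t, struct model_mr *mr)
{
	model_invalidate_mr(mr);
	t->in_use[mr->key] = false;
	mr->key = -1;
}
```

In this model a registration made between the invalidate and the dereg gets a different key, and the key only becomes available again after the dereg. The dereg also issues no second HW command if the region was already invalidated.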
> Finally, we don't actually support revocable mappings at the core code
> level. We either have fully pinned or fully movable, so this is not
> right to just change to ib_umem_dmabuf_get(), that assumes the HW is
> fault capable.
Ack. It sounds like what I really want is more like ib_umem_dmabuf_get_pinned
but with a functional invalidate_mappings method?
> Probably what you want to do is add a revoke callback to the pinned
> importer?
That does seem ideal. Re-registering it as a 0 length region (will check
the spec) seems like the easiest way to achieve it. Using a special PD
for quarantine purposes should also work, but it would add a little more
state and an object to manage (could we keep it in struct ib_device?).
I was hoping it could be done in a way that doesn't require driver changes
but I don't think it's possible. There's no kernel rereg_mr method,
just rereg_user_mr.
Should I create a new kernel device method for this? If so, then I wonder if
it makes sense to expose it as a generic "invalidate_mr" method and let
the drivers choose how to actually implement it (many can probably just
forward the call to their internal rereg_mr logic).
Thanks,
Jake
* Re: [RFC] RDMA/irdma: Add support for revocable dmabuf import
2026-02-17 23:08 ` Jacob Moroni
@ 2026-02-17 23:21 ` Jason Gunthorpe
2026-02-18 9:05 ` Leon Romanovsky
0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2026-02-17 23:21 UTC (permalink / raw)
To: Jacob Moroni; +Cc: tatyana.e.nikolova, krzysztof.czurylo, leon, linux-rdma
On Tue, Feb 17, 2026 at 06:08:54PM -0500, Jacob Moroni wrote:
> Hi,
>
> Thanks for taking a look.
>
> > Really need to explain this better, I forget how iwarp works - but you
> > can't release the rkey/stag in a way that something else can get it
> > reallocated.
>
> I think the HW command names are a little confusing, but for irdma, the key
> allocation is actually handled by the driver. The key can't be reused until
> the region is fully deregistered (which calls irdma_free_stag), so a new
> registration can't grab the same key even if the dmabuf revocation occurs.
Hmm, maybe that is OK then. Please explain it though more clearly like
this paragraph
> > Finally, we don't actually support revocable mappings at the core code
> > level. We either have fully pinned or fully movable, so this is not
> > right to just change to ib_umem_dmabuf_get(), that assumes the HW is
> > fault capable.
>
> Ack. It sounds like what I really want is more like ib_umem_dmabuf_get_pinned
> but with a functional invalidate_mappings method?
Yes, method provided by the driver.
> > Probably what you want to do is add a revoke callback to the pinned
> > importer?
>
> That does seem ideal.
Probably, but I mean some argument to the
ib_umem_dmabuf_get_pinned(), not a whole new op for MRs..
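
Roughly the shape I have in mind, as a userspace sketch only. None of these model_* names exist in ib_core and this is not a proposed signature; it just assumes the pinned get takes a driver callback plus context up front, and that the core calls it before unmapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_umem;

/* Hypothetical driver-provided hook; not an existing ib_core type. */
typedef void (*model_revoke_fn)(struct model_umem *umem, void *priv);

struct model_umem {
	bool mapped;
	bool revoked;
	model_revoke_fn revoke;
	void *revoke_priv;
};

struct model_mr {
	bool hw_valid;		/* modelled HW still honours the region */
	bool saw_mapped;	/* was the umem still mapped when revoke ran? */
};

/*
 * Models a pinned get that takes the callback and its context as
 * arguments, so the umem is never mapped without an owner attached.
 */
static void model_get_pinned(struct model_umem *umem,
			     model_revoke_fn revoke, void *priv)
{
	umem->revoke = revoke;
	umem->revoke_priv = priv;
	umem->revoked = false;
	umem->mapped = true;
}

/*
 * Models the exporter's invalidation. In the kernel this would run
 * with the dma-buf reservation lock held; the model has no locking.
 */
static void model_exporter_revoke(struct model_umem *umem)
{
	if (umem->revoked)
		return;		/* repeated revoke is a no-op */
	if (umem->revoke)
		umem->revoke(umem, umem->revoke_priv);	/* driver stops HW first */
	umem->mapped = false;	/* only then is the mapping torn down */
	umem->revoked = true;
}

/* Example driver callback: fence off HW access for the owning MR. */
static void model_mr_revoke(struct model_umem *umem, void *priv)
{
	struct model_mr *mr = priv;

	mr->saw_mapped = umem->mapped;
	mr->hw_valid = false;
}
```

The point of the ordering is that the driver callback runs while the pages are still mapped, and the core does the unmap itself afterwards.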
> Re-registering it as a 0 length region (will check
> the spec) seems like the easiest way to achieve it. Using a special PD
> for quarantine purposes should also work, but it would add a little more
> state and an object to manage (could we keep it in struct ib_device?).
Yes these are good options, but if the rkey is preserved it is not
strictly necessary to do these things. Pedantically the user should be
able to re-reg that mkey and revive it, but nobody does that and you
don't have to try to implement it.
> Should I create a new kernel device method for this? If so, then I wonder if
> it makes sense to expose it as a generic "invalidate_mr" method and let
> the drivers choose how to actually implement it (many can probably just
> forward the call to their internal rereg_mr logic).
I have on and off thought about doing something like that with rereg
mr as it would be more general, but I think for now just extending the
ib_umem_dmabuf_get_pinned() is reasonable, and avoids the races.
Keep in mind umems are used for more than just MRs, so a global op
gets a bit tricky.
Jason
* Re: [RFC] RDMA/irdma: Add support for revocable dmabuf import
2026-02-17 23:21 ` Jason Gunthorpe
@ 2026-02-18 9:05 ` Leon Romanovsky
2026-02-18 15:24 ` Jacob Moroni
0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2026-02-18 9:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Jacob Moroni, tatyana.e.nikolova, krzysztof.czurylo, linux-rdma
On Tue, Feb 17, 2026 at 07:21:58PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 17, 2026 at 06:08:54PM -0500, Jacob Moroni wrote:
> > Hi,
> >
> > Thanks for taking a look.
<...>
> > Should I create a new kernel device method for this? If so, then I wonder if
> > it makes sense to expose it as a generic "invalidate_mr" method and let
> > the drivers choose how to actually implement it (many can probably just
> > forward the call to their internal rereg_mr logic).
>
> I have on and off thought about doing something like that with rereg
> mr as it would be more general, but I think for now just extending the
> ib_umem_dmabuf_get_pinned() is reasonable, and avoids the races.
I'm in the camp that, sooner or later, we will need a generic solution in
ib_core to handle this. More and more drivers now support dmabuf in RDMA,
and most of them lack ODP, so they will all need to implement
invalidate_mr at some point.
That said, starting with the simplest reasonable approach and refactoring
later sounds fine.
Thanks
* Re: [RFC] RDMA/irdma: Add support for revocable dmabuf import
2026-02-18 9:05 ` Leon Romanovsky
@ 2026-02-18 15:24 ` Jacob Moroni
0 siblings, 0 replies; 6+ messages in thread
From: Jacob Moroni @ 2026-02-18 15:24 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jason Gunthorpe, tatyana.e.nikolova, krzysztof.czurylo,
linux-rdma
Makes sense, thanks.
I'll grab your latest dmabuf changes first (.invalidate_mappings, etc.) and
prepare a new patch soon.
Thanks,
- Jake