* [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors
@ 2026-01-08 11:11 Edward Srouji
2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
0 siblings, 2 replies; 11+ messages in thread
From: Edward Srouji @ 2026-01-08 11:11 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Sumit Semwal,
Christian König
Cc: linux-kernel, linux-rdma, linux-media, dri-devel, linaro-mm-sig,
Yishai Hadas, Edward Srouji
This patch series introduces dma-buf export support for RDMA/InfiniBand
devices, enabling userspace applications to export RDMA PCI-backed
memory regions (such as device memory or mlx5 UAR pages) as dma-buf file
descriptors.
This allows PCI device memory to be shared with other kernel subsystems
(e.g., graphics or media) or between userspace processes, via the
standard dma-buf interface, avoiding unnecessary copies and enabling
efficient peer-to-peer (P2P) DMA transfers. See [1] for background on
dma-buf.
As part of this series, we introduce a new uverbs object of type FD for
dma-buf export, along with the corresponding APIs for allocation and
teardown. This object encapsulates all attributes required to export a
dma-buf.
The implementation enforces P2P-only mappings and properly manages
resource lifecycle, including:
- Cleanup during driver removal or RDMA context destruction.
- Revocation via dma_buf_move_notify() when the underlying mmap entries
are removed.
- Refactoring of common cleanup logic for reuse across FD uobject types.
The infrastructure is generic within uverbs, allowing individual drivers
to easily integrate and supply their vendor-specific implementation.
The mlx5 driver is the first consumer of this new API, providing:
- Initialization of PCI peer-to-peer DMA support.
- mlx5-specific implementations of the mmap_get_pfns and
pgoff_to_mmap_entry device operations required for dma-buf export.
[1] https://docs.kernel.org/driver-api/dma-buf.html
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
Yishai Hadas (2):
RDMA/uverbs: Add DMABUF object type and operations
RDMA/mlx5: Implement DMABUF export ops
drivers/infiniband/core/Makefile | 1 +
drivers/infiniband/core/device.c | 2 +
drivers/infiniband/core/ib_core_uverbs.c | 19 +++
drivers/infiniband/core/rdma_core.c | 63 ++++----
drivers/infiniband/core/rdma_core.h | 1 +
drivers/infiniband/core/uverbs.h | 10 ++
drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++
drivers/infiniband/core/uverbs_uapi.c | 1 +
drivers/infiniband/hw/mlx5/main.c | 72 +++++++++
include/rdma/ib_verbs.h | 9 ++
include/rdma/uverbs_types.h | 1 +
include/uapi/rdma/ib_user_ioctl_cmds.h | 10 ++
12 files changed, 335 insertions(+), 26 deletions(-)
---
base-commit: 325e3b5431ddd27c5f93156b36838a351e3b2f72
change-id: 20260108-dmabuf-export-0d598058dd1e
Best regards,
--
Edward Srouji <edwards@nvidia.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
2026-01-08 11:11 [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors Edward Srouji
@ 2026-01-08 11:11 ` Edward Srouji
2026-01-20 18:15 ` Jason Gunthorpe
2026-01-25 14:31 ` Leon Romanovsky
2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
1 sibling, 2 replies; 11+ messages in thread
From: Edward Srouji @ 2026-01-08 11:11 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Sumit Semwal,
Christian König
Cc: linux-kernel, linux-rdma, linux-media, dri-devel, linaro-mm-sig,
Yishai Hadas, Edward Srouji
From: Yishai Hadas <yishaih@nvidia.com>
Expose DMABUF functionality to userspace through the uverbs interface,
enabling InfiniBand/RDMA devices to export PCI-based memory regions
(e.g. device memory) as DMABUF file descriptors. This allows
zero-copy sharing of RDMA memory with other subsystems that support the
dma-buf framework.
A new UVERBS_OBJECT_DMABUF object type and allocation method were
introduced.
During allocation, uverbs invokes the driver to supply the
rdma_user_mmap_entry associated with the given page offset (pgoff).
Based on the returned rdma_user_mmap_entry, uverbs requests the driver
to provide the corresponding physical-memory details as well as the
driver’s PCI provider information.
Using this information, dma_buf_export() is called; if it succeeds,
uobj->object is set to the underlying file pointer returned by the
dma-buf framework.
The file descriptor number follows the standard uverbs allocation flow,
but the file pointer comes from the dma-buf subsystem, including its own
fops and private data.
Because of this, alloc_begin_fd_uobject() must handle cases where
fd_type->fops is NULL, and both alloc_commit_fd_uobject() and
alloc_abort_fd_uobject() must account for whether filp->private_data
exists, since it is only populated after a successful dma_buf_export().
When an mmap entry is removed, uverbs iterates over its associated
DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that
their importers are notified.
The same procedure applies during the disassociate flow; final cleanup
occurs when the application closes the file.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/core/Makefile | 1 +
drivers/infiniband/core/device.c | 2 +
drivers/infiniband/core/ib_core_uverbs.c | 19 +++
drivers/infiniband/core/rdma_core.c | 63 ++++----
drivers/infiniband/core/rdma_core.h | 1 +
drivers/infiniband/core/uverbs.h | 10 ++
drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++
drivers/infiniband/core/uverbs_uapi.c | 1 +
include/rdma/ib_verbs.h | 9 ++
include/rdma/uverbs_types.h | 1 +
include/uapi/rdma/ib_user_ioctl_cmds.h | 10 ++
11 files changed, 263 insertions(+), 26 deletions(-)
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index f483e0c12444..a2a7a9d2e0d3 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -33,6 +33,7 @@ ib_umad-y := user_mad.o
ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
rdma_core.o uverbs_std_types.o uverbs_ioctl.o \
uverbs_std_types_cq.o \
+ uverbs_std_types_dmabuf.o \
uverbs_std_types_dmah.o \
uverbs_std_types_flow_action.o uverbs_std_types_dm.o \
uverbs_std_types_mr.o uverbs_std_types_counters.o \
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4e09f6e0995e..416242b9c158 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2765,6 +2765,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, map_mr_sg);
SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
SET_DEVICE_OP(dev_ops, mmap);
+ SET_DEVICE_OP(dev_ops, mmap_get_pfns);
SET_DEVICE_OP(dev_ops, mmap_free);
SET_DEVICE_OP(dev_ops, modify_ah);
SET_DEVICE_OP(dev_ops, modify_cq);
@@ -2775,6 +2776,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, modify_srq);
SET_DEVICE_OP(dev_ops, modify_wq);
SET_DEVICE_OP(dev_ops, peek_cq);
+ SET_DEVICE_OP(dev_ops, pgoff_to_mmap_entry);
SET_DEVICE_OP(dev_ops, pre_destroy_cq);
SET_DEVICE_OP(dev_ops, poll_cq);
SET_DEVICE_OP(dev_ops, port_groups);
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index b51bd7087a88..1ff53b8a0e89 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -5,9 +5,13 @@
* Copyright 2019 Marvell. All rights reserved.
*/
#include <linux/xarray.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
#include "uverbs.h"
#include "core_priv.h"
+MODULE_IMPORT_NS("DMA_BUF");
+
/**
* rdma_umap_priv_init() - Initialize the private data of a vma
*
@@ -229,12 +233,24 @@ EXPORT_SYMBOL(rdma_user_mmap_entry_put);
*/
void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
{
+ struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
+
if (!entry)
return;
+ mutex_lock(&entry->dmabufs_lock);
xa_lock(&entry->ucontext->mmap_xa);
entry->driver_removed = true;
xa_unlock(&entry->ucontext->mmap_xa);
+ list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
+ dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
+ list_del(&uverbs_dmabuf->dmabufs_elm);
+ uverbs_dmabuf->revoked = true;
+ dma_buf_move_notify(uverbs_dmabuf->dmabuf);
+ dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
+ }
+ mutex_unlock(&entry->dmabufs_lock);
+
kref_put(&entry->ref, rdma_user_mmap_entry_free);
}
EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
@@ -274,6 +290,9 @@ int rdma_user_mmap_entry_insert_range(struct ib_ucontext *ucontext,
return -EINVAL;
kref_init(&entry->ref);
+ INIT_LIST_HEAD(&entry->dmabufs);
+ mutex_init(&entry->dmabufs_lock);
+
entry->ucontext = ucontext;
/*
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 18918f463361..3e0a8b9cd288 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -465,7 +465,7 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
fd_type =
container_of(obj->type_attrs, struct uverbs_obj_fd_type, type);
- if (WARN_ON(fd_type->fops->release != &uverbs_uobject_fd_release &&
+ if (WARN_ON(fd_type->fops && fd_type->fops->release != &uverbs_uobject_fd_release &&
fd_type->fops->release != &uverbs_async_event_release)) {
ret = ERR_PTR(-EINVAL);
goto err_fd;
@@ -477,14 +477,16 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
goto err_fd;
}
- /* Note that uverbs_uobject_fd_release() is called during abort */
- filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
- fd_type->flags);
- if (IS_ERR(filp)) {
- ret = ERR_CAST(filp);
- goto err_getfile;
+ if (fd_type->fops) {
+ /* Note that uverbs_uobject_fd_release() is called during abort */
+ filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
+ fd_type->flags);
+ if (IS_ERR(filp)) {
+ ret = ERR_CAST(filp);
+ goto err_getfile;
+ }
+ uobj->object = filp;
}
- uobj->object = filp;
uobj->id = new_fd;
return uobj;
@@ -561,7 +563,9 @@ static void alloc_abort_fd_uobject(struct ib_uobject *uobj)
{
struct file *filp = uobj->object;
- fput(filp);
+ if (filp)
+ fput(filp);
+
put_unused_fd(uobj->id);
}
@@ -628,11 +632,14 @@ static void alloc_commit_fd_uobject(struct ib_uobject *uobj)
/* This shouldn't be used anymore. Use the file object instead */
uobj->id = 0;
- /*
- * NOTE: Once we install the file we loose ownership of our kref on
- * uobj. It will be put by uverbs_uobject_fd_release()
- */
- filp->private_data = uobj;
+ if (!filp->private_data) {
+ /*
+ * NOTE: Once we install the file we loose ownership of our kref on
+ * uobj. It will be put by uverbs_uobject_fd_release()
+ */
+ filp->private_data = uobj;
+ }
+
fd_install(fd, filp);
}
@@ -802,21 +809,10 @@ const struct uverbs_obj_type_class uverbs_idr_class = {
};
EXPORT_SYMBOL(uverbs_idr_class);
-/*
- * Users of UVERBS_TYPE_ALLOC_FD should set this function as the struct
- * file_operations release method.
- */
-int uverbs_uobject_fd_release(struct inode *inode, struct file *filp)
+int uverbs_uobject_release(struct ib_uobject *uobj)
{
struct ib_uverbs_file *ufile;
- struct ib_uobject *uobj;
- /*
- * This can only happen if the fput came from alloc_abort_fd_uobject()
- */
- if (!filp->private_data)
- return 0;
- uobj = filp->private_data;
ufile = uobj->ufile;
if (down_read_trylock(&ufile->hw_destroy_rwsem)) {
@@ -843,6 +839,21 @@ int uverbs_uobject_fd_release(struct inode *inode, struct file *filp)
uverbs_uobject_put(uobj);
return 0;
}
+
+/*
+ * Users of UVERBS_TYPE_ALLOC_FD should set this function as the struct
+ * file_operations release method.
+ */
+int uverbs_uobject_fd_release(struct inode *inode, struct file *filp)
+{
+ /*
+ * This can only happen if the fput came from alloc_abort_fd_uobject()
+ */
+ if (!filp->private_data)
+ return 0;
+
+ return uverbs_uobject_release(filp->private_data);
+}
EXPORT_SYMBOL(uverbs_uobject_fd_release);
/*
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
index a59b087611cb..55f1e3558856 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -156,6 +156,7 @@ extern const struct uapi_definition uverbs_def_obj_counters[];
extern const struct uapi_definition uverbs_def_obj_cq[];
extern const struct uapi_definition uverbs_def_obj_device[];
extern const struct uapi_definition uverbs_def_obj_dm[];
+extern const struct uapi_definition uverbs_def_obj_dmabuf[];
extern const struct uapi_definition uverbs_def_obj_dmah[];
extern const struct uapi_definition uverbs_def_obj_flow_action[];
extern const struct uapi_definition uverbs_def_obj_intf[];
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 797e2fcc8072..66287e8e7ad7 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -133,6 +133,16 @@ struct ib_uverbs_completion_event_file {
struct ib_uverbs_event_queue ev_queue;
};
+struct ib_uverbs_dmabuf_file {
+ struct ib_uobject uobj;
+ struct dma_buf *dmabuf;
+ struct list_head dmabufs_elm;
+ struct rdma_user_mmap_entry *mmap_entry;
+ struct dma_buf_phys_vec phys_vec;
+ struct p2pdma_provider *provider;
+ u8 revoked :1;
+};
+
struct ib_uverbs_event {
union {
struct ib_uverbs_async_event_desc async;
diff --git a/drivers/infiniband/core/uverbs_std_types_dmabuf.c b/drivers/infiniband/core/uverbs_std_types_dmabuf.c
new file mode 100644
index 000000000000..ef5484022e77
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_std_types_dmabuf.c
@@ -0,0 +1,172 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/dma-buf-mapping.h>
+#include <linux/pci-p2pdma.h>
+#include <linux/dma-resv.h>
+#include <rdma/uverbs_std_types.h>
+#include "rdma_core.h"
+#include "uverbs.h"
+
+static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attachment)
+{
+ struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
+
+ if (!attachment->peer2peer)
+ return -EOPNOTSUPP;
+
+ if (priv->revoked)
+ return -ENODEV;
+
+ return 0;
+}
+
+static struct sg_table *
+uverbs_dmabuf_map(struct dma_buf_attachment *attachment,
+ enum dma_data_direction dir)
+{
+ struct ib_uverbs_dmabuf_file *priv = attachment->dmabuf->priv;
+
+ dma_resv_assert_held(priv->dmabuf->resv);
+
+ if (priv->revoked)
+ return ERR_PTR(-ENODEV);
+
+ return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
+ &priv->phys_vec, 1, priv->phys_vec.len,
+ dir);
+}
+
+static void uverbs_dmabuf_unmap(struct dma_buf_attachment *attachment,
+ struct sg_table *sgt,
+ enum dma_data_direction dir)
+{
+ dma_buf_free_sgt(attachment, sgt, dir);
+}
+
+static void uverbs_dmabuf_release(struct dma_buf *dmabuf)
+{
+ struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
+
+ /*
+ * This can only happen if the fput came from alloc_abort_fd_uobject()
+ */
+ if (!priv->uobj.context)
+ return;
+
+ uverbs_uobject_release(&priv->uobj);
+}
+
+static const struct dma_buf_ops uverbs_dmabuf_ops = {
+ .attach = uverbs_dmabuf_attach,
+ .map_dma_buf = uverbs_dmabuf_map,
+ .unmap_dma_buf = uverbs_dmabuf_unmap,
+ .release = uverbs_dmabuf_release,
+};
+
+static int UVERBS_HANDLER(UVERBS_METHOD_DMABUF_ALLOC)(
+ struct uverbs_attr_bundle *attrs)
+{
+ struct ib_uobject *uobj =
+ uverbs_attr_get(attrs, UVERBS_ATTR_ALLOC_DMABUF_HANDLE)
+ ->obj_attr.uobject;
+ struct ib_uverbs_dmabuf_file *uverbs_dmabuf =
+ container_of(uobj, struct ib_uverbs_dmabuf_file, uobj);
+ struct ib_device *ib_dev = attrs->context->device;
+ struct rdma_user_mmap_entry *mmap_entry;
+ DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+ off_t pg_off;
+ int ret;
+
+ ret = uverbs_get_const(&pg_off, attrs, UVERBS_ATTR_ALLOC_DMABUF_PGOFF);
+ if (ret)
+ return ret;
+
+ mmap_entry = ib_dev->ops.pgoff_to_mmap_entry(attrs->context, pg_off);
+ if (!mmap_entry)
+ return -EINVAL;
+
+ ret = ib_dev->ops.mmap_get_pfns(mmap_entry, &uverbs_dmabuf->phys_vec,
+ &uverbs_dmabuf->provider);
+ if (ret)
+ goto err;
+
+ exp_info.ops = &uverbs_dmabuf_ops;
+ exp_info.size = uverbs_dmabuf->phys_vec.len;
+ exp_info.flags = O_CLOEXEC;
+ exp_info.priv = uverbs_dmabuf;
+
+ uverbs_dmabuf->dmabuf = dma_buf_export(&exp_info);
+ if (IS_ERR(uverbs_dmabuf->dmabuf)) {
+ ret = PTR_ERR(uverbs_dmabuf->dmabuf);
+ goto err;
+ }
+
+ INIT_LIST_HEAD(&uverbs_dmabuf->dmabufs_elm);
+ mutex_lock(&mmap_entry->dmabufs_lock);
+ if (mmap_entry->driver_removed)
+ ret = -EIO;
+ else
+ list_add_tail(&uverbs_dmabuf->dmabufs_elm, &mmap_entry->dmabufs);
+ mutex_unlock(&mmap_entry->dmabufs_lock);
+ if (ret)
+ goto err_revoked;
+
+ uobj->object = uverbs_dmabuf->dmabuf->file;
+ uverbs_dmabuf->mmap_entry = mmap_entry;
+ uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_ALLOC_DMABUF_HANDLE);
+ return 0;
+
+err_revoked:
+ dma_buf_put(uverbs_dmabuf->dmabuf);
+err:
+ rdma_user_mmap_entry_put(mmap_entry);
+ return ret;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+ UVERBS_METHOD_DMABUF_ALLOC,
+ UVERBS_ATTR_FD(UVERBS_ATTR_ALLOC_DMABUF_HANDLE,
+ UVERBS_OBJECT_DMABUF,
+ UVERBS_ACCESS_NEW,
+ UA_MANDATORY),
+ UVERBS_ATTR_PTR_IN(UVERBS_ATTR_ALLOC_DMABUF_PGOFF,
+ UVERBS_ATTR_TYPE(u64),
+ UA_MANDATORY));
+
+static void uverbs_dmabuf_fd_destroy_uobj(struct ib_uobject *uobj,
+ enum rdma_remove_reason why)
+{
+ struct ib_uverbs_dmabuf_file *uverbs_dmabuf =
+ container_of(uobj, struct ib_uverbs_dmabuf_file, uobj);
+
+ mutex_lock(&uverbs_dmabuf->mmap_entry->dmabufs_lock);
+ dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
+ if (!uverbs_dmabuf->revoked) {
+ uverbs_dmabuf->revoked = true;
+ list_del(&uverbs_dmabuf->dmabufs_elm);
+ dma_buf_move_notify(uverbs_dmabuf->dmabuf);
+ }
+ dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
+ mutex_unlock(&uverbs_dmabuf->mmap_entry->dmabufs_lock);
+
+ /* Matches the get done as part of pgoff_to_mmap_entry() */
+ rdma_user_mmap_entry_put(uverbs_dmabuf->mmap_entry);
+};
+
+DECLARE_UVERBS_NAMED_OBJECT(
+ UVERBS_OBJECT_DMABUF,
+ UVERBS_TYPE_ALLOC_FD(sizeof(struct ib_uverbs_dmabuf_file),
+ uverbs_dmabuf_fd_destroy_uobj,
+ NULL, NULL, O_RDONLY),
+ &UVERBS_METHOD(UVERBS_METHOD_DMABUF_ALLOC));
+
+const struct uapi_definition uverbs_def_obj_dmabuf[] = {
+ UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_DMABUF),
+ UAPI_DEF_OBJ_NEEDS_FN(mmap_get_pfns),
+ UAPI_DEF_OBJ_NEEDS_FN(pgoff_to_mmap_entry),
+ {}
+};
diff --git a/drivers/infiniband/core/uverbs_uapi.c b/drivers/infiniband/core/uverbs_uapi.c
index e00ea63175bd..38d0bbbee796 100644
--- a/drivers/infiniband/core/uverbs_uapi.c
+++ b/drivers/infiniband/core/uverbs_uapi.c
@@ -631,6 +631,7 @@ static const struct uapi_definition uverbs_core_api[] = {
UAPI_DEF_CHAIN(uverbs_def_obj_cq),
UAPI_DEF_CHAIN(uverbs_def_obj_device),
UAPI_DEF_CHAIN(uverbs_def_obj_dm),
+ UAPI_DEF_CHAIN(uverbs_def_obj_dmabuf),
UAPI_DEF_CHAIN(uverbs_def_obj_dmah),
UAPI_DEF_CHAIN(uverbs_def_obj_flow_action),
UAPI_DEF_CHAIN(uverbs_def_obj_intf),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6c372a37c482..5be67013c8ae 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -43,6 +43,7 @@
#include <uapi/rdma/rdma_user_ioctl.h>
#include <uapi/rdma/ib_user_ioctl_verbs.h>
#include <linux/pci-tph.h>
+#include <linux/dma-buf.h>
#define IB_FW_VERSION_NAME_MAX ETHTOOL_FWVERS_LEN
@@ -2363,6 +2364,9 @@ struct rdma_user_mmap_entry {
unsigned long start_pgoff;
size_t npages;
bool driver_removed;
+ /* protects access to dmabufs */
+ struct mutex dmabufs_lock;
+ struct list_head dmabufs;
};
/* Return the offset (in bytes) the user should pass to libc's mmap() */
@@ -2500,6 +2504,11 @@ struct ib_device_ops {
* Therefore needs to be implemented by the driver in mmap_free.
*/
void (*mmap_free)(struct rdma_user_mmap_entry *entry);
+ int (*mmap_get_pfns)(struct rdma_user_mmap_entry *entry,
+ struct dma_buf_phys_vec *phys_vec,
+ struct p2pdma_provider **provider);
+ struct rdma_user_mmap_entry *(*pgoff_to_mmap_entry)(struct ib_ucontext *ucontext,
+ off_t pg_off);
void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
int (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
diff --git a/include/rdma/uverbs_types.h b/include/rdma/uverbs_types.h
index 26ba919ac245..6a253b7dc5ea 100644
--- a/include/rdma/uverbs_types.h
+++ b/include/rdma/uverbs_types.h
@@ -186,6 +186,7 @@ struct ib_uverbs_file {
extern const struct uverbs_obj_type_class uverbs_idr_class;
extern const struct uverbs_obj_type_class uverbs_fd_class;
int uverbs_uobject_fd_release(struct inode *inode, struct file *filp);
+int uverbs_uobject_release(struct ib_uobject *uobj);
#define UVERBS_BUILD_BUG_ON(cond) (sizeof(char[1 - 2 * !!(cond)]) - \
sizeof(char))
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 35da4026f452..72041c1b0ea5 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -56,6 +56,7 @@ enum uverbs_default_objects {
UVERBS_OBJECT_COUNTERS,
UVERBS_OBJECT_ASYNC_EVENT,
UVERBS_OBJECT_DMAH,
+ UVERBS_OBJECT_DMABUF,
};
enum {
@@ -263,6 +264,15 @@ enum uverbs_methods_dmah {
UVERBS_METHOD_DMAH_FREE,
};
+enum uverbs_attrs_alloc_dmabuf_cmd_attr_ids {
+ UVERBS_ATTR_ALLOC_DMABUF_HANDLE,
+ UVERBS_ATTR_ALLOC_DMABUF_PGOFF,
+};
+
+enum uverbs_methods_dmabuf {
+ UVERBS_METHOD_DMABUF_ALLOC,
+};
+
enum uverbs_attrs_reg_dm_mr_cmd_attr_ids {
UVERBS_ATTR_REG_DM_MR_HANDLE,
UVERBS_ATTR_REG_DM_MR_OFFSET,
--
2.49.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops
2026-01-08 11:11 [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors Edward Srouji
2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
@ 2026-01-08 11:11 ` Edward Srouji
2026-01-20 18:18 ` Jason Gunthorpe
1 sibling, 1 reply; 11+ messages in thread
From: Edward Srouji @ 2026-01-08 11:11 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Sumit Semwal,
Christian König
Cc: linux-kernel, linux-rdma, linux-media, dri-devel, linaro-mm-sig,
Yishai Hadas, Edward Srouji
From: Yishai Hadas <yishaih@nvidia.com>
Enable p2pdma on the mlx5 PCI device to allow DMABUF-based peer-to-peer
DMA mappings.
Add implementation of the mmap_get_pfns and pgoff_to_mmap_entry device
operations required for DMABUF support in the mlx5 RDMA driver.
The pgoff_to_mmap_entry operation converts a page offset to the
corresponding rdma_user_mmap_entry by extracting the command and index
from the offset and looking it up in the ucontext's mmap_xa.
The mmap_get_pfns operation retrieves the physical address and length
from the mmap entry and obtains the p2pdma provider for the underlying
PCI device, which is needed for peer-to-peer DMA operations with
DMABUFs.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 72 +++++++++++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e81080622283..f97c86c96d83 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2446,6 +2446,70 @@ static int mlx5_ib_mmap_clock_info_page(struct mlx5_ib_dev *dev,
virt_to_page(dev->mdev->clock_info));
}
+static int phys_addr_to_bar(struct pci_dev *pdev, phys_addr_t pa)
+{
+ resource_size_t start, end;
+ int bar;
+
+ for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+ /* Skip BARs not present or not memory-mapped */
+ if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+ continue;
+
+ start = pci_resource_start(pdev, bar);
+ end = pci_resource_end(pdev, bar);
+
+ if (!start || !end)
+ continue;
+
+ if (pa >= start && pa <= end)
+ return bar;
+ }
+
+ return -1;
+}
+
+static int mlx5_ib_mmap_get_pfns(struct rdma_user_mmap_entry *entry,
+ struct dma_buf_phys_vec *phys_vec,
+ struct p2pdma_provider **provider)
+{
+ struct mlx5_user_mmap_entry *mentry = to_mmmap(entry);
+ struct pci_dev *pdev = to_mdev(entry->ucontext->device)->mdev->pdev;
+ int bar;
+
+ phys_vec->paddr = mentry->address;
+ phys_vec->len = entry->npages * PAGE_SIZE;
+
+ bar = phys_addr_to_bar(pdev, phys_vec->paddr);
+ if (bar < 0)
+ return -EINVAL;
+
+ *provider = pcim_p2pdma_provider(pdev, bar);
+ /* If the kernel was not compiled with CONFIG_PCI_P2PDMA the
+ * functionality is not supported.
+ */
+ if (!*provider)
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
+static struct rdma_user_mmap_entry *
+mlx5_ib_pgoff_to_mmap_entry(struct ib_ucontext *ucontext, off_t pg_off)
+{
+ unsigned long entry_pgoff;
+ unsigned long idx;
+ u8 command;
+
+ pg_off = pg_off >> PAGE_SHIFT;
+ command = get_command(pg_off);
+ idx = get_extended_index(pg_off);
+
+ entry_pgoff = command << 16 | idx;
+
+ return rdma_user_mmap_entry_get_pgoff(ucontext, entry_pgoff);
+}
+
static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry)
{
struct mlx5_user_mmap_entry *mentry = to_mmmap(entry);
@@ -4360,7 +4424,13 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
if (err)
goto err_mp;
+ err = pcim_p2pdma_init(mdev->pdev);
+ if (err && err != -EOPNOTSUPP)
+ goto err_dd;
+
return 0;
+err_dd:
+ mlx5_ib_data_direct_cleanup(dev);
err_mp:
mlx5_ib_cleanup_multiport_master(dev);
err:
@@ -4412,11 +4482,13 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
.map_mr_sg_pi = mlx5_ib_map_mr_sg_pi,
.mmap = mlx5_ib_mmap,
.mmap_free = mlx5_ib_mmap_free,
+ .mmap_get_pfns = mlx5_ib_mmap_get_pfns,
.modify_cq = mlx5_ib_modify_cq,
.modify_device = mlx5_ib_modify_device,
.modify_port = mlx5_ib_modify_port,
.modify_qp = mlx5_ib_modify_qp,
.modify_srq = mlx5_ib_modify_srq,
+ .pgoff_to_mmap_entry = mlx5_ib_pgoff_to_mmap_entry,
.pre_destroy_cq = mlx5_ib_pre_destroy_cq,
.poll_cq = mlx5_ib_poll_cq,
.post_destroy_cq = mlx5_ib_post_destroy_cq,
--
2.49.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
@ 2026-01-20 18:15 ` Jason Gunthorpe
2026-01-21 8:32 ` Leon Romanovsky
2026-01-21 10:07 ` Yishai Hadas
2026-01-25 14:31 ` Leon Romanovsky
1 sibling, 2 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-01-20 18:15 UTC (permalink / raw)
To: Edward Srouji
Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas
On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
> void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
> {
> + struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
> +
> if (!entry)
> return;
>
> + mutex_lock(&entry->dmabufs_lock);
> xa_lock(&entry->ucontext->mmap_xa);
> entry->driver_removed = true;
> xa_unlock(&entry->ucontext->mmap_xa);
> + list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
> + dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
> + list_del(&uverbs_dmabuf->dmabufs_elm);
> + uverbs_dmabuf->revoked = true;
> + dma_buf_move_notify(uverbs_dmabuf->dmabuf);
> + dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
This will need the same wait that Christian pointed out for VFIO..
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index 18918f463361..3e0a8b9cd288 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -465,7 +465,7 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
>
> fd_type =
> container_of(obj->type_attrs, struct uverbs_obj_fd_type, type);
> - if (WARN_ON(fd_type->fops->release != &uverbs_uobject_fd_release &&
> + if (WARN_ON(fd_type->fops && fd_type->fops->release != &uverbs_uobject_fd_release &&
> fd_type->fops->release != &uverbs_async_event_release)) {
> ret = ERR_PTR(-EINVAL);
> goto err_fd;
> @@ -477,14 +477,16 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
> goto err_fd;
> }
>
> - /* Note that uverbs_uobject_fd_release() is called during abort */
> - filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
> - fd_type->flags);
> - if (IS_ERR(filp)) {
> - ret = ERR_CAST(filp);
> - goto err_getfile;
> + if (fd_type->fops) {
> + /* Note that uverbs_uobject_fd_release() is called during abort */
> + filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
> + fd_type->flags);
> + if (IS_ERR(filp)) {
> + ret = ERR_CAST(filp);
> + goto err_getfile;
> + }
> + uobj->object = filp;
> }
> - uobj->object = filp;
>
> uobj->id = new_fd;
> return uobj;
> @@ -561,7 +563,9 @@ static void alloc_abort_fd_uobject(struct ib_uobject *uobj)
> {
> struct file *filp = uobj->object;
>
> - fput(filp);
> + if (filp)
> + fput(filp);
> +
> put_unused_fd(uobj->id);
This stuff changing how the uobjects work should probably be in its own
patch with its own explanation about creating a uobject that wraps
an externally allocated file descriptor vs this automatic internal
allocation.
> index 797e2fcc8072..66287e8e7ad7 100644
> --- a/drivers/infiniband/core/uverbs.h
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -133,6 +133,16 @@ struct ib_uverbs_completion_event_file {
> struct ib_uverbs_event_queue ev_queue;
> };
>
> +struct ib_uverbs_dmabuf_file {
> + struct ib_uobject uobj;
> + struct dma_buf *dmabuf;
> + struct list_head dmabufs_elm;
> + struct rdma_user_mmap_entry *mmap_entry;
> + struct dma_buf_phys_vec phys_vec;
Oh, are we going to have weird merge conflicts with this Leon?
> +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
> + struct dma_buf_attachment *attachment)
> +{
> + struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
> +
> + if (!attachment->peer2peer)
> + return -EOPNOTSUPP;
> +
> + if (priv->revoked)
> + return -ENODEV;
This should only be checked in map
This should also eventually call the new revoke testing function Leon
is adding
Jason
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops
2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
@ 2026-01-20 18:18 ` Jason Gunthorpe
2026-01-21 10:35 ` Yishai Hadas
0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-01-20 18:18 UTC (permalink / raw)
To: Edward Srouji
Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas
On Thu, Jan 08, 2026 at 01:11:15PM +0200, Edward Srouji wrote:
> +static int phys_addr_to_bar(struct pci_dev *pdev, phys_addr_t pa)
> +{
> + resource_size_t start, end;
> + int bar;
> +
> + for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
> + /* Skip BARs not present or not memory-mapped */
> + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> + continue;
> +
> + start = pci_resource_start(pdev, bar);
> + end = pci_resource_end(pdev, bar);
> +
> + if (!start || !end)
> + continue;
> +
> + if (pa >= start && pa <= end)
> + return bar;
> + }
Don't we know which of the two BARs the mmap entry came from based on
its type? This seems like overkill..
Jason
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
2026-01-20 18:15 ` Jason Gunthorpe
@ 2026-01-21 8:32 ` Leon Romanovsky
2026-01-21 13:56 ` Jason Gunthorpe
2026-01-21 10:07 ` Yishai Hadas
1 sibling, 1 reply; 11+ messages in thread
From: Leon Romanovsky @ 2026-01-21 8:32 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Edward Srouji, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas
On Tue, Jan 20, 2026 at 02:15:20PM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
> > void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
> > {
> > + struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
> > +
> > if (!entry)
> > return;
> >
> > + mutex_lock(&entry->dmabufs_lock);
> > xa_lock(&entry->ucontext->mmap_xa);
> > entry->driver_removed = true;
> > xa_unlock(&entry->ucontext->mmap_xa);
> > + list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
> > + dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
> > + list_del(&uverbs_dmabuf->dmabufs_elm);
> > + uverbs_dmabuf->revoked = true;
> > + dma_buf_move_notify(uverbs_dmabuf->dmabuf);
> > + dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
>
> This will need the same wait that Christian pointed out for VFIO..
Yes, something like this is missing:
https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-6-b7e0b07b8214@nvidia.com/
<...>
> > +struct ib_uverbs_dmabuf_file {
> > + struct ib_uobject uobj;
> > + struct dma_buf *dmabuf;
> > + struct list_head dmabufs_elm;
> > + struct rdma_user_mmap_entry *mmap_entry;
> > + struct dma_buf_phys_vec phys_vec;
>
> Oh, are we going to have weird merge conflicts with this Leon?
No, Alex created a shared branch with the rename already applied for me.
I had planned to merge it into the RDMA tree before taking this series, and
then update dma_buf_phys_vec to phys_vec locally.
>
> > +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
> > + struct dma_buf_attachment *attachment)
> > +{
> > + struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
> > +
> > + if (!attachment->peer2peer)
> > + return -EOPNOTSUPP;
> > +
> > + if (priv->revoked)
> > + return -ENODEV;
>
> This should only be checked in map
I disagree with the word "only"; the more accurate word is "too". There
is no need to allow a new importer to attach if this exporter is marked
as revoked.
>
> This should also eventually call the new revoke testing function Leon
> is adding
We will add it once my series is accepted.
Thanks
>
> Jason
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
2026-01-20 18:15 ` Jason Gunthorpe
2026-01-21 8:32 ` Leon Romanovsky
@ 2026-01-21 10:07 ` Yishai Hadas
1 sibling, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-01-21 10:07 UTC (permalink / raw)
To: Jason Gunthorpe, Edward Srouji
Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel, linaro-mm-sig
On 20/01/2026 20:15, Jason Gunthorpe wrote:
> On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
>> void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
>> {
>> + struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
>> +
>> if (!entry)
>> return;
>>
>> + mutex_lock(&entry->dmabufs_lock);
>> xa_lock(&entry->ucontext->mmap_xa);
>> entry->driver_removed = true;
>> xa_unlock(&entry->ucontext->mmap_xa);
>> + list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
>> + dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
>> + list_del(&uverbs_dmabuf->dmabufs_elm);
>> + uverbs_dmabuf->revoked = true;
>> + dma_buf_move_notify(uverbs_dmabuf->dmabuf);
>> + dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
>
> This will need the same wait that Christian pointed out for VFIO..
Sure, I'll add.
>
>
>> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
>> index 18918f463361..3e0a8b9cd288 100644
>> --- a/drivers/infiniband/core/rdma_core.c
>> +++ b/drivers/infiniband/core/rdma_core.c
>> @@ -465,7 +465,7 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
>>
>> fd_type =
>> container_of(obj->type_attrs, struct uverbs_obj_fd_type, type);
>> - if (WARN_ON(fd_type->fops->release != &uverbs_uobject_fd_release &&
>> + if (WARN_ON(fd_type->fops && fd_type->fops->release != &uverbs_uobject_fd_release &&
>> fd_type->fops->release != &uverbs_async_event_release)) {
>> ret = ERR_PTR(-EINVAL);
>> goto err_fd;
>> @@ -477,14 +477,16 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
>> goto err_fd;
>> }
>>
>> - /* Note that uverbs_uobject_fd_release() is called during abort */
>> - filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
>> - fd_type->flags);
>> - if (IS_ERR(filp)) {
>> - ret = ERR_CAST(filp);
>> - goto err_getfile;
>> + if (fd_type->fops) {
>> + /* Note that uverbs_uobject_fd_release() is called during abort */
>> + filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
>> + fd_type->flags);
>> + if (IS_ERR(filp)) {
>> + ret = ERR_CAST(filp);
>> + goto err_getfile;
>> + }
>> + uobj->object = filp;
>> }
>> - uobj->object = filp;
>>
>> uobj->id = new_fd;
>> return uobj;
>> @@ -561,7 +563,9 @@ static void alloc_abort_fd_uobject(struct ib_uobject *uobj)
>> {
>> struct file *filp = uobj->object;
>>
>> - fput(filp);
>> + if (filp)
>> + fput(filp);
>> +
>> put_unused_fd(uobj->id);
>
> This stuff changing how the uobjects work should probably be in its own
> patch, with its own explanation about creating a uobject that wraps
> an externally allocated file descriptor vs. this automatic internal
> allocation.
Sure, I’ll split the current patch into two patches.
>
>> index 797e2fcc8072..66287e8e7ad7 100644
>> --- a/drivers/infiniband/core/uverbs.h
>> +++ b/drivers/infiniband/core/uverbs.h
>> @@ -133,6 +133,16 @@ struct ib_uverbs_completion_event_file {
>> struct ib_uverbs_event_queue ev_queue;
>> };
>>
>> +struct ib_uverbs_dmabuf_file {
>> + struct ib_uobject uobj;
>> + struct dma_buf *dmabuf;
>> + struct list_head dmabufs_elm;
>> + struct rdma_user_mmap_entry *mmap_entry;
>> + struct dma_buf_phys_vec phys_vec;
>
> Oh, are we going to have weird merge conflicts with this Leon?
>
>> +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
>> + struct dma_buf_attachment *attachment)
>> +{
>> + struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
>> +
>> + if (!attachment->peer2peer)
>> + return -EOPNOTSUPP;
>> +
>> + if (priv->revoked)
>> + return -ENODEV;
>
> This should only be checked in map
Please see Leon's answer on that.
Yishai
* Re: [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops
2026-01-20 18:18 ` Jason Gunthorpe
@ 2026-01-21 10:35 ` Yishai Hadas
0 siblings, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-01-21 10:35 UTC (permalink / raw)
To: Jason Gunthorpe, Edward Srouji
Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel
On 20/01/2026 20:18, Jason Gunthorpe wrote:
> On Thu, Jan 08, 2026 at 01:11:15PM +0200, Edward Srouji wrote:
>> +static int phys_addr_to_bar(struct pci_dev *pdev, phys_addr_t pa)
>> +{
>> + resource_size_t start, end;
>> + int bar;
>> +
>> + for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
>> + /* Skip BARs not present or not memory-mapped */
>> + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>> + continue;
>> +
>> + start = pci_resource_start(pdev, bar);
>> + end = pci_resource_end(pdev, bar);
>> +
>> + if (!start || !end)
>> + continue;
>> +
>> + if (pa >= start && pa <= end)
>> + return bar;
>> + }
>
> Don't we know which of the two BARs the mmap entry came from based on
> its type? This seems like overkill..
>
Actually, no.
Currently, a given type can reside on different BARs depending on the
function type (i.e., PF vs. SF).
As we don't have any capability/knowledge of that mapping, we prefer
the above code, which finds the correct BAR (for now, 0 or 2)
dynamically.
Yishai
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
2026-01-21 8:32 ` Leon Romanovsky
@ 2026-01-21 13:56 ` Jason Gunthorpe
2026-01-21 16:27 ` Yishai Hadas
0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-01-21 13:56 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Edward Srouji, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas
On Wed, Jan 21, 2026 at 10:32:46AM +0200, Leon Romanovsky wrote:
> > > +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
> > > + struct dma_buf_attachment *attachment)
> > > +{
> > > + struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
> > > +
> > > + if (!attachment->peer2peer)
> > > + return -EOPNOTSUPP;
> > > +
> > > + if (priv->revoked)
> > > + return -ENODEV;
> >
> > This should only be checked in map
>
> I disagree with the word "only"; the more accurate word is "too". There
> is no need to allow a new importer to attach if this exporter is marked
> as revoked.
It must be checked during map; checking during attach as well is
redundant and a bit confusing.
> > This should also eventually call the new revoke testing function Leon
> > is adding
>
> We will add it once my series is accepted.
It should also refuse pinned importers with an always-fail pin op
until we get that done. This is a case like VFIO, where the lifecycle
is more general, and I don't want to accidentally allow things that
shouldn't work.
Jason
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
2026-01-21 13:56 ` Jason Gunthorpe
@ 2026-01-21 16:27 ` Yishai Hadas
0 siblings, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-01-21 16:27 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Edward Srouji, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel, linaro-mm-sig
On 21/01/2026 15:56, Jason Gunthorpe wrote:
> On Wed, Jan 21, 2026 at 10:32:46AM +0200, Leon Romanovsky wrote:
>>>> +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
>>>> + struct dma_buf_attachment *attachment)
>>>> +{
>>>> + struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
>>>> +
>>>> + if (!attachment->peer2peer)
>>>> + return -EOPNOTSUPP;
>>>> +
>>>> + if (priv->revoked)
>>>> + return -ENODEV;
>>>
>>> This should only be checked in map
>>
>> I disagree with the word "only"; the more accurate word is "too". There
>> is no need to allow a new importer to attach if this exporter is marked
>> as revoked.
>
> It must be checked during map; checking during attach as well is
> redundant and a bit confusing.
>
OK, let's drop this check from the 'attach' path.
>>> This should also eventually call the new revoke testing function Leon
>>> is adding
>>
>> We will add it once my series is accepted.
>
> It should also refuse pinned importers with an always-fail pin op
> until we get that done. This is a case like VFIO, where the lifecycle
> is more general, and I don't want to accidentally allow things that
> shouldn't work.
>
Sure, will be part of V1.
Thanks,
Yishai
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
2026-01-20 18:15 ` Jason Gunthorpe
@ 2026-01-25 14:31 ` Leon Romanovsky
1 sibling, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2026-01-25 14:31 UTC (permalink / raw)
To: Edward Srouji
Cc: Jason Gunthorpe, Sumit Semwal, Christian König, linux-kernel,
linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas
On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
> From: Yishai Hadas <yishaih@nvidia.com>
>
> Expose DMABUF functionality to userspace through the uverbs interface,
> enabling InfiniBand/RDMA devices to export PCI based memory regions
> (e.g. device memory) as DMABUF file descriptors. This allows
> zero-copy sharing of RDMA memory with other subsystems that support the
> dma-buf framework.
>
> A new UVERBS_OBJECT_DMABUF object type and allocation method were
> introduced.
>
> During allocation, uverbs invokes the driver to supply the
> rdma_user_mmap_entry associated with the given page offset (pgoff).
>
> Based on the returned rdma_user_mmap_entry, uverbs requests the driver
> to provide the corresponding physical-memory details as well as the
> driver’s PCI provider information.
>
> Using this information, dma_buf_export() is called; if it succeeds,
> uobj->object is set to the underlying file pointer returned by the
> dma-buf framework.
>
> The file descriptor number follows the standard uverbs allocation flow,
> but the file pointer comes from the dma-buf subsystem, including its own
> fops and private data.
>
> Because of this, alloc_begin_fd_uobject() must handle cases where
> fd_type->fops is NULL, and both alloc_commit_fd_uobject() and
> alloc_abort_fd_uobject() must account for whether filp->private_data
> exists, since it is only populated after a successful dma_buf_export().
>
> When an mmap entry is removed, uverbs iterates over its associated
> DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that
> their importers are notified.
>
> The same procedure applies during the disassociate flow; final cleanup
> occurs when the application closes the file.
>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Edward Srouji <edwards@nvidia.com>
> ---
> drivers/infiniband/core/Makefile | 1 +
> drivers/infiniband/core/device.c | 2 +
> drivers/infiniband/core/ib_core_uverbs.c | 19 +++
> drivers/infiniband/core/rdma_core.c | 63 ++++----
> drivers/infiniband/core/rdma_core.h | 1 +
> drivers/infiniband/core/uverbs.h | 10 ++
> drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++
> drivers/infiniband/core/uverbs_uapi.c | 1 +
> include/rdma/ib_verbs.h | 9 ++
> include/rdma/uverbs_types.h | 1 +
> include/uapi/rdma/ib_user_ioctl_cmds.h | 10 ++
> 11 files changed, 263 insertions(+), 26 deletions(-)
<...>
> +static struct sg_table *
> +uverbs_dmabuf_map(struct dma_buf_attachment *attachment,
> + enum dma_data_direction dir)
> +{
> + struct ib_uverbs_dmabuf_file *priv = attachment->dmabuf->priv;
> +
> + dma_resv_assert_held(priv->dmabuf->resv);
> +
> + if (priv->revoked)
> + return ERR_PTR(-ENODEV);
> +
> + return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> + &priv->phys_vec, 1, priv->phys_vec.len,
> + dir);
> +}
> +
> +static void uverbs_dmabuf_unmap(struct dma_buf_attachment *attachment,
> + struct sg_table *sgt,
> + enum dma_data_direction dir)
> +{
> + dma_buf_free_sgt(attachment, sgt, dir);
> +}
Unfortunately, that is not enough. Exporters should count their
map<->unmap calls and make sure they balance.
See this VFIO change https://lore.kernel.org/kvm/20260124-dmabuf-revoke-v5-4-f98fca917e96@nvidia.com/
Thanks
end of thread, other threads:[~2026-01-25 14:31 UTC | newest]
Thread overview: 11+ messages
2026-01-08 11:11 [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors Edward Srouji
2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
2026-01-20 18:15 ` Jason Gunthorpe
2026-01-21 8:32 ` Leon Romanovsky
2026-01-21 13:56 ` Jason Gunthorpe
2026-01-21 16:27 ` Yishai Hadas
2026-01-21 10:07 ` Yishai Hadas
2026-01-25 14:31 ` Leon Romanovsky
2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
2026-01-20 18:18 ` Jason Gunthorpe
2026-01-21 10:35 ` Yishai Hadas