* [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors
@ 2026-01-08 11:11 Edward Srouji
2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
0 siblings, 2 replies; 11+ messages in thread
From: Edward Srouji @ 2026-01-08 11:11 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky, Sumit Semwal,
Christian König
Cc: linux-kernel, linux-rdma, linux-media, dri-devel, linaro-mm-sig,
Yishai Hadas, Edward Srouji
This patch series introduces dma-buf export support for RDMA/InfiniBand
devices, enabling userspace applications to export RDMA PCI-backed
memory regions (such as device memory or mlx5 UAR pages) as dma-buf file
descriptors.
This allows PCI device memory to be shared with other kernel subsystems
(e.g., graphics or media) or between userspace processes, via the
standard dma-buf interface, avoiding unnecessary copies and enabling
efficient peer-to-peer (P2P) DMA transfers. See [1] for background on
dma-buf.
As part of this series, we introduce a new uverbs object of type FD for
dma-buf export, along with the corresponding APIs for allocation and
teardown. This object encapsulates all attributes required to export a
dma-buf.
The implementation enforces P2P-only mappings and properly manages
resource lifecycle, including:
- Cleanup during driver removal or RDMA context destruction.
- Revocation via dma_buf_move_notify() when the underlying mmap entries
are removed.
- Refactoring of common cleanup logic for reuse across FD uobject types.
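The revocation step above can be sketched as a small userspace model: each
mmap entry tracks its exported bufs, and removal marks every buf revoked and
fires a notification hook, after which mapping must fail. This is an
illustrative model only, not kernel code; all names are invented, and the
counter stands in for dma_buf_move_notify():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented userspace model of the revoke-on-remove bookkeeping. */
struct model_dmabuf {
	bool revoked;
	int move_notified;		/* counts move-notify callbacks */
	struct model_dmabuf *next;	/* analogue of dmabufs_elm */
};

struct model_mmap_entry {
	bool driver_removed;
	struct model_dmabuf *dmabufs;	/* singly linked list head */
};

/* Analogue of entry removal: revoke and notify every linked buf. */
int model_entry_remove(struct model_mmap_entry *entry)
{
	int notified = 0;

	entry->driver_removed = true;
	for (struct model_dmabuf *d = entry->dmabufs; d; d = d->next) {
		d->revoked = true;
		d->move_notified++;	/* stands in for move-notify */
		notified++;
	}
	entry->dmabufs = NULL;		/* every element unlinked */
	return notified;
}

/* The map path of a revoked exporter must refuse with -ENODEV (-19). */
int model_map(const struct model_dmabuf *d)
{
	return d->revoked ? -19 : 0;
}
```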
The infrastructure is generic within uverbs, allowing individual drivers
to easily integrate and supply their vendor-specific implementation.
The mlx5 driver is the first consumer of this new API, providing:
- Initialization of PCI peer-to-peer DMA support.
- mlx5-specific implementations of the mmap_get_pfns and
pgoff_to_mmap_entry device operations required for dma-buf export.
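Conceptually, the two device operations form a lookup pipeline: resolve a page
offset to its mmap entry, then report the physical range backing that entry.
The following userspace C sketch models that pipeline with invented names and
a flat table; it is not the mlx5 implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Invented model of the pgoff -> entry -> phys-range pipeline. */
#define MODEL_PAGE_SIZE 4096UL

struct model_entry {
	unsigned long start_pgoff;
	size_t npages;
	unsigned long long phys_base;	/* base address of the PCI region */
};

struct model_phys_vec {
	unsigned long long paddr;
	size_t len;
};

/* Analogue of pgoff_to_mmap_entry(): find the entry covering pg_off. */
const struct model_entry *
model_pgoff_to_entry(const struct model_entry *tbl, size_t n,
		     unsigned long pg_off)
{
	for (size_t i = 0; i < n; i++)
		if (pg_off >= tbl[i].start_pgoff &&
		    pg_off < tbl[i].start_pgoff + tbl[i].npages)
			return &tbl[i];
	return NULL;
}

/* Analogue of mmap_get_pfns(): describe the whole entry as one range. */
int model_get_pfns(const struct model_entry *e, struct model_phys_vec *out)
{
	if (!e)
		return -1;
	out->paddr = e->phys_base;
	out->len = e->npages * MODEL_PAGE_SIZE;
	return 0;
}
```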
[1] https://docs.kernel.org/driver-api/dma-buf.html
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
Yishai Hadas (2):
RDMA/uverbs: Add DMABUF object type and operations
RDMA/mlx5: Implement DMABUF export ops
drivers/infiniband/core/Makefile | 1 +
drivers/infiniband/core/device.c | 2 +
drivers/infiniband/core/ib_core_uverbs.c | 19 +++
drivers/infiniband/core/rdma_core.c | 63 ++++----
drivers/infiniband/core/rdma_core.h | 1 +
drivers/infiniband/core/uverbs.h | 10 ++
drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++
drivers/infiniband/core/uverbs_uapi.c | 1 +
drivers/infiniband/hw/mlx5/main.c | 72 +++++++++
include/rdma/ib_verbs.h | 9 ++
include/rdma/uverbs_types.h | 1 +
include/uapi/rdma/ib_user_ioctl_cmds.h | 10 ++
12 files changed, 335 insertions(+), 26 deletions(-)
---
base-commit: 325e3b5431ddd27c5f93156b36838a351e3b2f72
change-id: 20260108-dmabuf-export-0d598058dd1e
Best regards,
--
Edward Srouji <edwards@nvidia.com>
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
  2026-01-08 11:11 [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors Edward Srouji
@ 2026-01-08 11:11 ` Edward Srouji
  2026-01-20 18:15   ` Jason Gunthorpe
  2026-01-25 14:31   ` Leon Romanovsky
  2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
  1 sibling, 2 replies; 11+ messages in thread
From: Edward Srouji @ 2026-01-08 11:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Sumit Semwal, Christian König
  Cc: linux-kernel, linux-rdma, linux-media, dri-devel, linaro-mm-sig,
	Yishai Hadas, Edward Srouji

From: Yishai Hadas <yishaih@nvidia.com>

Expose DMABUF functionality to userspace through the uverbs interface,
enabling InfiniBand/RDMA devices to export PCI-based memory regions
(e.g. device memory) as DMABUF file descriptors. This allows zero-copy
sharing of RDMA memory with other subsystems that support the dma-buf
framework.

A new UVERBS_OBJECT_DMABUF object type and allocation method are
introduced. During allocation, uverbs invokes the driver to supply the
rdma_user_mmap_entry associated with the given page offset (pgoff).
Based on the returned rdma_user_mmap_entry, uverbs requests the driver
to provide the corresponding physical-memory details as well as the
driver's PCI provider information.

Using this information, dma_buf_export() is called; if it succeeds,
uobj->object is set to the underlying file pointer returned by the
dma-buf framework. The file descriptor number follows the standard
uverbs allocation flow, but the file pointer comes from the dma-buf
subsystem, including its own fops and private data.
Because of this, alloc_begin_fd_uobject() must handle cases where
fd_type->fops is NULL, and both alloc_commit_fd_uobject() and
alloc_abort_fd_uobject() must account for whether filp->private_data
exists, since it is only populated after a successful dma_buf_export().

When an mmap entry is removed, uverbs iterates over its associated
DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that
their importers are notified. The same procedure applies during the
disassociate flow; final cleanup occurs when the application closes the
file.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
 drivers/infiniband/core/Makefile                  |   1 +
 drivers/infiniband/core/device.c                  |   2 +
 drivers/infiniband/core/ib_core_uverbs.c          |  19 +++
 drivers/infiniband/core/rdma_core.c               |  63 ++++----
 drivers/infiniband/core/rdma_core.h               |   1 +
 drivers/infiniband/core/uverbs.h                  |  10 ++
 drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++
 drivers/infiniband/core/uverbs_uapi.c             |   1 +
 include/rdma/ib_verbs.h                           |   9 ++
 include/rdma/uverbs_types.h                       |   1 +
 include/uapi/rdma/ib_user_ioctl_cmds.h            |  10 ++
 11 files changed, 263 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index f483e0c12444..a2a7a9d2e0d3 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -33,6 +33,7 @@ ib_umad-y := user_mad.o
 ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
 		rdma_core.o uverbs_std_types.o uverbs_ioctl.o \
 		uverbs_std_types_cq.o \
+		uverbs_std_types_dmabuf.o \
 		uverbs_std_types_dmah.o \
 		uverbs_std_types_flow_action.o uverbs_std_types_dm.o \
 		uverbs_std_types_mr.o uverbs_std_types_counters.o \
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 4e09f6e0995e..416242b9c158 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2765,6 +2765,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, map_mr_sg);
 	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
 	SET_DEVICE_OP(dev_ops, mmap);
+	SET_DEVICE_OP(dev_ops, mmap_get_pfns);
 	SET_DEVICE_OP(dev_ops, mmap_free);
 	SET_DEVICE_OP(dev_ops, modify_ah);
 	SET_DEVICE_OP(dev_ops, modify_cq);
@@ -2775,6 +2776,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
 	SET_DEVICE_OP(dev_ops, modify_srq);
 	SET_DEVICE_OP(dev_ops, modify_wq);
 	SET_DEVICE_OP(dev_ops, peek_cq);
+	SET_DEVICE_OP(dev_ops, pgoff_to_mmap_entry);
 	SET_DEVICE_OP(dev_ops, pre_destroy_cq);
 	SET_DEVICE_OP(dev_ops, poll_cq);
 	SET_DEVICE_OP(dev_ops, port_groups);
diff --git a/drivers/infiniband/core/ib_core_uverbs.c b/drivers/infiniband/core/ib_core_uverbs.c
index b51bd7087a88..1ff53b8a0e89 100644
--- a/drivers/infiniband/core/ib_core_uverbs.c
+++ b/drivers/infiniband/core/ib_core_uverbs.c
@@ -5,9 +5,13 @@
  * Copyright 2019 Marvell. All rights reserved.
  */
 #include <linux/xarray.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-resv.h>
 #include "uverbs.h"
 #include "core_priv.h"
 
+MODULE_IMPORT_NS("DMA_BUF");
+
 /**
  * rdma_umap_priv_init() - Initialize the private data of a vma
  *
@@ -229,12 +233,24 @@ EXPORT_SYMBOL(rdma_user_mmap_entry_put);
  */
 void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
 {
+	struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
+
 	if (!entry)
 		return;
 
+	mutex_lock(&entry->dmabufs_lock);
 	xa_lock(&entry->ucontext->mmap_xa);
 	entry->driver_removed = true;
 	xa_unlock(&entry->ucontext->mmap_xa);
+	list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
+		dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
+		list_del(&uverbs_dmabuf->dmabufs_elm);
+		uverbs_dmabuf->revoked = true;
+		dma_buf_move_notify(uverbs_dmabuf->dmabuf);
+		dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
+	}
+	mutex_unlock(&entry->dmabufs_lock);
+
 	kref_put(&entry->ref, rdma_user_mmap_entry_free);
 }
 EXPORT_SYMBOL(rdma_user_mmap_entry_remove);
@@ -274,6 +290,9 @@ int rdma_user_mmap_entry_insert_range(struct ib_ucontext *ucontext,
 		return -EINVAL;
 
 	kref_init(&entry->ref);
+	INIT_LIST_HEAD(&entry->dmabufs);
+	mutex_init(&entry->dmabufs_lock);
+
 	entry->ucontext = ucontext;
 
 	/*
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index 18918f463361..3e0a8b9cd288 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -465,7 +465,7 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
 
 	fd_type =
 		container_of(obj->type_attrs, struct uverbs_obj_fd_type, type);
-	if (WARN_ON(fd_type->fops->release != &uverbs_uobject_fd_release &&
+	if (WARN_ON(fd_type->fops && fd_type->fops->release != &uverbs_uobject_fd_release &&
 		    fd_type->fops->release != &uverbs_async_event_release)) {
 		ret = ERR_PTR(-EINVAL);
 		goto err_fd;
@@ -477,14 +477,16 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
 		goto err_fd;
 	}
 
-	/* Note that uverbs_uobject_fd_release() is called during abort */
-	filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
-				  fd_type->flags);
-	if (IS_ERR(filp)) {
-		ret = ERR_CAST(filp);
-		goto err_getfile;
+	if (fd_type->fops) {
+		/* Note that uverbs_uobject_fd_release() is called during abort */
+		filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
+					  fd_type->flags);
+		if (IS_ERR(filp)) {
+			ret = ERR_CAST(filp);
+			goto err_getfile;
+		}
+		uobj->object = filp;
 	}
-	uobj->object = filp;
 
 	uobj->id = new_fd;
 	return uobj;
@@ -561,7 +563,9 @@ static void alloc_abort_fd_uobject(struct ib_uobject *uobj)
 {
 	struct file *filp = uobj->object;
 
-	fput(filp);
+	if (filp)
+		fput(filp);
+
 	put_unused_fd(uobj->id);
 }
 
@@ -628,11 +632,14 @@ static void alloc_commit_fd_uobject(struct ib_uobject *uobj)
 	/* This shouldn't be used anymore. Use the file object instead */
 	uobj->id = 0;
 
-	/*
-	 * NOTE: Once we install the file we loose ownership of our kref on
-	 * uobj. It will be put by uverbs_uobject_fd_release()
-	 */
-	filp->private_data = uobj;
+	if (!filp->private_data) {
+		/*
+		 * NOTE: Once we install the file we loose ownership of our kref on
+		 * uobj. It will be put by uverbs_uobject_fd_release()
+		 */
+		filp->private_data = uobj;
+	}
+
 	fd_install(fd, filp);
 }
 
@@ -802,21 +809,10 @@ const struct uverbs_obj_type_class uverbs_idr_class = {
 };
 EXPORT_SYMBOL(uverbs_idr_class);
 
-/*
- * Users of UVERBS_TYPE_ALLOC_FD should set this function as the struct
- * file_operations release method.
- */
-int uverbs_uobject_fd_release(struct inode *inode, struct file *filp)
+int uverbs_uobject_release(struct ib_uobject *uobj)
 {
 	struct ib_uverbs_file *ufile;
-	struct ib_uobject *uobj;
 
-	/*
-	 * This can only happen if the fput came from alloc_abort_fd_uobject()
-	 */
-	if (!filp->private_data)
-		return 0;
-
-	uobj = filp->private_data;
 	ufile = uobj->ufile;
 
 	if (down_read_trylock(&ufile->hw_destroy_rwsem)) {
@@ -843,6 +839,21 @@ int uverbs_uobject_fd_release(struct inode *inode, struct file *filp)
 	uverbs_uobject_put(uobj);
 	return 0;
 }
+
+/*
+ * Users of UVERBS_TYPE_ALLOC_FD should set this function as the struct
+ * file_operations release method.
+ */
+int uverbs_uobject_fd_release(struct inode *inode, struct file *filp)
+{
+	/*
+	 * This can only happen if the fput came from alloc_abort_fd_uobject()
+	 */
+	if (!filp->private_data)
+		return 0;
+
+	return uverbs_uobject_release(filp->private_data);
+}
 EXPORT_SYMBOL(uverbs_uobject_fd_release);
 
 /*
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
index a59b087611cb..55f1e3558856 100644
--- a/drivers/infiniband/core/rdma_core.h
+++ b/drivers/infiniband/core/rdma_core.h
@@ -156,6 +156,7 @@ extern const struct uapi_definition uverbs_def_obj_counters[];
 extern const struct uapi_definition uverbs_def_obj_cq[];
 extern const struct uapi_definition uverbs_def_obj_device[];
 extern const struct uapi_definition uverbs_def_obj_dm[];
+extern const struct uapi_definition uverbs_def_obj_dmabuf[];
 extern const struct uapi_definition uverbs_def_obj_dmah[];
 extern const struct uapi_definition uverbs_def_obj_flow_action[];
 extern const struct uapi_definition uverbs_def_obj_intf[];
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 797e2fcc8072..66287e8e7ad7 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -133,6 +133,16 @@ struct ib_uverbs_completion_event_file {
 	struct ib_uverbs_event_queue ev_queue;
 };
 
+struct ib_uverbs_dmabuf_file {
+	struct ib_uobject uobj;
+	struct dma_buf *dmabuf;
+	struct list_head dmabufs_elm;
+	struct rdma_user_mmap_entry *mmap_entry;
+	struct dma_buf_phys_vec phys_vec;
+	struct p2pdma_provider *provider;
+	u8 revoked :1;
+};
+
 struct ib_uverbs_event {
 	union {
 		struct ib_uverbs_async_event_desc async;
diff --git a/drivers/infiniband/core/uverbs_std_types_dmabuf.c b/drivers/infiniband/core/uverbs_std_types_dmabuf.c
new file mode 100644
index 000000000000..ef5484022e77
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_std_types_dmabuf.c
@@ -0,0 +1,172 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/dma-buf-mapping.h>
+#include <linux/pci-p2pdma.h>
+#include <linux/dma-resv.h>
+#include <rdma/uverbs_std_types.h>
+#include "rdma_core.h"
+#include "uverbs.h"
+
+static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
+				struct dma_buf_attachment *attachment)
+{
+	struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
+
+	if (!attachment->peer2peer)
+		return -EOPNOTSUPP;
+
+	if (priv->revoked)
+		return -ENODEV;
+
+	return 0;
+}
+
+static struct sg_table *
+uverbs_dmabuf_map(struct dma_buf_attachment *attachment,
+		  enum dma_data_direction dir)
+{
+	struct ib_uverbs_dmabuf_file *priv = attachment->dmabuf->priv;
+
+	dma_resv_assert_held(priv->dmabuf->resv);
+
+	if (priv->revoked)
+		return ERR_PTR(-ENODEV);
+
+	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
+				       &priv->phys_vec, 1, priv->phys_vec.len,
+				       dir);
+}
+
+static void uverbs_dmabuf_unmap(struct dma_buf_attachment *attachment,
+				struct sg_table *sgt,
+				enum dma_data_direction dir)
+{
+	dma_buf_free_sgt(attachment, sgt, dir);
+}
+
+static void uverbs_dmabuf_release(struct dma_buf *dmabuf)
+{
+	struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
+
+	/*
+	 * This can only happen if the fput came from alloc_abort_fd_uobject()
+	 */
+	if (!priv->uobj.context)
+		return;
+
+	uverbs_uobject_release(&priv->uobj);
+}
+
+static const struct dma_buf_ops uverbs_dmabuf_ops = {
+	.attach = uverbs_dmabuf_attach,
+	.map_dma_buf = uverbs_dmabuf_map,
+	.unmap_dma_buf = uverbs_dmabuf_unmap,
+	.release = uverbs_dmabuf_release,
+};
+
+static int UVERBS_HANDLER(UVERBS_METHOD_DMABUF_ALLOC)(
+	struct uverbs_attr_bundle *attrs)
+{
+	struct ib_uobject *uobj =
+		uverbs_attr_get(attrs, UVERBS_ATTR_ALLOC_DMABUF_HANDLE)
+			->obj_attr.uobject;
+	struct ib_uverbs_dmabuf_file *uverbs_dmabuf =
+		container_of(uobj, struct ib_uverbs_dmabuf_file, uobj);
+	struct ib_device *ib_dev = attrs->context->device;
+	struct rdma_user_mmap_entry *mmap_entry;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	off_t pg_off;
+	int ret;
+
+	ret = uverbs_get_const(&pg_off, attrs, UVERBS_ATTR_ALLOC_DMABUF_PGOFF);
+	if (ret)
+		return ret;
+
+	mmap_entry = ib_dev->ops.pgoff_to_mmap_entry(attrs->context, pg_off);
+	if (!mmap_entry)
+		return -EINVAL;
+
+	ret = ib_dev->ops.mmap_get_pfns(mmap_entry, &uverbs_dmabuf->phys_vec,
+					&uverbs_dmabuf->provider);
+	if (ret)
+		goto err;
+
+	exp_info.ops = &uverbs_dmabuf_ops;
+	exp_info.size = uverbs_dmabuf->phys_vec.len;
+	exp_info.flags = O_CLOEXEC;
+	exp_info.priv = uverbs_dmabuf;
+
+	uverbs_dmabuf->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(uverbs_dmabuf->dmabuf)) {
+		ret = PTR_ERR(uverbs_dmabuf->dmabuf);
+		goto err;
+	}
+
+	INIT_LIST_HEAD(&uverbs_dmabuf->dmabufs_elm);
+	mutex_lock(&mmap_entry->dmabufs_lock);
+	if (mmap_entry->driver_removed)
+		ret = -EIO;
+	else
+		list_add_tail(&uverbs_dmabuf->dmabufs_elm, &mmap_entry->dmabufs);
+	mutex_unlock(&mmap_entry->dmabufs_lock);
+	if (ret)
+		goto err_revoked;
+
+	uobj->object = uverbs_dmabuf->dmabuf->file;
+	uverbs_dmabuf->mmap_entry = mmap_entry;
+	uverbs_finalize_uobj_create(attrs, UVERBS_ATTR_ALLOC_DMABUF_HANDLE);
+	return 0;
+
+err_revoked:
+	dma_buf_put(uverbs_dmabuf->dmabuf);
+err:
+	rdma_user_mmap_entry_put(mmap_entry);
+	return ret;
+}
+
+DECLARE_UVERBS_NAMED_METHOD(
+	UVERBS_METHOD_DMABUF_ALLOC,
+	UVERBS_ATTR_FD(UVERBS_ATTR_ALLOC_DMABUF_HANDLE,
+		       UVERBS_OBJECT_DMABUF,
+		       UVERBS_ACCESS_NEW,
+		       UA_MANDATORY),
+	UVERBS_ATTR_PTR_IN(UVERBS_ATTR_ALLOC_DMABUF_PGOFF,
+			   UVERBS_ATTR_TYPE(u64),
+			   UA_MANDATORY));
+
+static void uverbs_dmabuf_fd_destroy_uobj(struct ib_uobject *uobj,
+					  enum rdma_remove_reason why)
+{
+	struct ib_uverbs_dmabuf_file *uverbs_dmabuf =
+		container_of(uobj, struct ib_uverbs_dmabuf_file, uobj);
+
+	mutex_lock(&uverbs_dmabuf->mmap_entry->dmabufs_lock);
+	dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
+	if (!uverbs_dmabuf->revoked) {
+		uverbs_dmabuf->revoked = true;
+		list_del(&uverbs_dmabuf->dmabufs_elm);
+		dma_buf_move_notify(uverbs_dmabuf->dmabuf);
+	}
+	dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
+	mutex_unlock(&uverbs_dmabuf->mmap_entry->dmabufs_lock);
+
+	/* Matches the get done as part of pgoff_to_mmap_entry() */
+	rdma_user_mmap_entry_put(uverbs_dmabuf->mmap_entry);
+};
+
+DECLARE_UVERBS_NAMED_OBJECT(
+	UVERBS_OBJECT_DMABUF,
+	UVERBS_TYPE_ALLOC_FD(sizeof(struct ib_uverbs_dmabuf_file),
+			     uverbs_dmabuf_fd_destroy_uobj,
+			     NULL, NULL, O_RDONLY),
+	&UVERBS_METHOD(UVERBS_METHOD_DMABUF_ALLOC));
+
+const struct uapi_definition uverbs_def_obj_dmabuf[] = {
+	UAPI_DEF_CHAIN_OBJ_TREE_NAMED(UVERBS_OBJECT_DMABUF),
+	UAPI_DEF_OBJ_NEEDS_FN(mmap_get_pfns),
+	UAPI_DEF_OBJ_NEEDS_FN(pgoff_to_mmap_entry),
+	{}
+};
diff --git a/drivers/infiniband/core/uverbs_uapi.c b/drivers/infiniband/core/uverbs_uapi.c
index e00ea63175bd..38d0bbbee796 100644
--- a/drivers/infiniband/core/uverbs_uapi.c
+++ b/drivers/infiniband/core/uverbs_uapi.c
@@ -631,6 +631,7 @@ static const struct uapi_definition uverbs_core_api[] = {
 	UAPI_DEF_CHAIN(uverbs_def_obj_cq),
 	UAPI_DEF_CHAIN(uverbs_def_obj_device),
 	UAPI_DEF_CHAIN(uverbs_def_obj_dm),
+	UAPI_DEF_CHAIN(uverbs_def_obj_dmabuf),
 	UAPI_DEF_CHAIN(uverbs_def_obj_dmah),
 	UAPI_DEF_CHAIN(uverbs_def_obj_flow_action),
 	UAPI_DEF_CHAIN(uverbs_def_obj_intf),
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6c372a37c482..5be67013c8ae 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -43,6 +43,7 @@
 #include <uapi/rdma/rdma_user_ioctl.h>
 #include <uapi/rdma/ib_user_ioctl_verbs.h>
 #include <linux/pci-tph.h>
+#include <linux/dma-buf.h>
 
 #define IB_FW_VERSION_NAME_MAX ETHTOOL_FWVERS_LEN
 
@@ -2363,6 +2364,9 @@ struct rdma_user_mmap_entry {
 	unsigned long start_pgoff;
 	size_t npages;
 	bool driver_removed;
+	/* protects access to dmabufs */
+	struct mutex dmabufs_lock;
+	struct list_head dmabufs;
 };
 
 /* Return the offset (in bytes) the user should pass to libc's mmap() */
@@ -2500,6 +2504,11 @@ struct ib_device_ops {
 	 * Therefore needs to be implemented by the driver in mmap_free.
	 */
	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
+	int (*mmap_get_pfns)(struct rdma_user_mmap_entry *entry,
+			     struct dma_buf_phys_vec *phys_vec,
+			     struct p2pdma_provider **provider);
+	struct rdma_user_mmap_entry *(*pgoff_to_mmap_entry)(struct ib_ucontext *ucontext,
+							    off_t pg_off);
	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
	int (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
diff --git a/include/rdma/uverbs_types.h b/include/rdma/uverbs_types.h
index 26ba919ac245..6a253b7dc5ea 100644
--- a/include/rdma/uverbs_types.h
+++ b/include/rdma/uverbs_types.h
@@ -186,6 +186,7 @@ struct ib_uverbs_file {
 extern const struct uverbs_obj_type_class uverbs_idr_class;
 extern const struct uverbs_obj_type_class uverbs_fd_class;
 int uverbs_uobject_fd_release(struct inode *inode, struct file *filp);
+int uverbs_uobject_release(struct ib_uobject *uobj);
 
 #define UVERBS_BUILD_BUG_ON(cond) (sizeof(char[1 - 2 * !!(cond)]) - \
 				   sizeof(char))
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 35da4026f452..72041c1b0ea5 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -56,6 +56,7 @@ enum uverbs_default_objects {
 	UVERBS_OBJECT_COUNTERS,
 	UVERBS_OBJECT_ASYNC_EVENT,
 	UVERBS_OBJECT_DMAH,
+	UVERBS_OBJECT_DMABUF,
 };
 
 enum {
@@ -263,6 +264,15 @@ enum uverbs_methods_dmah {
 	UVERBS_METHOD_DMAH_FREE,
 };
 
+enum uverbs_attrs_alloc_dmabuf_cmd_attr_ids {
+	UVERBS_ATTR_ALLOC_DMABUF_HANDLE,
+	UVERBS_ATTR_ALLOC_DMABUF_PGOFF,
+};
+
+enum uverbs_methods_dmabuf {
+	UVERBS_METHOD_DMABUF_ALLOC,
+};
+
 enum uverbs_attrs_reg_dm_mr_cmd_attr_ids {
 	UVERBS_ATTR_REG_DM_MR_HANDLE,
 	UVERBS_ATTR_REG_DM_MR_OFFSET,
-- 
2.49.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
  2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
@ 2026-01-20 18:15   ` Jason Gunthorpe
  2026-01-21  8:32     ` Leon Romanovsky
  2026-01-21 10:07     ` Yishai Hadas
  1 sibling, 2 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2026-01-20 18:15 UTC (permalink / raw)
  To: Edward Srouji
  Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas

On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
>  void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
>  {
> +	struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
> +
>  	if (!entry)
>  		return;
>  
> +	mutex_lock(&entry->dmabufs_lock);
>  	xa_lock(&entry->ucontext->mmap_xa);
>  	entry->driver_removed = true;
>  	xa_unlock(&entry->ucontext->mmap_xa);
> +	list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
> +		dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
> +		list_del(&uverbs_dmabuf->dmabufs_elm);
> +		uverbs_dmabuf->revoked = true;
> +		dma_buf_move_notify(uverbs_dmabuf->dmabuf);
> +		dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);

This will need the same wait that Christian pointed out for VFIO..

> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index 18918f463361..3e0a8b9cd288 100644
> --- a/drivers/infiniband/core/rdma_core.c
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -465,7 +465,7 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
>  
>  	fd_type =
>  		container_of(obj->type_attrs, struct uverbs_obj_fd_type, type);
> -	if (WARN_ON(fd_type->fops->release != &uverbs_uobject_fd_release &&
> +	if (WARN_ON(fd_type->fops && fd_type->fops->release != &uverbs_uobject_fd_release &&
>  		    fd_type->fops->release != &uverbs_async_event_release)) {
>  		ret = ERR_PTR(-EINVAL);
>  		goto err_fd;
> @@ -477,14 +477,16 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
>  		goto err_fd;
>  	}
>  
> -	/* Note that uverbs_uobject_fd_release() is called during abort */
> -	filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
> -				  fd_type->flags);
> -	if (IS_ERR(filp)) {
> -		ret = ERR_CAST(filp);
> -		goto err_getfile;
> +	if (fd_type->fops) {
> +		/* Note that uverbs_uobject_fd_release() is called during abort */
> +		filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
> +					  fd_type->flags);
> +		if (IS_ERR(filp)) {
> +			ret = ERR_CAST(filp);
> +			goto err_getfile;
> +		}
> +		uobj->object = filp;
>  	}
> -	uobj->object = filp;
>  
>  	uobj->id = new_fd;
>  	return uobj;
> @@ -561,7 +563,9 @@ static void alloc_abort_fd_uobject(struct ib_uobject *uobj)
>  {
>  	struct file *filp = uobj->object;
>  
> -	fput(filp);
> +	if (filp)
> +		fput(filp);
> +
>  	put_unused_fd(uobj->id);

This stuff changing how the uobjects work should probably be in its own
patch with its own explanation about creating a uobject that wraps an
externally allocated file descriptor vs this automatic internal
allocation.

> index 797e2fcc8072..66287e8e7ad7 100644
> --- a/drivers/infiniband/core/uverbs.h
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -133,6 +133,16 @@ struct ib_uverbs_completion_event_file {
>  	struct ib_uverbs_event_queue ev_queue;
>  };
>  
> +struct ib_uverbs_dmabuf_file {
> +	struct ib_uobject uobj;
> +	struct dma_buf *dmabuf;
> +	struct list_head dmabufs_elm;
> +	struct rdma_user_mmap_entry *mmap_entry;
> +	struct dma_buf_phys_vec phys_vec;

Oh, are we going to have weird merge conflicts with this Leon?

> +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
> +				struct dma_buf_attachment *attachment)
> +{
> +	struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
> +
> +	if (!attachment->peer2peer)
> +		return -EOPNOTSUPP;
> +
> +	if (priv->revoked)
> +		return -ENODEV;

This should only be checked in map

This should also eventually call the new revoke testing function Leon
is adding

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
  2026-01-20 18:15 ` Jason Gunthorpe
@ 2026-01-21  8:32   ` Leon Romanovsky
  2026-01-21 13:56     ` Jason Gunthorpe
  2026-01-21 10:07   ` Yishai Hadas
  1 sibling, 1 reply; 11+ messages in thread
From: Leon Romanovsky @ 2026-01-21 8:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Edward Srouji, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas

On Tue, Jan 20, 2026 at 02:15:20PM -0400, Jason Gunthorpe wrote:
> On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
> >  void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
> >  {
> > +	struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
> > +
> >  	if (!entry)
> >  		return;
> >  
> > +	mutex_lock(&entry->dmabufs_lock);
> >  	xa_lock(&entry->ucontext->mmap_xa);
> >  	entry->driver_removed = true;
> >  	xa_unlock(&entry->ucontext->mmap_xa);
> > +	list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
> > +		dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
> > +		list_del(&uverbs_dmabuf->dmabufs_elm);
> > +		uverbs_dmabuf->revoked = true;
> > +		dma_buf_move_notify(uverbs_dmabuf->dmabuf);
> > +		dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
> 
> This will need the same wait that Christian pointed out for VFIO..

Yes, something like this is missing
https://lore.kernel.org/all/20260120-dmabuf-revoke-v3-6-b7e0b07b8214@nvidia.com/

<...>

> > +struct ib_uverbs_dmabuf_file {
> > +	struct ib_uobject uobj;
> > +	struct dma_buf *dmabuf;
> > +	struct list_head dmabufs_elm;
> > +	struct rdma_user_mmap_entry *mmap_entry;
> > +	struct dma_buf_phys_vec phys_vec;
> 
> Oh, are we going to have weird merge conflicts with this Leon?

No, Alex created a shared branch with the rename already applied for me.
I had planned to merge it into the RDMA tree before taking this series,
and then update dma_buf_phys_vec to phys_vec locally.

> > +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
> > +				struct dma_buf_attachment *attachment)
> > +{
> > +	struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
> > +
> > +	if (!attachment->peer2peer)
> > +		return -EOPNOTSUPP;
> > +
> > +	if (priv->revoked)
> > +		return -ENODEV;
> 
> This should only be checked in map

I disagree with the word "only", the more accurate word is "too". There is
no need to allow new importer attach if this exporter is marked as
revoked.

> This should also eventually call the new revoke testing function Leon
> is adding

We will add it once my series will be accepted.

Thanks

> 
> Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
  2026-01-21  8:32 ` Leon Romanovsky
@ 2026-01-21 13:56   ` Jason Gunthorpe
  2026-01-21 16:27     ` Yishai Hadas
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-01-21 13:56 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Edward Srouji, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas

On Wed, Jan 21, 2026 at 10:32:46AM +0200, Leon Romanovsky wrote:
> > > +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
> > > +				struct dma_buf_attachment *attachment)
> > > +{
> > > +	struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
> > > +
> > > +	if (!attachment->peer2peer)
> > > +		return -EOPNOTSUPP;
> > > +
> > > +	if (priv->revoked)
> > > +		return -ENODEV;
> > 
> > This should only be checked in map
> 
> I disagree with the word "only", the more accurate word is "too". There is
> no need to allow new importer attach if this exporter is marked as
> revoked.

It must check during map, during attach as well is redundant and a bit
confusing.

> > This should also eventually call the new revoke testing function Leon
> > is adding
> 
> We will add it once my series will be accepted.

It should also refuse pinned importers with an always fail pin op
until we get that done. This is a case like VFIO where the lifecycle
is more general and I don't want to accidentally allow things that
shouldn't work.

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
  2026-01-21 13:56 ` Jason Gunthorpe
@ 2026-01-21 16:27   ` Yishai Hadas
  0 siblings, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-01-21 16:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Edward Srouji, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel, linaro-mm-sig

On 21/01/2026 15:56, Jason Gunthorpe wrote:
> On Wed, Jan 21, 2026 at 10:32:46AM +0200, Leon Romanovsky wrote:
>>>> +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
>>>> +				struct dma_buf_attachment *attachment)
>>>> +{
>>>> +	struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
>>>> +
>>>> +	if (!attachment->peer2peer)
>>>> +		return -EOPNOTSUPP;
>>>> +
>>>> +	if (priv->revoked)
>>>> +		return -ENODEV;
>>>
>>> This should only be checked in map
>>
>> I disagree with the word "only", the more accurate word is "too". There is
>> no need to allow new importer attach if this exporter is marked as
>> revoked.
> 
> It must check during map, during attach as well is redundant and a bit
> confusing.
> 

OK, let's drop this check as part of the 'attach'.

>>> This should also eventually call the new revoke testing function Leon
>>> is adding
>>
>> We will add it once my series will be accepted.
> 
> It should also refuse pinned importers with an always fail pin op
> until we get that done. This is a case like VFIO where the lifecycle
> is more general and I don't want to accidentally allow things that
> shouldn't work.
> 

Sure, will be part of V1.

Thanks,
Yishai

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
  2026-01-20 18:15   ` Jason Gunthorpe
  2026-01-21  8:32     ` Leon Romanovsky
@ 2026-01-21 10:07     ` Yishai Hadas
  1 sibling, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-01-21 10:07 UTC (permalink / raw)
  To: Jason Gunthorpe, Edward Srouji
  Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel, linaro-mm-sig

On 20/01/2026 20:15, Jason Gunthorpe wrote:
> On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
>>  void rdma_user_mmap_entry_remove(struct rdma_user_mmap_entry *entry)
>>  {
>> +	struct ib_uverbs_dmabuf_file *uverbs_dmabuf, *tmp;
>> +
>>  	if (!entry)
>>  		return;
>>
>> +	mutex_lock(&entry->dmabufs_lock);
>>  	xa_lock(&entry->ucontext->mmap_xa);
>>  	entry->driver_removed = true;
>>  	xa_unlock(&entry->ucontext->mmap_xa);
>> +	list_for_each_entry_safe(uverbs_dmabuf, tmp, &entry->dmabufs, dmabufs_elm) {
>> +		dma_resv_lock(uverbs_dmabuf->dmabuf->resv, NULL);
>> +		list_del(&uverbs_dmabuf->dmabufs_elm);
>> +		uverbs_dmabuf->revoked = true;
>> +		dma_buf_move_notify(uverbs_dmabuf->dmabuf);
>> +		dma_resv_unlock(uverbs_dmabuf->dmabuf->resv);
>
> This will need the same wait that Christian pointed out for VFIO.

Sure, I'll add it.

>> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
>> index 18918f463361..3e0a8b9cd288 100644
>> --- a/drivers/infiniband/core/rdma_core.c
>> +++ b/drivers/infiniband/core/rdma_core.c
>> @@ -465,7 +465,7 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
>>
>>  	fd_type =
>>  		container_of(obj->type_attrs, struct uverbs_obj_fd_type, type);
>> -	if (WARN_ON(fd_type->fops->release != &uverbs_uobject_fd_release &&
>> +	if (WARN_ON(fd_type->fops && fd_type->fops->release != &uverbs_uobject_fd_release &&
>>  		    fd_type->fops->release != &uverbs_async_event_release)) {
>>  		ret = ERR_PTR(-EINVAL);
>>  		goto err_fd;
>> @@ -477,14 +477,16 @@ alloc_begin_fd_uobject(const struct uverbs_api_object *obj,
>>  		goto err_fd;
>>  	}
>>
>> -	/* Note that uverbs_uobject_fd_release() is called during abort */
>> -	filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
>> -				  fd_type->flags);
>> -	if (IS_ERR(filp)) {
>> -		ret = ERR_CAST(filp);
>> -		goto err_getfile;
>> +	if (fd_type->fops) {
>> +		/* Note that uverbs_uobject_fd_release() is called during abort */
>> +		filp = anon_inode_getfile(fd_type->name, fd_type->fops, NULL,
>> +					  fd_type->flags);
>> +		if (IS_ERR(filp)) {
>> +			ret = ERR_CAST(filp);
>> +			goto err_getfile;
>> +		}
>> +		uobj->object = filp;
>>  	}
>> -	uobj->object = filp;
>>
>>  	uobj->id = new_fd;
>>  	return uobj;
>> @@ -561,7 +563,9 @@ static void alloc_abort_fd_uobject(struct ib_uobject *uobj)
>>  {
>>  	struct file *filp = uobj->object;
>>
>> -	fput(filp);
>> +	if (filp)
>> +		fput(filp);
>> +
>>  	put_unused_fd(uobj->id);
>
> This stuff changing how the uobjects work should probably be in its own
> patch, with its own explanation about creating a uobject that wraps an
> externally allocated file descriptor vs. this automatic internal
> allocation.

Sure, I'll split the current patch into two patches.

>> index 797e2fcc8072..66287e8e7ad7 100644
>> --- a/drivers/infiniband/core/uverbs.h
>> +++ b/drivers/infiniband/core/uverbs.h
>> @@ -133,6 +133,16 @@ struct ib_uverbs_completion_event_file {
>>  	struct ib_uverbs_event_queue ev_queue;
>>  };
>>
>> +struct ib_uverbs_dmabuf_file {
>> +	struct ib_uobject uobj;
>> +	struct dma_buf *dmabuf;
>> +	struct list_head dmabufs_elm;
>> +	struct rdma_user_mmap_entry *mmap_entry;
>> +	struct dma_buf_phys_vec phys_vec;
>
> Oh, are we going to have weird merge conflicts with this, Leon?
>
>> +static int uverbs_dmabuf_attach(struct dma_buf *dmabuf,
>> +				struct dma_buf_attachment *attachment)
>> +{
>> +	struct ib_uverbs_dmabuf_file *priv = dmabuf->priv;
>> +
>> +	if (!attachment->peer2peer)
>> +		return -EOPNOTSUPP;
>> +
>> +	if (priv->revoked)
>> +		return -ENODEV;
>
> This should only be checked in map

Please see Leon's answer on that.

Yishai
* Re: [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations
  2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
  2026-01-20 18:15   ` Jason Gunthorpe
@ 2026-01-25 14:31   ` Leon Romanovsky
  1 sibling, 0 replies; 11+ messages in thread
From: Leon Romanovsky @ 2026-01-25 14:31 UTC (permalink / raw)
  To: Edward Srouji
  Cc: Jason Gunthorpe, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas

On Thu, Jan 08, 2026 at 01:11:14PM +0200, Edward Srouji wrote:
> From: Yishai Hadas <yishaih@nvidia.com>
>
> Expose DMABUF functionality to userspace through the uverbs interface,
> enabling InfiniBand/RDMA devices to export PCI-based memory regions
> (e.g. device memory) as DMABUF file descriptors. This allows
> zero-copy sharing of RDMA memory with other subsystems that support the
> dma-buf framework.
>
> A new UVERBS_OBJECT_DMABUF object type and allocation method are
> introduced.
>
> During allocation, uverbs invokes the driver to supply the
> rdma_user_mmap_entry associated with the given page offset (pgoff).
>
> Based on the returned rdma_user_mmap_entry, uverbs requests the driver
> to provide the corresponding physical-memory details as well as the
> driver's PCI provider information.
>
> Using this information, dma_buf_export() is called; if it succeeds,
> uobj->object is set to the underlying file pointer returned by the
> dma-buf framework.
>
> The file descriptor number follows the standard uverbs allocation flow,
> but the file pointer comes from the dma-buf subsystem, including its own
> fops and private data.
>
> Because of this, alloc_begin_fd_uobject() must handle cases where
> fd_type->fops is NULL, and both alloc_commit_fd_uobject() and
> alloc_abort_fd_uobject() must account for whether filp->private_data
> exists, since it is only populated after a successful dma_buf_export().
>
> When an mmap entry is removed, uverbs iterates over its associated
> DMABUFs, marks them as revoked, and calls dma_buf_move_notify() so that
> their importers are notified.
>
> The same procedure applies during the disassociate flow; final cleanup
> occurs when the application closes the file.
>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Edward Srouji <edwards@nvidia.com>
> ---
>  drivers/infiniband/core/Makefile                  |   1 +
>  drivers/infiniband/core/device.c                  |   2 +
>  drivers/infiniband/core/ib_core_uverbs.c          |  19 +++
>  drivers/infiniband/core/rdma_core.c               |  63 ++++----
>  drivers/infiniband/core/rdma_core.h               |   1 +
>  drivers/infiniband/core/uverbs.h                  |  10 ++
>  drivers/infiniband/core/uverbs_std_types_dmabuf.c | 172 ++++++++++++++++++++++
>  drivers/infiniband/core/uverbs_uapi.c             |   1 +
>  include/rdma/ib_verbs.h                           |   9 ++
>  include/rdma/uverbs_types.h                       |   1 +
>  include/uapi/rdma/ib_user_ioctl_cmds.h            |  10 ++
>  11 files changed, 263 insertions(+), 26 deletions(-)

<...>

> +static struct sg_table *
> +uverbs_dmabuf_map(struct dma_buf_attachment *attachment,
> +		  enum dma_data_direction dir)
> +{
> +	struct ib_uverbs_dmabuf_file *priv = attachment->dmabuf->priv;
> +
> +	dma_resv_assert_held(priv->dmabuf->resv);
> +
> +	if (priv->revoked)
> +		return ERR_PTR(-ENODEV);
> +
> +	return dma_buf_phys_vec_to_sgt(attachment, priv->provider,
> +				       &priv->phys_vec, 1, priv->phys_vec.len,
> +				       dir);
> +}
> +
> +static void uverbs_dmabuf_unmap(struct dma_buf_attachment *attachment,
> +				struct sg_table *sgt,
> +				enum dma_data_direction dir)
> +{
> +	dma_buf_free_sgt(attachment, sgt, dir);
> +}

Unfortunately, this is not enough. Exporters should count their map<->unmap
calls and make sure that they are balanced. See this VFIO change:
https://lore.kernel.org/kvm/20260124-dmabuf-revoke-v5-4-f98fca917e96@nvidia.com/

Thanks
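Leon's request — that the exporter count its map/unmap calls so a revoke can tell whether mappings are still outstanding — can be sketched as a small standalone C model. The field and function names are illustrative; in the real code the state would live in `ib_uverbs_dmabuf_file`, and rather than failing, a revoke would wait for the count to reach zero as in the referenced VFIO change:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative exporter state: outstanding-map counter plus revoke flag. */
struct exporter {
	int map_count;
	bool revoked;
};

static int exporter_map(struct exporter *e)
{
	if (e->revoked)
		return -ENODEV;
	e->map_count++;
	return 0;
}

static void exporter_unmap(struct exporter *e)
{
	assert(e->map_count > 0); /* every unmap must pair with a prior map */
	e->map_count--;
}

/* Returns true only when no mappings remain; the real implementation
 * would block here until map_count drains instead of reporting. */
static bool exporter_try_revoke(struct exporter *e)
{
	e->revoked = true;
	return e->map_count == 0;
}
```

The counter is what lets `rdma_user_mmap_entry_remove()` know when it is safe to tear down the backing memory after `dma_buf_move_notify()`.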
* [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops
  2026-01-08 11:11 [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors Edward Srouji
  2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
@ 2026-01-08 11:11 ` Edward Srouji
  2026-01-20 18:18   ` Jason Gunthorpe
  1 sibling, 1 reply; 11+ messages in thread
From: Edward Srouji @ 2026-01-08 11:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky, Sumit Semwal, Christian König
  Cc: linux-kernel, linux-rdma, linux-media, dri-devel, linaro-mm-sig,
	Yishai Hadas, Edward Srouji

From: Yishai Hadas <yishaih@nvidia.com>

Enable p2pdma on the mlx5 PCI device to allow DMABUF-based peer-to-peer
DMA mappings.

Add implementations of the mmap_get_pfns and pgoff_to_mmap_entry device
operations required for DMABUF support in the mlx5 RDMA driver.

The pgoff_to_mmap_entry operation converts a page offset to the
corresponding rdma_user_mmap_entry by extracting the command and index
from the offset and looking it up in the ucontext's mmap_xa.

The mmap_get_pfns operation retrieves the physical address and length
from the mmap entry and obtains the p2pdma provider for the underlying
PCI device, which is needed for peer-to-peer DMA operations with
DMABUFs.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c | 72 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e81080622283..f97c86c96d83 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2446,6 +2446,70 @@ static int mlx5_ib_mmap_clock_info_page(struct mlx5_ib_dev *dev,
 				      virt_to_page(dev->mdev->clock_info));
 }
 
+static int phys_addr_to_bar(struct pci_dev *pdev, phys_addr_t pa)
+{
+	resource_size_t start, end;
+	int bar;
+
+	for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
+		/* Skip BARs not present or not memory-mapped */
+		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
+			continue;
+
+		start = pci_resource_start(pdev, bar);
+		end = pci_resource_end(pdev, bar);
+
+		if (!start || !end)
+			continue;
+
+		if (pa >= start && pa <= end)
+			return bar;
+	}
+
+	return -1;
+}
+
+static int mlx5_ib_mmap_get_pfns(struct rdma_user_mmap_entry *entry,
+				 struct dma_buf_phys_vec *phys_vec,
+				 struct p2pdma_provider **provider)
+{
+	struct mlx5_user_mmap_entry *mentry = to_mmmap(entry);
+	struct pci_dev *pdev = to_mdev(entry->ucontext->device)->mdev->pdev;
+	int bar;
+
+	phys_vec->paddr = mentry->address;
+	phys_vec->len = entry->npages * PAGE_SIZE;
+
+	bar = phys_addr_to_bar(pdev, phys_vec->paddr);
+	if (bar < 0)
+		return -EINVAL;
+
+	*provider = pcim_p2pdma_provider(pdev, bar);
+	/* If the kernel was not compiled with CONFIG_PCI_P2PDMA the
+	 * functionality is not supported.
+	 */
+	if (!*provider)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+static struct rdma_user_mmap_entry *
+mlx5_ib_pgoff_to_mmap_entry(struct ib_ucontext *ucontext, off_t pg_off)
+{
+	unsigned long entry_pgoff;
+	unsigned long idx;
+	u8 command;
+
+	pg_off = pg_off >> PAGE_SHIFT;
+	command = get_command(pg_off);
+	idx = get_extended_index(pg_off);
+
+	entry_pgoff = command << 16 | idx;
+
+	return rdma_user_mmap_entry_get_pgoff(ucontext, entry_pgoff);
+}
+
 static void mlx5_ib_mmap_free(struct rdma_user_mmap_entry *entry)
 {
 	struct mlx5_user_mmap_entry *mentry = to_mmmap(entry);
@@ -4360,7 +4424,13 @@ static int mlx5_ib_stage_init_init(struct mlx5_ib_dev *dev)
 	if (err)
 		goto err_mp;
 
+	err = pcim_p2pdma_init(mdev->pdev);
+	if (err && err != -EOPNOTSUPP)
+		goto err_dd;
+
 	return 0;
+err_dd:
+	mlx5_ib_data_direct_cleanup(dev);
 err_mp:
 	mlx5_ib_cleanup_multiport_master(dev);
 err:
@@ -4412,11 +4482,13 @@ static const struct ib_device_ops mlx5_ib_dev_ops = {
 	.map_mr_sg_pi = mlx5_ib_map_mr_sg_pi,
 	.mmap = mlx5_ib_mmap,
 	.mmap_free = mlx5_ib_mmap_free,
+	.mmap_get_pfns = mlx5_ib_mmap_get_pfns,
 	.modify_cq = mlx5_ib_modify_cq,
 	.modify_device = mlx5_ib_modify_device,
 	.modify_port = mlx5_ib_modify_port,
 	.modify_qp = mlx5_ib_modify_qp,
 	.modify_srq = mlx5_ib_modify_srq,
+	.pgoff_to_mmap_entry = mlx5_ib_pgoff_to_mmap_entry,
 	.pre_destroy_cq = mlx5_ib_pre_destroy_cq,
 	.poll_cq = mlx5_ib_poll_cq,
 	.post_destroy_cq = mlx5_ib_post_destroy_cq,
-- 
2.49.0
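The pgoff round-trip performed by mlx5_ib_pgoff_to_mmap_entry() can be illustrated with a standalone model. The bit layout below is an assumption chosen only for demonstration — the real `get_command()`/`get_extended_index()` masks are mlx5-internal — but the shape of the transformation matches the patch: reduce the byte offset to pages, split out a command and an index, and rebuild the mmap_xa key as `command << 16 | idx`:

```c
#include <assert.h>

/* Assumed constants for the model; MODEL_CMD_SHIFT stands in for
 * whatever split get_command()/get_extended_index() actually use. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_CMD_SHIFT  16

static unsigned long model_entry_key(unsigned long byte_off)
{
	/* mmap offsets arrive in bytes; reduce to a page offset first */
	unsigned long pg_off = byte_off >> MODEL_PAGE_SHIFT;
	/* high bits select the command, low bits the entry index */
	unsigned char command = pg_off >> MODEL_CMD_SHIFT;
	unsigned long idx = pg_off & ((1UL << MODEL_CMD_SHIFT) - 1);

	/* rebuild the key used to look up the entry in mmap_xa */
	return (unsigned long)command << 16 | idx;
}
```

The returned key is what `rdma_user_mmap_entry_get_pgoff()` uses to find the entry previously inserted by the driver's mmap path.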
* Re: [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops
  2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
@ 2026-01-20 18:18   ` Jason Gunthorpe
  2026-01-21 10:35     ` Yishai Hadas
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2026-01-20 18:18 UTC (permalink / raw)
  To: Edward Srouji
  Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel, linaro-mm-sig, Yishai Hadas

On Thu, Jan 08, 2026 at 01:11:15PM +0200, Edward Srouji wrote:
> +static int phys_addr_to_bar(struct pci_dev *pdev, phys_addr_t pa)
> +{
> +	resource_size_t start, end;
> +	int bar;
> +
> +	for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
> +		/* Skip BARs not present or not memory-mapped */
> +		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
> +			continue;
> +
> +		start = pci_resource_start(pdev, bar);
> +		end = pci_resource_end(pdev, bar);
> +
> +		if (!start || !end)
> +			continue;
> +
> +		if (pa >= start && pa <= end)
> +			return bar;
> +	}

Don't we know which of the two BARs the mmap entry came from based on
its type? This seems like overkill.

Jason
* Re: [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops
  2026-01-20 18:18   ` Jason Gunthorpe
@ 2026-01-21 10:35     ` Yishai Hadas
  0 siblings, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-01-21 10:35 UTC (permalink / raw)
  To: Jason Gunthorpe, Edward Srouji
  Cc: Leon Romanovsky, Sumit Semwal, Christian König, linux-kernel,
	linux-rdma, linux-media, dri-devel

On 20/01/2026 20:18, Jason Gunthorpe wrote:
> On Thu, Jan 08, 2026 at 01:11:15PM +0200, Edward Srouji wrote:
>> +static int phys_addr_to_bar(struct pci_dev *pdev, phys_addr_t pa)
>> +{
>> +	resource_size_t start, end;
>> +	int bar;
>> +
>> +	for (bar = 0; bar < PCI_STD_NUM_BARS; bar++) {
>> +		/* Skip BARs not present or not memory-mapped */
>> +		if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM))
>> +			continue;
>> +
>> +		start = pci_resource_start(pdev, bar);
>> +		end = pci_resource_end(pdev, bar);
>> +
>> +		if (!start || !end)
>> +			continue;
>> +
>> +		if (pa >= start && pa <= end)
>> +			return bar;
>> +	}
>
> Don't we know which of the two BARs the mmap entry came from based on
> its type? This seems like overkill.

Actually, no. Currently, a given type can reside on different BARs
depending on the function type (i.e. PF/SF).

As we don't have any cap/knowledge for the above mapping, we would prefer
the above code, which finds the correct BAR (for now, 0 or 2) dynamically.

Yishai
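The dynamic BAR lookup being defended here is easy to model in standalone C against a mock resource table. The table below stands in for `pci_resource_start()`/`pci_resource_end()` (the `IORESOURCE_MEM` filter is omitted since the mock entries are all memory windows), and the addresses are made-up examples:

```c
#include <assert.h>

#define NUM_BARS 6 /* stands in for PCI_STD_NUM_BARS */

/* Mock BAR window; start == end == 0 means the BAR is absent. */
struct bar_window {
	unsigned long long start;
	unsigned long long end; /* inclusive, matching pci_resource_end() */
};

/* Same walk as phys_addr_to_bar(): return the first BAR whose window
 * contains the physical address, or -1 if none does. */
static int model_phys_addr_to_bar(const struct bar_window bars[NUM_BARS],
				  unsigned long long pa)
{
	int bar;

	for (bar = 0; bar < NUM_BARS; bar++) {
		if (!bars[bar].start || !bars[bar].end)
			continue;
		if (pa >= bars[bar].start && pa <= bars[bar].end)
			return bar;
	}
	return -1;
}
```

This is why the lookup works regardless of whether a given mmap-entry type lands on BAR 0 or BAR 2 for a particular function type: the address itself identifies the window.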
end of thread, other threads: [~2026-01-25 14:31 UTC | newest]

Thread overview: 11+ messages
2026-01-08 11:11 [PATCH rdma-next 0/2] RDMA: Add support for exporting dma-buf file descriptors Edward Srouji
2026-01-08 11:11 ` [PATCH rdma-next 1/2] RDMA/uverbs: Add DMABUF object type and operations Edward Srouji
2026-01-20 18:15   ` Jason Gunthorpe
2026-01-21  8:32     ` Leon Romanovsky
2026-01-21 13:56       ` Jason Gunthorpe
2026-01-21 16:27         ` Yishai Hadas
2026-01-21 10:07     ` Yishai Hadas
2026-01-25 14:31   ` Leon Romanovsky
2026-01-08 11:11 ` [PATCH rdma-next 2/2] RDMA/mlx5: Implement DMABUF export ops Edward Srouji
2026-01-20 18:18   ` Jason Gunthorpe
2026-01-21 10:35     ` Yishai Hadas