public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
* [PATCH 0/2] support dmabuf
@ 2026-01-27 17:44 Cliff Burdick
  2026-01-27 17:44 ` [PATCH 1/2] eal: " Cliff Burdick
                   ` (3 more replies)
  0 siblings, 4 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-01-27 17:44 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov

Add support for the kernel dmabuf feature and integrate it into the mlx5
driver. This feature is needed to support GPUDirect on newer kernels.

Cliff Burdick (2):
  eal: support dmabuf
  common/mlx5: support dmabuf

 .mailmap                                      |   1 +
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  28 ++-
 drivers/common/mlx5/mlx5_common_mr.c          | 108 ++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 lib/eal/common/eal_common_memory.c            | 168 ++++++++++++++++++
 lib/eal/common/eal_memalloc.h                 |  21 +++
 lib/eal/common/malloc_heap.c                  |  27 +++
 lib/eal/common/malloc_heap.h                  |   5 +
 lib/eal/include/rte_memory.h                  | 125 +++++++++++++
 16 files changed, 576 insertions(+), 8 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] eal: support dmabuf
  2026-01-27 17:44 [PATCH 0/2] support dmabuf Cliff Burdick
@ 2026-01-27 17:44 ` Cliff Burdick
  2026-01-29  1:48   ` Stephen Hemminger
  2026-01-29  1:51   ` Stephen Hemminger
  2026-01-27 17:44 ` [PATCH 2/2] common/mlx5: " Cliff Burdick
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-01-27 17:44 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov, Thomas Monjalon

dmabuf is a modern Linux kernel feature that allows DMA buffer sharing
between two drivers. Common examples of usage are streaming video devices
and NIC-to-GPU transfers. Prior to dmabuf, users had to load proprietary
drivers to expose the DMA mappings. With dmabuf, the proprietary drivers
are no longer required.

A new API function, rte_extmem_register_dmabuf(), is introduced to create
the mapping from a dmabuf file descriptor. dmabuf uses a file descriptor
and an offset that have been pre-opened with the kernel; the kernel uses
the file descriptor to map the buffer to a VA pointer. To avoid ABI
changes, a static struct is used inside eal_common_memory.c, and lookups
are done on this struct rather than on the rte_memseg_list.

Ideally we would like to add both the dmabuf file descriptor and offset
to rte_memseg_list, but it's not clear if we can reuse existing fields
when using the dmabuf API.

Alternatively, we could rename the "external" flag to a more generic
"properties" field, with "external" as the lowest bit and the second bit
indicating the presence of dmabuf. When the dmabuf bit is set, we could
reuse the base_va address field for the dmabuf offset and the socket_id
for the file descriptor.

Which option is preferred?

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 .mailmap                           |   1 +
 lib/eal/common/eal_common_memory.c | 168 +++++++++++++++++++++++++++++
 lib/eal/common/eal_memalloc.h      |  21 ++++
 lib/eal/common/malloc_heap.c       |  27 +++++
 lib/eal/common/malloc_heap.h       |   5 +
 lib/eal/include/rte_memory.h       | 125 +++++++++++++++++++++
 6 files changed, 347 insertions(+)

diff --git a/.mailmap b/.mailmap
index 2f089326ff..4c2b2f921d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -291,6 +291,7 @@ Cian Ferriter <cian.ferriter@intel.com>
 Ciara Loftus <ciara.loftus@intel.com>
 Ciara Power <ciara.power@intel.com>
 Claire Murphy <claire.k.murphy@intel.com>
+Cliff Burdick <cburdick@nvidia.com>
 Clemens Famulla-Conrad <cfamullaconrad@suse.com>
 Cody Doucette <doucette@bu.edu>
 Congwen Zhang <zhang.congwen@zte.com.cn>
diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index c62edf5e55..304ed18396 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -45,6 +45,18 @@
 static void *next_baseaddr;
 static uint64_t system_page_sz;
 
+/* Internal storage for dmabuf info, indexed by memseg list index.
+ * This keeps dmabuf metadata out of the public rte_memseg_list structure
+ * to preserve ABI compatibility.
+ */
+static struct {
+	int fd;          /**< dmabuf fd, -1 if not dmabuf backed */
+	uint64_t offset; /**< offset within dmabuf */
+} dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
+	[0 ... RTE_MAX_MEMSEG_LISTS - 1] = { .fd = -1, .offset = 0 }
+};
+
+
 #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5
 void *
 eal_get_virtual_area(void *requested_addr, size_t *size,
@@ -930,6 +942,109 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
+/* Internal dmabuf info functions */
+int
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	dmabuf_info[list_idx].fd = fd;
+	dmabuf_info[list_idx].offset = offset;
+	return 0;
+}
+
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	return dmabuf_info[list_idx].fd;
+}
+
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS || offset == NULL)
+		return -EINVAL;
+
+	*offset = dmabuf_info[list_idx].offset;
+	return 0;
+}
+
+/* Public dmabuf info API functions */
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_fd_thread_unsafe)
+int
+rte_memseg_list_get_dmabuf_fd_thread_unsafe(const struct rte_memseg_list *msl)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return dmabuf_info[msl_idx].fd;
+}
+
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_fd)
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_offset_thread_unsafe)
+int
+rte_memseg_list_get_dmabuf_offset_thread_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL || offset == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	*offset = dmabuf_info[msl_idx].offset;
+	return 0;
+}
+
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_offset)
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_offset_thread_unsafe(msl, offset);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
 RTE_EXPORT_SYMBOL(rte_extmem_register)
 int
 rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
@@ -980,6 +1095,59 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 	return ret;
 }
 
+RTE_EXPORT_SYMBOL(rte_extmem_register_dmabuf)
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	unsigned int socket_id, n;
+	int ret = 0;
+
+	if (va_addr == NULL || page_sz == 0 || len == 0 ||
+			!rte_is_power_of_2(page_sz) ||
+			RTE_ALIGN(len, page_sz) != len ||
+			((len / page_sz) != n_pages && iova_addrs != NULL) ||
+			!rte_is_aligned(va_addr, page_sz) ||
+			dmabuf_fd < 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	rte_mcfg_mem_write_lock();
+
+	/* make sure the segment doesn't already exist */
+	if (malloc_heap_find_external_seg(va_addr, len) != NULL) {
+		rte_errno = EEXIST;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* get next available socket ID */
+	socket_id = mcfg->next_socket_id;
+	if (socket_id > INT32_MAX) {
+		EAL_LOG(ERR, "Cannot assign new socket ID's");
+		rte_errno = ENOSPC;
+		ret = -1;
+		goto unlock;
+	}
+
+	/* we can create a new memseg with dma-buf info */
+	n = len / page_sz;
+	if (malloc_heap_create_external_seg_dmabuf(va_addr, iova_addrs, n,
+			page_sz, "extmem_dmabuf", socket_id,
+			dmabuf_fd, dmabuf_offset) == NULL) {
+		ret = -1;
+		goto unlock;
+	}
+
+	/* memseg list successfully created - increment next socket ID */
+	mcfg->next_socket_id++;
+unlock:
+	rte_mcfg_mem_write_unlock();
+	return ret;
+}
+
 RTE_EXPORT_SYMBOL(rte_extmem_unregister)
 int
 rte_extmem_unregister(void *va_addr, size_t len)
diff --git a/lib/eal/common/eal_memalloc.h b/lib/eal/common/eal_memalloc.h
index 0c267066d9..bb2cfa0717 100644
--- a/lib/eal/common/eal_memalloc.h
+++ b/lib/eal/common/eal_memalloc.h
@@ -90,6 +90,27 @@ eal_memalloc_set_seg_list_fd(int list_idx, int fd);
 int
 eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
 
+/*
+ * Set dmabuf info for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset);
+
+/*
+ * Get dmabuf fd for a memseg list.
+ * Returns fd (>= 0) on success, -1 if not dmabuf backed, -errno on error.
+ */
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx);
+
+/*
+ * Get dmabuf offset for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset);
+
 int
 eal_memalloc_init(void)
 	__rte_requires_shared_capability(rte_mcfg_mem_get_lock());
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 39240c261c..fd0376d13b 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -1232,6 +1232,33 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	msl->version = 0;
 	msl->external = 1;
 
+	/* initialize dmabuf info to "not dmabuf backed" */
+	eal_memseg_list_set_dmabuf_info(i, -1, 0);
+
+	return msl;
+}
+
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int msl_idx;
+
+	/* Create the base external segment */
+	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
+			page_sz, seg_name, socket_id);
+	if (msl == NULL)
+		return NULL;
+
+	/* Get memseg list index */
+	msl_idx = msl - mcfg->memsegs;
+
+	/* Set dma-buf info in the internal side-table */
+	eal_memseg_list_set_dmabuf_info(msl_idx, dmabuf_fd, dmabuf_offset);
+
 	return msl;
 }
 
diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
index dfc56d4ae3..87525d1a68 100644
--- a/lib/eal/common/malloc_heap.h
+++ b/lib/eal/common/malloc_heap.h
@@ -51,6 +51,11 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
 		unsigned int socket_id);
 
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset);
+
 struct rte_memseg_list *
 malloc_heap_find_external_seg(void *va_addr, size_t len);
 
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index b6e97ad695..d1c2fc8aa5 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -405,6 +405,82 @@ int
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl);
+
+/**
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+int
+rte_memseg_list_get_dmabuf_fd_thread_unsafe(const struct rte_memseg_list *msl);
+
+/**
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
+/**
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+int
+rte_memseg_list_get_dmabuf_offset_thread_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
 /**
  * Register external memory chunk with DPDK.
  *
@@ -443,6 +519,55 @@ int
 rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz);
 
+/**
+ * Register external memory chunk backed by a dma-buf with DPDK.
+ *
+ * This is similar to rte_extmem_register() but additionally stores dma-buf
+ * file descriptor information, allowing drivers to use dma-buf based
+ * memory registration (e.g., ibv_reg_dmabuf_mr for RDMA devices).
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves via rte_dev_dma_map().
+ *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling ``rte_extmem_attach`` in
+ *   each other process.
+ *
+ * @param va_addr
+ *   Start of virtual area to register (mmap'd address of the dma-buf).
+ *   Must be aligned by ``page_sz``.
+ * @param len
+ *   Length of virtual area to register. Must be aligned by ``page_sz``.
+ *   This is independent of dmabuf_offset.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf where the registered region starts.
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Unregister external memory chunk with DPDK.
  *
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] common/mlx5: support dmabuf
  2026-01-27 17:44 [PATCH 0/2] support dmabuf Cliff Burdick
  2026-01-27 17:44 ` [PATCH 1/2] eal: " Cliff Burdick
@ 2026-01-27 17:44 ` Cliff Burdick
  2026-01-27 19:21   ` [REVIEW] " Stephen Hemminger
  2026-01-29  1:51   ` [PATCH 2/2] " Stephen Hemminger
  2026-01-28  0:04 ` [PATCH 0/2] " Stephen Hemminger
  2026-02-03 22:26 ` [PATCH v2 " Cliff Burdick
  3 siblings, 2 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-01-27 17:44 UTC (permalink / raw)
  To: dev
  Cc: anatoly.burakov, Dariusz Sosnowski, Viacheslav Ovsiienko,
	Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad

dmabuf is a modern Linux kernel feature that allows DMA buffer sharing
between two drivers. Common examples of usage are streaming video devices
and NIC-to-GPU transfers. Prior to dmabuf, users had to load proprietary
drivers to expose the DMA mappings. With dmabuf, the proprietary drivers
are no longer required.

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 +++++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 +++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  28 ++++-
 drivers/common/mlx5/mlx5_common_mr.c          | 108 +++++++++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 ++-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 10 files changed, 229 insertions(+), 8 deletions(-)

diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 3767e7a69b..8e83104165 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -203,6 +203,8 @@ has_sym_args = [
             'mlx5dv_dr_domain_allow_duplicate_rules' ],
         [ 'HAVE_MLX5_IBV_REG_MR_IOVA', 'infiniband/verbs.h',
             'ibv_reg_mr_iova' ],
+        [ 'HAVE_IBV_REG_DMABUF_MR', 'infiniband/verbs.h',
+            'ibv_reg_dmabuf_mr' ],
         [ 'HAVE_MLX5_IBV_IMPORT_CTX_PD_AND_MR', 'infiniband/verbs.h',
             'ibv_import_device' ],
         [ 'HAVE_MLX5DV_DR_ACTION_CREATE_DEST_ROOT_TABLE', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index 98260df470..f6d18fd5df 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -129,6 +129,47 @@ mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 	return 0;
 }
 
+/**
+ * Register mr for dma-buf backed memory. Given protection domain pointer,
+ * dma-buf fd, offset and length, register the memory region.
+ *
+ * @param[in] pd
+ *   Pointer to protection domain context.
+ * @param[in] offset
+ *   Offset within the dma-buf.
+ * @param[in] length
+ *   Length of the memory to register.
+ * @param[in] fd
+ *   File descriptor of the dma-buf.
+ * @param[out] pmd_mr
+ *   pmd_mr struct set with lkey, address, length and pointer to mr object
+ *
+ * @return
+ *   0 on successful registration, -1 otherwise
+ */
+RTE_EXPORT_INTERNAL_SYMBOL(mlx5_common_verbs_reg_dmabuf_mr)
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr)
+{
+	struct ibv_mr *ibv_mr;
+	ibv_mr = mlx5_glue->reg_dmabuf_mr(pd, offset, length, iova, fd,
+					  IBV_ACCESS_LOCAL_WRITE |
+					  (haswell_broadwell_cpu ? 0 :
+					  IBV_ACCESS_RELAXED_ORDERING));
+	if (!ibv_mr)
+		return -1;
+
+	*pmd_mr = (struct mlx5_pmd_mr){
+		.lkey = ibv_mr->lkey,
+		.addr = ibv_mr->addr,
+		.len = ibv_mr->length,
+		.obj = (void *)ibv_mr,
+	};
+	return 0;
+}
+
 /**
  * Deregister mr. Given the mlx5 pmd MR - deregister the MR
  *
@@ -151,13 +192,18 @@ mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_common_verbs_reg_mr;
+	*reg_dmabuf_mr_cb = mlx5_common_verbs_reg_dmabuf_mr;
 	*dereg_mr_cb = mlx5_common_verbs_dereg_mr;
 }
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index a91eaa429d..6fac7f2bcd 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -291,6 +291,24 @@ mlx5_glue_reg_mr_iova(struct ibv_pd *pd, void *addr, size_t length,
 #endif
 }
 
+static struct ibv_mr *
+mlx5_glue_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length,
+			uint64_t iova, int fd, int access)
+{
+#ifdef HAVE_IBV_REG_DMABUF_MR
+	return ibv_reg_dmabuf_mr(pd, offset, length, iova, fd, access);
+#else
+	(void)pd;
+	(void)offset;
+	(void)length;
+	(void)iova;
+	(void)fd;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static struct ibv_mr *
 mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
 {
@@ -1619,6 +1637,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue) {
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
 	.reg_mr_iova = mlx5_glue_reg_mr_iova,
+	.reg_dmabuf_mr = mlx5_glue_reg_dmabuf_mr,
 	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
diff --git a/drivers/common/mlx5/linux/mlx5_glue.h b/drivers/common/mlx5/linux/mlx5_glue.h
index 81d6b0aaf9..66216d1194 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -219,6 +219,9 @@ struct mlx5_glue {
 	struct ibv_mr *(*reg_mr_iova)(struct ibv_pd *pd, void *addr,
 				      size_t length, uint64_t iova,
 				      int access);
+	struct ibv_mr *(*reg_dmabuf_mr)(struct ibv_pd *pd, uint64_t offset,
+					size_t length, uint64_t iova,
+					int fd, int access);
 	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 84a93e7dbd..0ec59b0122 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -13,6 +13,7 @@
 #include <rte_class.h>
 #include <rte_malloc.h>
 #include <rte_eal_paging.h>
+#include <rte_memory.h>
 
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
@@ -1125,6 +1126,7 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 	struct mlx5_common_device *dev;
 	struct mlx5_mr_btree *bt;
 	struct mlx5_mr *mr;
+	struct rte_memseg_list *msl;
 
 	dev = to_mlx5_device(rte_dev);
 	if (!dev) {
@@ -1134,8 +1136,30 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 		rte_errno = ENODEV;
 		return -1;
 	}
-	mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
-				SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	/* Check if this is dma-buf backed external memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl != NULL && msl->external) {
+		int dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
+		if (dmabuf_fd >= 0) {
+			uint64_t dmabuf_off;
+			/* Get base offset from memseg list */
+			rte_memseg_list_get_dmabuf_offset_thread_unsafe(msl, &dmabuf_off);
+			/* Calculate offset within dmabuf for this specific address */
+			dmabuf_off += ((uintptr_t)addr - (uintptr_t)msl->base_va);
+			/* Use dma-buf MR registration */
+			mr = mlx5_create_mr_ext_dmabuf(dev->pd, (uintptr_t)addr, len,
+						SOCKET_ID_ANY, dmabuf_fd, dmabuf_off,
+						dev->mr_scache.reg_dmabuf_mr_cb);
+		} else {
+			/* Use regular MR registration */
+			mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
+						SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+		}
+	} else {
+		/* Use regular MR registration */
+		mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
+					SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	}
 	if (!mr) {
 		DRV_LOG(WARNING, "Device %s unable to DMA map", rte_dev->name);
 		rte_errno = EINVAL;
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 8ed988dec9..18b8a6eaa5 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -8,6 +8,7 @@
 #include <rte_eal_memconfig.h>
 #include <rte_eal_paging.h>
 #include <rte_errno.h>
+#include <rte_memory.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_rwlock.h>
@@ -1141,6 +1142,7 @@ mlx5_mr_create_cache(struct mlx5_mr_share_cache *share_cache, int socket)
 {
 	/* Set the reg_mr and dereg_mr callback functions */
 	mlx5_os_set_reg_mr_cb(&share_cache->reg_mr_cb,
+			      &share_cache->reg_dmabuf_mr_cb,
 			      &share_cache->dereg_mr_cb);
 	rte_rwlock_init(&share_cache->rwlock);
 	rte_rwlock_init(&share_cache->mprwlock);
@@ -1221,6 +1223,74 @@ mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 	return mr;
 }
 
+/**
+ * Creates a memory region for dma-buf backed external memory.
+ *
+ * @param pd
+ *   Pointer to pd of a device (net, regex, vdpa,...).
+ * @param addr
+ *   Starting virtual address of memory (mmap'd address).
+ * @param len
+ *   Length of memory segment being mapped.
+ * @param socket_id
+ *   Socket to allocate heap memory for the control structures.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf.
+ * @param reg_dmabuf_mr_cb
+ *   Callback function for dma-buf MR registration.
+ *
+ * @return
+ *   Pointer to MR structure on success, NULL otherwise.
+ */
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb)
+{
+	struct mlx5_mr *mr = NULL;
+
+	if (reg_dmabuf_mr_cb == NULL) {
+		DRV_LOG(WARNING, "dma-buf MR registration not supported");
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+	mr = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO,
+			 RTE_ALIGN_CEIL(sizeof(*mr), RTE_CACHE_LINE_SIZE),
+			 RTE_CACHE_LINE_SIZE, socket_id);
+	if (mr == NULL)
+		return NULL;
+	if (reg_dmabuf_mr_cb(pd, dmabuf_offset, len, addr, dmabuf_fd,
+			     &mr->pmd_mr) < 0) {
+		DRV_LOG(WARNING,
+			"Fail to create dma-buf MR for address (%p) fd=%d",
+			(void *)addr, dmabuf_fd);
+		mlx5_free(mr);
+		return NULL;
+	}
+	mr->msl = NULL; /* Mark it is external memory. */
+	mr->ms_bmp = NULL;
+	mr->ms_n = 1;
+	mr->ms_bmp_n = 1;
+	/*
+	 * For dma-buf MR, the returned addr may be NULL since there's no VA
+	 * in the registration. Store the user-provided addr for cache lookup.
+	 */
+	if (mr->pmd_mr.addr == NULL)
+		mr->pmd_mr.addr = (void *)addr;
+	if (mr->pmd_mr.len == 0)
+		mr->pmd_mr.len = len;
+	DRV_LOG(DEBUG,
+		"MR CREATED (%p) for dma-buf external memory %p (fd=%d):\n"
+		"  [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+		" lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+		(void *)mr, (void *)addr, dmabuf_fd,
+		addr, addr + len, rte_cpu_to_be_32(mr->pmd_mr.lkey),
+		mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+	return mr;
+}
+
 /**
  * Callback for memory free event. Iterate freed memsegs and check whether it
  * belongs to an existing MR. If found, clear the bit from bitmap of MR. As a
@@ -1747,9 +1817,43 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 		struct mlx5_mempool_mr *mr = &new_mpr->mrs[i];
 		const struct mlx5_range *range = &ranges[i];
 		size_t len = range->end - range->start;
+		struct rte_memseg_list *msl;
+		int reg_result;
+
+		/* Check if this is dma-buf backed external memory */
+		msl = rte_mem_virt2memseg_list((void *)range->start);
+		if (msl != NULL && msl->external &&
+		    share_cache->reg_dmabuf_mr_cb != NULL) {
+			int dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
+			if (dmabuf_fd >= 0) {
+				uint64_t dmabuf_off;
+				/* Get base offset from memseg list */
+				rte_memseg_list_get_dmabuf_offset_thread_unsafe(msl, &dmabuf_off);
+				/* Calculate offset within dmabuf for this specific range */
+				dmabuf_off += (range->start - (uintptr_t)msl->base_va);
+				/* Use dma-buf MR registration */
+				reg_result = share_cache->reg_dmabuf_mr_cb(pd,
+					dmabuf_off, len, range->start, dmabuf_fd,
+					&mr->pmd_mr);
+				if (reg_result == 0) {
+					/* For dma-buf MR, set addr if not set by driver */
+					if (mr->pmd_mr.addr == NULL)
+						mr->pmd_mr.addr = (void *)range->start;
+					if (mr->pmd_mr.len == 0)
+						mr->pmd_mr.len = len;
+				}
+			} else {
+				/* Use regular MR registration */
+				reg_result = share_cache->reg_mr_cb(pd,
+					(void *)range->start, len, &mr->pmd_mr);
+			}
+		} else {
+			/* Use regular MR registration */
+			reg_result = share_cache->reg_mr_cb(pd,
+				(void *)range->start, len, &mr->pmd_mr);
+		}
 
-		if (share_cache->reg_mr_cb(pd, (void *)range->start, len,
-		    &mr->pmd_mr) < 0) {
+		if (reg_result < 0) {
 			DRV_LOG(ERR,
 				"Failed to create an MR in PD %p for address range "
 				"[0x%" PRIxPTR ", 0x%" PRIxPTR "] (%zu bytes) for mempool %s",
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index cf7c685e9b..3b967b1323 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -35,6 +35,9 @@ struct mlx5_pmd_mr {
  */
 typedef int (*mlx5_reg_mr_t)(void *pd, void *addr, size_t length,
 			     struct mlx5_pmd_mr *pmd_mr);
+typedef int (*mlx5_reg_dmabuf_mr_t)(void *pd, uint64_t offset, size_t length,
+				    uint64_t iova, int fd,
+				    struct mlx5_pmd_mr *pmd_mr);
 typedef void (*mlx5_dereg_mr_t)(struct mlx5_pmd_mr *pmd_mr);
 
 /* Memory Region object. */
@@ -87,6 +90,7 @@ struct __rte_packed_begin mlx5_mr_share_cache {
 	struct mlx5_mr_list mr_free_list; /* Freed MR list. */
 	struct mlx5_mempool_reg_list mempool_reg_list; /* Mempool database. */
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 } __rte_packed_end;
 
@@ -233,6 +237,10 @@ mlx5_mr_lookup_list(struct mlx5_mr_share_cache *share_cache,
 struct mlx5_mr *
 mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 		   mlx5_reg_mr_t reg_mr_cb);
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb);
 void mlx5_mr_free(struct mlx5_mr *mr, mlx5_dereg_mr_t dereg_mr_cb);
 __rte_internal
 uint32_t
@@ -251,12 +259,19 @@ int
 mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 			 struct mlx5_pmd_mr *pmd_mr);
 __rte_internal
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr);
+__rte_internal
 void
 mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 
 __rte_internal
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb);
 
 __rte_internal
 int
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.c b/drivers/common/mlx5/windows/mlx5_common_os.c
index 7fac361460..5e284742ab 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.c
+++ b/drivers/common/mlx5/windows/mlx5_common_os.c
@@ -17,6 +17,7 @@
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_malloc.h"
+#include "mlx5_common_mr.h"
 
 /**
  * Initialization routine for run-time dependency on external lib.
@@ -442,15 +443,20 @@ mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func (NULL on Windows - not supported)
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  *
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_os_reg_mr;
+	*reg_dmabuf_mr_cb = NULL; /* dma-buf not supported on Windows */
 	*dereg_mr_cb = mlx5_os_dereg_mr;
 }
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index f9f127e9e6..b2712c9a8d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -41,6 +41,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 89f32c7722..380689cfeb 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -1186,7 +1186,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
-	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
+	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb,  &priv->reg_dmabuf_mr_cb,
+			&priv->dereg_mr_cb);
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
 	if (mlx5_crypto_is_ipsec_opt(priv)) {
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [REVIEW] common/mlx5: support dmabuf
  2026-01-27 17:44 ` [PATCH 2/2] common/mlx5: " Cliff Burdick
@ 2026-01-27 19:21   ` Stephen Hemminger
  2026-01-28 14:30     ` David Marchand
  2026-02-03 17:34     ` Cliff Burdick
  2026-01-29  1:51   ` [PATCH 2/2] " Stephen Hemminger
  1 sibling, 2 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-01-27 19:21 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

AI-generated review of bundle-1701-dmabuf.mbox
Reviewed using Claude (claude-opus-4-5-20251101)

This is an automated review. Please verify all suggestions.

---

# DPDK Patch Review: dmabuf Support

## Summary
This patch series adds dmabuf (DMA buffer) support to DPDK EAL and the MLX5 driver, enabling DMA transfers between drivers without proprietary kernel modules.

---

## Patch 1/2: eal: support dmabuf

### Commit Message Issues

**Warning: Subject line format**
- Subject "eal: support dmabuf" is acceptable but could be more descriptive
- Consider: "eal: add dmabuf external memory registration support"

**Warning: Body contains questions to reviewers**
The commit message contains design questions that should be resolved before submission:
```
Which option is preferred?
```
Remove these questions and state the chosen design approach clearly.

**Info: Body line length**
Some lines in the body exceed 75 characters but are within acceptable range.

### Code Issues

**Error: Double blank line**
```c
} dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
	[0 ... RTE_MAX_MEMSEG_LISTS - 1] = { .fd = -1, .offset = 0 }
};


#define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5
```
Remove the extra blank line after the struct initialization.

**Warning: Inconsistent indentation in struct**
```c
static struct {
		int fd;          /**< dmabuf fd, -1 if not dmabuf backed */
		uint64_t offset; /**< offset within dmabuf */
	} dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
```
The struct members are double-indented with tabs. Should use single tab for consistency:
```c
static struct {
	int fd;          /**< dmabuf fd, -1 if not dmabuf backed */
	uint64_t offset; /**< offset within dmabuf */
} dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
```

**Error: New public APIs missing `__rte_experimental`**
All new public API functions in `rte_memory.h` must be marked as experimental:
- `rte_memseg_list_get_dmabuf_fd()`
- `rte_memseg_list_get_dmabuf_fd_thread_unsafe()`
- `rte_memseg_list_get_dmabuf_offset()`
- `rte_memseg_list_get_dmabuf_offset_thread_unsafe()`
- `rte_extmem_register_dmabuf()`

Add `__rte_experimental` on the line before each function declaration in the header:
```c
__rte_experimental
int
rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl);
```

**Warning: Missing release notes**
New API functions require documentation in `doc/guides/rel_notes/release_XX_YY.rst`.

**Warning: Missing version.map updates**
New exported symbols need to be added to `lib/eal/version.map` under the `EXPERIMENTAL` section.

**Warning: Missing testpmd hooks and functional tests**
New APIs should have tests in `app/test/` and hooks in `app/testpmd` per guidelines.

**Info: Doxygen style**
The Doxygen comments use `dma-buf` inconsistently (sometimes `dmabuf`, sometimes `dma-buf`). Consider standardizing to one form throughout.

**Warning: Variable `n` unused context**
```c
	n = len / page_sz;
	if (malloc_heap_create_external_seg_dmabuf(va_addr, iova_addrs, n,
```
The variable `n` is computed but the result from `len / page_sz` could be used directly since it's only used once.

---

## Patch 2/2: common/mlx5: support dmabuf

### Commit Message Issues

**Info: Subject is acceptable**
"common/mlx5: support dmabuf" follows the format guidelines.

### Code Issues

**Warning: Long lines exceed 100 characters**
Several lines in `mlx5_common.c` and `mlx5_common_mr.c` exceed the 100-character limit:

```c
			mr = mlx5_create_mr_ext_dmabuf(dev->pd, (uintptr_t)addr, len,
						SOCKET_ID_ANY, dmabuf_fd, dmabuf_off,
						dev->mr_scache.reg_dmabuf_mr_cb);
```
The opening line (`mr = mlx5_create_mr_ext_dmabuf(dev->pd, (uintptr_t)addr, len,`) is within the limit, but the continuation lines should be aligned more consistently.

In `mlx5_common_mr.c`:
```c
				int dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
```
This line likely exceeds 100 characters. Break it up:
```c
				int dmabuf_fd;
				dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
```

**Warning: Missing `__rte_internal` attribute**
The function `mlx5_create_mr_ext_dmabuf()` is declared in `mlx5_common_mr.h` but lacks `__rte_internal`:
```c
struct mlx5_mr *
mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
			  int dmabuf_fd, uint64_t dmabuf_offset,
			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb);
```
If this is internal to the driver, add `__rte_internal` on the line before.

**Error: Missing RTE_EXPORT_INTERNAL_SYMBOL for mlx5_create_mr_ext_dmabuf**
The function `mlx5_create_mr_ext_dmabuf()` in `mlx5_common_mr.c` is missing the export macro if it's meant to be used by other mlx5 components.

**Warning: Brace style inconsistency**
```c
		if (dmabuf_fd >= 0) {
			uint64_t dmabuf_off;
```
This is fine, but ensure consistency throughout.

**Info: Code duplication**
The dmabuf detection and registration logic is duplicated between `mlx5_common_dev_dma_map()` and `mlx5_mr_mempool_register_primary()`. Consider extracting to a helper function.
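A minimal sketch of such a helper (stand-in types and invented names for illustration; the real DPDK/mlx5 structures differ):
```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Stand-in for struct rte_memseg_list: only the fields this example
 * needs, with invented dma-buf members for illustration. */
struct memseg_list {
	bool external;
	int dmabuf_fd;      /* -1 when not dma-buf backed */
	uint64_t dmabuf_off;
};

/* One shared predicate both the DMA-map and mempool-registration paths
 * could call: returns 0 and fills fd/off when the list is dma-buf
 * backed, -1 otherwise, so each caller needs only a single check. */
static int
get_dmabuf_info(const struct memseg_list *msl, int *fd, uint64_t *off)
{
	if (msl == NULL || !msl->external || msl->dmabuf_fd < 0)
		return -1;
	*fd = msl->dmabuf_fd;
	*off = msl->dmabuf_off;
	return 0;
}
```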

**Warning: Potential NULL dereference**
In `mlx5_common.c`:
```c
	msl = rte_mem_virt2memseg_list(addr);
	if (msl != NULL && msl->external) {
		int dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
```
The `rte_memseg_list_get_dmabuf_fd_thread_unsafe()` call happens without holding the memory lock, but the comment in the header says it's only safe within memory-related callbacks. This may be a race condition if memory configuration changes.

**Warning: Missing error check**
```c
				rte_memseg_list_get_dmabuf_offset_thread_unsafe(msl, &dmabuf_off);
```
The return value is not checked. If the function fails, `dmabuf_off` may be uninitialized.

**Info: Windows stub**
```c
	*reg_dmabuf_mr_cb = NULL; /* dma-buf not supported on Windows */
```
Good practice to document platform limitations.

---

## General Issues

**Warning: Missing documentation**
- No updates to programmer's guide documentation explaining dmabuf usage
- No updates to release notes

**Warning: ABI considerations**
The patch mentions avoiding ABI changes by using a side-table for dmabuf info. This design decision should be documented more clearly, and the commit message questions should be resolved.

**Info: Linux-only feature**
The dmabuf feature is Linux-specific. Consider adding explicit documentation about platform support.

---

## Checklist Summary

| Item | Status |
|------|--------|
| SPDX license | ✓ OK (existing files) |
| Signed-off-by | ✓ Present |
| Subject line length | ✓ OK |
| Subject format | ✓ OK |
| Body line wrap | ⚠ Some lines long |
| `__rte_experimental` on new APIs | ✗ Missing |
| Release notes | ✗ Missing |
| version.map updates | ✗ Missing |
| Tests | ✗ Missing |
| Documentation | ✗ Missing |
| Code style | ⚠ Minor issues |

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] support dmabuf
  2026-01-27 17:44 [PATCH 0/2] support dmabuf Cliff Burdick
  2026-01-27 17:44 ` [PATCH 1/2] eal: " Cliff Burdick
  2026-01-27 17:44 ` [PATCH 2/2] common/mlx5: " Cliff Burdick
@ 2026-01-28  0:04 ` Stephen Hemminger
  2026-02-03 17:18   ` Cliff Burdick
  2026-02-03 22:26 ` [PATCH v2 " Cliff Burdick
  3 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2026-01-28  0:04 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: dev, anatoly.burakov

On Tue, 27 Jan 2026 17:44:07 +0000
Cliff Burdick <cburdick@nvidia.com> wrote:

> Add support for kernel dmabuf feature and integrate it in the mlx5 driver. 
> This feature is needed to support GPUDirect on newer kernels.
> 
> Cliff Burdick (2):
>   eal: support dmabuf
>   common/mlx5: support dmabuf
> 
>  .mailmap                                      |   1 +
>  drivers/common/mlx5/linux/meson.build         |   2 +
>  drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
>  drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
>  drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
>  drivers/common/mlx5/mlx5_common.c             |  28 ++-
>  drivers/common/mlx5/mlx5_common_mr.c          | 108 ++++++++++-
>  drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
>  drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
>  drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
>  drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
>  lib/eal/common/eal_common_memory.c            | 168 ++++++++++++++++++
>  lib/eal/common/eal_memalloc.h                 |  21 +++
>  lib/eal/common/malloc_heap.c                  |  27 +++
>  lib/eal/common/malloc_heap.h                  |   5 +
>  lib/eal/include/rte_memory.h                  | 125 +++++++++++++
>  16 files changed, 576 insertions(+), 8 deletions(-)
> 

Build fails (on MSVC) fix and resubmit.

"cl" "-Ilib\librte_eal.a.p" "-Ilib" "-I..\lib" "-Ilib\eal\common" "-I..\lib\eal\common" "-I." "-I.." "-Iconfig" "-I..\config" "-Ilib\eal\include" "-I..\lib\eal\include" "-Ilib\eal\windows\include" "-I..\lib\eal\windows\include" "-Ilib\eal\x86\include" "-I..\lib\eal\x86\include" "-Ilib\eal" "-I..\lib\eal" "-Ilib\argparse" "-I..\lib\argparse" "-Ilib\log" "-I..\lib\log" "-Ilib\kvargs" "-I..\lib\kvargs" "/MD" "/nologo" "/showIncludes" "/utf-8" "/W3" "/WX" "/std:c11" "/O2" "/Gw" "/wd4244" "/wd4267" "/wd4146" "/experimental:c11atomics" "/d1experimental:typeof" "/experimental:statementExpressions" "/FI" "rte_config.h" "-D_GNU_SOURCE" "-D_WIN32_WINNT=0x0A00" "-DALLOW_EXPERIMENTAL_API" "-DALLOW_INTERNAL_API" "-DABI_VERSION=\"26.1\"" "-DRTE_LOG_DEFAULT_LOGTYPE=lib.eal" "/Fdlib\librte_eal.a.p\eal_common_eal_common_memory.c.pdb" /Folib/librte_eal.a.p/eal_common_eal_common_memory.c.obj "/c" ../lib/eal/common/eal_common_memory.c
../lib/eal/common/eal_common_memory.c(56): error C2143: syntax error: missing ']' before '...'
../lib/eal/common/eal_common_memory.c(56): error C2059: syntax error: '...'
../lib/eal/common/eal_common_memory.c(57): error C2059: syntax error: '}'

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REVIEW] common/mlx5: support dmabuf
  2026-01-27 19:21   ` [REVIEW] " Stephen Hemminger
@ 2026-01-28 14:30     ` David Marchand
  2026-01-28 17:10       ` Stephen Hemminger
  2026-01-28 17:43       ` Stephen Hemminger
  2026-02-03 17:34     ` Cliff Burdick
  1 sibling, 2 replies; 27+ messages in thread
From: David Marchand @ 2026-01-28 14:30 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Hello Stephen,

On Tue, 27 Jan 2026 at 20:22, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> AI-generated review of bundle-1701-dmabuf.mbox
> Reviewed using Claude (claude-opus-4-5-20251101)
>
> This is an automated review. Please verify all suggestions.
>
> ---
>
> # DPDK Patch Review: dmabuf Support
>
> ## Summary
> This patch series adds dmabuf (DMA buffer) support to DPDK EAL and the MLX5 driver, enabling DMA transfers between drivers without proprietary kernel modules.
>
[snip]

> **Warning: Missing version.map updates**
> New exported symbols need to be added to `lib/eal/version.map` under the `EXPERIMENTAL` section.

I noticed similar comments on other series.
There is no version.map update needed anymore, since v25.07.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REVIEW] common/mlx5: support dmabuf
  2026-01-28 14:30     ` David Marchand
@ 2026-01-28 17:10       ` Stephen Hemminger
  2026-01-28 17:43       ` Stephen Hemminger
  1 sibling, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-01-28 17:10 UTC (permalink / raw)
  To: David Marchand; +Cc: dev

On Wed, 28 Jan 2026 15:30:17 +0100
David Marchand <david.marchand@redhat.com> wrote:

> Hello Stephen,
> 
> On Tue, 27 Jan 2026 at 20:22, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > AI-generated review of bundle-1701-dmabuf.mbox
> > Reviewed using Claude (claude-opus-4-5-20251101)
> >
> > This is an automated review. Please verify all suggestions.
> >
> > ---
> >
> > # DPDK Patch Review: dmabuf Support
> >
> > ## Summary
> > This patch series adds dmabuf (DMA buffer) support to DPDK EAL and the MLX5 driver, enabling DMA transfers between drivers without proprietary kernel modules.
> >  
> [snip]
> 
> > **Warning: Missing version.map updates**
> > New exported symbols need to be added to `lib/eal/version.map` under the `EXPERIMENTAL` section.  
> 
> I noticed similar comments on other series.
> There is no version.map update needed anymore, since v25.07.
> 
> 

I know, there is nothing in the AGENTS file about it, not sure where that neuron is coming from.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [REVIEW] common/mlx5: support dmabuf
  2026-01-28 14:30     ` David Marchand
  2026-01-28 17:10       ` Stephen Hemminger
@ 2026-01-28 17:43       ` Stephen Hemminger
  1 sibling, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-01-28 17:43 UTC (permalink / raw)
  To: David Marchand; +Cc: dev

On Wed, 28 Jan 2026 15:30:17 +0100
David Marchand <david.marchand@redhat.com> wrote:

> Hello Stephen,
> 
> On Tue, 27 Jan 2026 at 20:22, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > AI-generated review of bundle-1701-dmabuf.mbox
> > Reviewed using Claude (claude-opus-4-5-20251101)
> >
> > This is an automated review. Please verify all suggestions.
> >
> > ---
> >
> > # DPDK Patch Review: dmabuf Support
> >
> > ## Summary
> > This patch series adds dmabuf (DMA buffer) support to DPDK EAL and the MLX5 driver, enabling DMA transfers between drivers without proprietary kernel modules.
> >  
> [snip]
> 
> > **Warning: Missing version.map updates**
> > New exported symbols need to be added to `lib/eal/version.map` under the `EXPERIMENTAL` section.  
> 
> I noticed similar comments on other series.
> There is no version.map update needed anymore, since v25.07.
> 
> 

I got AI to fix itself :-)

Now I understand the issue. The AGENTS.md says "New external functions must be exported properly" but doesn't explain the current mechanism. The AI is filling in with outdated knowledge about version.map files.

DPDK has moved to automatic symbol map generation using export macros. 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] eal: support dmabuf
  2026-01-27 17:44 ` [PATCH 1/2] eal: " Cliff Burdick
@ 2026-01-29  1:48   ` Stephen Hemminger
  2026-01-29  1:51   ` Stephen Hemminger
  1 sibling, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-01-29  1:48 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: dev, anatoly.burakov, Thomas Monjalon

On Tue, 27 Jan 2026 17:44:08 +0000
Cliff Burdick <cburdick@nvidia.com> wrote:

> +		int fd;          /**< dmabuf fd, -1 if not dmabuf backed */
> +		uint64_t offset; /**< offset within dmabuf */
> +	} dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
> +	[0 ... RTE_MAX_MEMSEG_LISTS - 1] = { .fd = -1, .offset = 0 }
> +};
> +

Range initializers are a GCC extension and are not available in MSVC.
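For reference, the usual portable alternative (a sketch only, with a placeholder constant standing in for RTE_MAX_MEMSEG_LISTS, not the patch's actual code) is to drop the designated-range initializer and fill in the sentinel values from an init function at startup:

```c
#include <stdint.h>

/* Placeholder for RTE_MAX_MEMSEG_LISTS; value is illustrative only. */
#define MAX_MEMSEG_LISTS 128

static struct {
	int fd;          /* dmabuf fd, -1 if not dmabuf backed */
	uint64_t offset; /* offset within dmabuf */
} dmabuf_info[MAX_MEMSEG_LISTS];

/* Called once at startup instead of relying on the GCC-only
 * "[0 ... N-1] =" range-initializer syntax, which MSVC rejects. */
static void
dmabuf_info_init(void)
{
	unsigned int i;

	for (i = 0; i < MAX_MEMSEG_LISTS; i++) {
		dmabuf_info[i].fd = -1;
		dmabuf_info[i].offset = 0;
	}
}
```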

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] eal: support dmabuf
  2026-01-27 17:44 ` [PATCH 1/2] eal: " Cliff Burdick
  2026-01-29  1:48   ` Stephen Hemminger
@ 2026-01-29  1:51   ` Stephen Hemminger
  1 sibling, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-01-29  1:51 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: dev, anatoly.burakov, Thomas Monjalon

On Tue, 27 Jan 2026 17:44:08 +0000
Cliff Burdick <cburdick@nvidia.com> wrote:

> +/**
> + * Get dma-buf file descriptor associated with a memseg list.
> + *
> + * @note This function does not perform any locking, and is only safe to call
> + *       from within memory-related callback functions.

Maybe warning instead of note.

> + *
> + * @param msl
> + *   A pointer to memseg list for which to get dma-buf fd.
> + *
> + * @return
> + *   Valid dma-buf file descriptor (>= 0) in case of success.
> + *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
> + *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
> + */
> +int
> +rte_memseg_list_get_dmabuf_fd_thread_unsafe(const struct rte_memseg_list *msl);

ENAMETOOLONG

At some point, you need to come up with a better naming convention.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] common/mlx5: support dmabuf
  2026-01-27 17:44 ` [PATCH 2/2] common/mlx5: " Cliff Burdick
  2026-01-27 19:21   ` [REVIEW] " Stephen Hemminger
@ 2026-01-29  1:51   ` Stephen Hemminger
  1 sibling, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-01-29  1:51 UTC (permalink / raw)
  To: Cliff Burdick
  Cc: dev, anatoly.burakov, Dariusz Sosnowski, Viacheslav Ovsiienko,
	Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad

On Tue, 27 Jan 2026 17:44:09 +0000
Cliff Burdick <cburdick@nvidia.com> wrote:

> +static struct ibv_mr *
> +mlx5_glue_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length,
> +			uint64_t iova, int fd, int access)
> +{
> +#ifdef HAVE_IBV_REG_DMABUF_MR
> +	return ibv_reg_dmabuf_mr(pd, offset, length, iova, fd, access);
> +#else
> +	(void)pd;
> +	(void)offset;
> +	(void)length;
> +	(void)iova;
> +	(void)fd;
> +	(void)access;
> +	errno = ENOTSUP;
> +	return NULL;
> +#endif
> +}

I would prefer the callback hook did not exist (was NULL)
if you don't have your #ifdef.

The (void) casts look messy and would be better handled by the caller.
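A rough illustration of that preference (all names here are invented for the example; the real code would key off HAVE_IBV_REG_DMABUF_MR and call ibv_reg_dmabuf_mr()):

```c
#include <errno.h>
#include <stddef.h>

/* "NULL hook when compiled out" pattern: the function pointer only
 * exists when the feature is built in, so callers test the pointer
 * instead of invoking a runtime ENOTSUP stub full of (void) casts. */
typedef void *(*reg_dmabuf_fn)(int fd, size_t len);

#ifdef HAVE_DMABUF_SUPPORT
static void *
reg_dmabuf(int fd, size_t len)
{
	(void)fd;
	(void)len;
	return NULL; /* real registration would go here */
}
#endif

static const struct {
	reg_dmabuf_fn reg_dmabuf; /* NULL when the feature is compiled out */
} glue = {
#ifdef HAVE_DMABUF_SUPPORT
	.reg_dmabuf = reg_dmabuf,
#else
	.reg_dmabuf = NULL,
#endif
};

/* The caller checks the hook once, keeping the stub out of the glue. */
static int
try_reg_dmabuf(int fd, size_t len)
{
	if (glue.reg_dmabuf == NULL)
		return -ENOTSUP;
	return glue.reg_dmabuf(fd, len) != NULL ? 0 : -EIO;
}
```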

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 0/2] support dmabuf
  2026-01-28  0:04 ` [PATCH 0/2] " Stephen Hemminger
@ 2026-02-03 17:18   ` Cliff Burdick
  0 siblings, 0 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 17:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org, anatoly.burakov@intel.com



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org> 
> Sent: Tuesday, January 27, 2026 4:04 PM
> To: Cliff Burdick <cburdick@nvidia.com>
> Cc: dev@dpdk.org; anatoly.burakov@intel.com
> Subject: Re: [PATCH 0/2] support dmabuf
>
> External email: Use caution opening links or attachments
>
>
> On Tue, 27 Jan 2026 17:44:07 +0000
> Cliff Burdick <cburdick@nvidia.com> wrote:
>
> > Add support for kernel dmabuf feature and integrate it in the mlx5 driver.
> >  This feature is needed to support GPUDirect on newer kernels.
> >
> >  Cliff Burdick (2):
> >   eal: support dmabuf
> >   common/mlx5: support dmabuf
> >
> >  .mailmap                                      |   1 +
> >  drivers/common/mlx5/linux/meson.build         |   2 +
> >  drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
> >  drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
> >  drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
> >  drivers/common/mlx5/mlx5_common.c             |  28 ++-
> >  drivers/common/mlx5/mlx5_common_mr.c          | 108 ++++++++++-
> >  drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
> >  drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
> >  drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
> >  drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
> >  lib/eal/common/eal_common_memory.c            | 168 ++++++++++++++++++
> >  lib/eal/common/eal_memalloc.h                 |  21 +++
> >  lib/eal/common/malloc_heap.c                  |  27 +++
> >  lib/eal/common/malloc_heap.h                  |   5 +
> >  lib/eal/include/rte_memory.h                  | 125 +++++++++++++
> >  16 files changed, 576 insertions(+), 8 deletions(-)
> >
>
> Build fails (on MSVC) fix and resubmit.
>
> "cl" "-Ilib\librte_eal.a.p" "-Ilib" "-I..\lib" "-Ilib\eal\common" "-I..\lib\eal\common" "-I." "-I.." "-Iconfig" "-I..\config" "-Ilib\eal\include" "-I..\lib\eal\include" "-Ilib\eal\windows\include" "-I..\lib\eal\windows\include" "-Ilib\eal\x86\include" "-I..\lib\eal\x86\include" "-Ilib\eal" "-I..\lib\eal" "-Ilib\argparse" "-> I..\lib\argparse" "-Ilib\log" "-I..\lib\log" "-Ilib\kvargs" "-I..\lib\kvargs" "/MD" "/nologo" "/showIncludes" "/utf-8" "/W3" "/WX" "/std:c11" "/O2" "/Gw" "/wd4244" "/wd4267" "/wd4146" "/experimental:c11atomics" "/d1experimental:typeof" "/experimental:statementExpressions" "/FI" "rte_config.h" "-> D_GNU_SOURCE" "-D_WIN32_WINNT=0x0A00" "-DALLOW_EXPERIMENTAL_API" "-DALLOW_INTERNAL_API" "-DABI_VERSION=\"26.1\"" "-DRTE_LOG_DEFAULT_LOGTYPE=lib.eal" "/Fdlib\librte_eal.a.p\eal_common_eal_common_memory.c.pdb" /Folib/librte_eal.a.p/eal_common_eal_common_memory.c.obj > "/c" ../lib/eal/common/eal_common_memory.c
>../lib/eal/common/eal_common_memory.c(56): error C2143: syntax error: missing ']' before '...'
>../lib/eal/common/eal_common_memory.c(56): error C2059: syntax error: '...'
>../lib/eal/common/eal_common_memory.c(57): error C2059: syntax error: '}'

Fixed by moving to an init function

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [REVIEW] common/mlx5: support dmabuf
  2026-01-27 19:21   ` [REVIEW] " Stephen Hemminger
  2026-01-28 14:30     ` David Marchand
@ 2026-02-03 17:34     ` Cliff Burdick
  1 sibling, 0 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 17:34 UTC (permalink / raw)
  To: Stephen Hemminger, dev@dpdk.org

> External email: Use caution opening links or attachments
>  
>  
> AI-generated review of bundle-1701-dmabuf.mbox
> Reviewed using Claude (claude-opus-4-5-20251101)
>  
> This is an automated review. Please verify all suggestions.
>  
> ---
>  
> # DPDK Patch Review: dmabuf Support
>  
> ## Summary
> This patch series adds dmabuf (DMA buffer) support to DPDK EAL and the MLX5 driver, enabling DMA transfers between drivers without proprietary kernel modules.
>  
> ---
>  
> ## Patch 1/2: eal: support dmabuf
>  
> ### Commit Message Issues
>  
> **Warning: Subject line format**
> - Subject "eal: support dmabuf" is acceptable but could be more descriptive
> - Consider: "eal: add dmabuf external memory registration support"
>  
> **Warning: Body contains questions to reviewers**
> The commit message contains design questions that should be resolved before submission:
> ```
> Which option is preferred?
> ```
> Remove these questions and state the chosen design approach clearly.
>  
> **Info: Body line length**
> Some lines in the body exceed 75 characters but are within acceptable range.
>  
> ### Code Issues
>  
> **Error: Double blank line**
> ```c
> } dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
>         [0 ... RTE_MAX_MEMSEG_LISTS - 1] = { .fd = -1, .offset = 0 }
> };
> ```

Fixed

> 
> ```c
> #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5
> ```
> Remove the extra blank line after the struct initialization.
>  
> **Warning: Inconsistent indentation in struct**
> ```c
> static struct {
>                 int fd;          /**< dmabuf fd, -1 if not dmabuf backed */
>                 uint64_t offset; /**< offset within dmabuf */
>         } dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
> ```
> The struct members are double-indented with tabs. Should use single tab for consistency:
> ```c
> static struct {
>         int fd;          /**< dmabuf fd, -1 if not dmabuf backed */
>         uint64_t offset; /**< offset within dmabuf */
> } dmabuf_info[RTE_MAX_MEMSEG_LISTS] = {
> ```
>

Removed as part of refactoring away from the init struct syntax.
>  
> **Error: New public APIs missing `__rte_experimental`**
> All new public API functions in `rte_memory.h` must be marked as experimental:
> - `rte_memseg_list_get_dmabuf_fd()`
> - `rte_memseg_list_get_dmabuf_fd_thread_unsafe()`
> - `rte_memseg_list_get_dmabuf_offset()`
> - `rte_memseg_list_get_dmabuf_offset_thread_unsafe()`
> - `rte_extmem_register_dmabuf()`
>  
> Add `__rte_experimental` on the line before each function declaration in the header:
> ```c
> __rte_experimental
> int
> rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl);
> ```
>

Done

>  
> **Warning: Missing release notes**
> New API functions require documentation in `doc/guides/rel_notes/release_XX_YY.rst`.

Done 

>  
> **Warning: Missing version.map updates**
> New exported symbols need to be added to `lib/eal/version.map` under the `EXPERIMENTAL` section.
>  
> **Warning: Missing testpmd hooks and functional tests**
> New APIs should have tests in `app/test/` and hooks in `app/testpmd` per guidelines.
>  
> **Info: Doxygen style**
> The Doxygen comments use `dma-buf` inconsistently (sometimes `dmabuf`, sometimes `dma-buf`). Consider standardizing to one form throughout.
>  
> **Warning: Variable `n` unused context**
> ```c
>         n = len / page_sz;
>         if (malloc_heap_create_external_seg_dmabuf(va_addr, iova_addrs, n,
> ```
> The variable `n` is computed but the result from `len / page_sz` could be used directly since it's only used once.

Fixed as part of refactoring.
>  
> ---
>  
> ## Patch 2/2: common/mlx5: support dmabuf
>  
> ### Commit Message Issues
>  
> **Info: Subject is acceptable**
> "common/mlx5: support dmabuf" follows the format guidelines.
>  
> ### Code Issues
>  
> **Warning: Long lines exceed 100 characters**
> Several lines in `mlx5_common.c` and `mlx5_common_mr.c` exceed the 100-character limit:
>  
> ```c
>                         mr = mlx5_create_mr_ext_dmabuf(dev->pd, (uintptr_t)addr, len,
>                                                 SOCKET_ID_ANY, dmabuf_fd, dmabuf_off,
>                                                 dev->mr_scache.reg_dmabuf_mr_cb);
> ```
> Line 1149: `mr = mlx5_create_mr_ext_dmabuf(dev->pd, (uintptr_t)addr, len,` - OK
> But the continuation could be better formatted.
>  
> In `mlx5_common_mr.c`:
> ```c
>                                 int dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
> ```
> This line likely exceeds 100 characters. Break it up:
> ```c
>                                 int dmabuf_fd;
>                                 dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
> ```
>  
> **Warning: Missing `__rte_internal` attribute**
> The function `mlx5_create_mr_ext_dmabuf()` is declared in `mlx5_common_mr.h` but lacks `__rte_internal`:
> ```c
> struct mlx5_mr *
> mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
>                            int dmabuf_fd, uint64_t dmabuf_offset,
>                            mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb);
> ```
> If this is internal to the driver, add `__rte_internal` on the line before.
>  
> **Error: Missing RTE_EXPORT_INTERNAL_SYMBOL for mlx5_create_mr_ext_dmabuf**
> The function `mlx5_create_mr_ext_dmabuf()` in `mlx5_common_mr.c` is missing the export macro if it's meant to be used by other mlx5 components.

Fixed

>  
> **Warning: Brace style inconsistency**
> ```c
>                 if (dmabuf_fd >= 0) {
>                         uint64_t dmabuf_off;
> ```
> This is fine, but ensure consistency throughout.
>  
> **Info: Code duplication**
> The dmabuf detection and registration logic is duplicated between `mlx5_common_dev_dma_map()` and `mlx5_mr_mempool_register_primary()`. Consider extracting to a helper function.
>  
> **Warning: Potential NULL dereference**
> In `mlx5_common.c`:
> ```c
>         msl = rte_mem_virt2memseg_list(addr);
>         if (msl != NULL && msl->external) {
>                 int dmabuf_fd = rte_memseg_list_get_dmabuf_fd_thread_unsafe(msl);
> ```
> The `rte_memseg_list_get_dmabuf_fd_thread_unsafe()` call happens without holding the memory lock, but the comment in the header says it's only safe within memory-related callbacks. This may be a race condition if memory configuration changes.

Switched to safe version of the functions

>  
> **Warning: Missing error check**
> ```c
>                                 rte_memseg_list_get_dmabuf_offset_thread_unsafe(msl, &dmabuf_off);
> ```
> The return value is not checked. If the function fails, `dmabuf_off` may be uninitialized.

Checked errors

>  
> **Info: Windows stub**
> ```c
>         *reg_dmabuf_mr_cb = NULL; /* dma-buf not supported on Windows */
> ```
> Good practice to document platform limitations.
>  
> ---
>  
> ## General Issues
>  
> **Warning: Missing documentation**
> - No updates to programmer's guide documentation explaining dmabuf usage
> - No updates to release notes
>  
> **Warning: ABI considerations**
> The patch mentions avoiding ABI changes by using a side-table for dmabuf info. This design decision should be documented more clearly, and the commit message questions should be resolved.
>  
> **Info: Linux-only feature**
> The dmabuf feature is Linux-specific. Consider adding explicit documentation about platform support.
>  
> ---
>  
> ## Checklist Summary
>  
> | Item | Status |
> |------|--------|
> | SPDX license | ✓ OK (existing files) |
> | Signed-off-by | ✓ Present |
> | Subject line length | ✓ OK |
> | Subject format | ✓ OK |
> | Body line wrap | ⚠ Some lines long |
> | `__rte_experimental` on new APIs | ✗ Missing |
> | Release notes | ✗ Missing |
> | version.map updates | ✗ Missing |
> | Tests | ✗ Missing |
> | Documentation | ✗ Missing |
> | Code style | ⚠ Minor issues |

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 0/2] support dmabuf
  2026-01-27 17:44 [PATCH 0/2] support dmabuf Cliff Burdick
                   ` (2 preceding siblings ...)
  2026-01-28  0:04 ` [PATCH 0/2] " Stephen Hemminger
@ 2026-02-03 22:26 ` Cliff Burdick
  2026-02-03 22:26   ` [PATCH v2 1/2] eal: " Cliff Burdick
                     ` (2 more replies)
  3 siblings, 3 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 22:26 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov

Fixed since v1:
* Fixed issue with MSVC compilation
* Fixed style issues from code review

Add support for the kernel dmabuf feature and integrate it into the mlx5
driver. This feature is needed to support GPUDirect on newer kernels.

Cliff Burdick (2):
  eal: support dmabuf
  common/mlx5: support dmabuf

 .mailmap                                      |   1 +
 doc/guides/rel_notes/release_26_03.rst        |   6 +
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  42 ++++-
 drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 lib/eal/common/eal_common_memory.c            | 165 +++++++++++++++++-
 lib/eal/common/eal_memalloc.h                 |  21 +++
 lib/eal/common/malloc_heap.c                  |  27 +++
 lib/eal/common/malloc_heap.h                  |   5 +
 lib/eal/include/rte_memory.h                  | 145 +++++++++++++++
 17 files changed, 612 insertions(+), 14 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 1/2] eal: support dmabuf
  2026-02-03 22:26 ` [PATCH v2 " Cliff Burdick
@ 2026-02-03 22:26   ` Cliff Burdick
  2026-02-03 22:26   ` [PATCH v2 2/2] common/mlx5: " Cliff Burdick
  2026-02-03 23:02   ` [PATCH v3 0/2] " Cliff Burdick
  2 siblings, 0 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 22:26 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov, Thomas Monjalon

dmabuf is a modern Linux kernel feature that allows DMA transfers
between two drivers. Common examples of its use are streaming video
devices and NIC-to-GPU transfers. Prior to dmabuf, users had to load
proprietary drivers to expose the DMA mappings; with dmabuf, the
proprietary drivers are no longer required.

A new API function, rte_extmem_register_dmabuf, is introduced to create
the mapping from a dmabuf file descriptor. dmabuf uses a file
descriptor that has been pre-opened with the kernel, plus an offset;
the kernel uses the file descriptor to map to a VA pointer. To avoid
ABI changes, a static struct is used inside eal_common_memory.c, and
lookups are done on this struct rather than on the rte_memseg_list.

Ideally we would like to add both the dmabuf file descriptor and offset
to rte_memseg_list, but it's not clear if we can reuse existing fields
when using the dmabuf API.

We could rename the external flag to a more generic "properties" field,
where "external" is the lowest bit, and use the second bit to indicate
the presence of dmabuf. When the dmabuf bit is set, we could reuse the
base_va address field for the dmabuf offset and the socket_id for the
file descriptor.

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 .mailmap                               |   1 +
 doc/guides/rel_notes/release_26_03.rst |   6 +
 lib/eal/common/eal_common_memory.c     | 165 ++++++++++++++++++++++++-
 lib/eal/common/eal_memalloc.h          |  21 ++++
 lib/eal/common/malloc_heap.c           |  27 ++++
 lib/eal/common/malloc_heap.h           |   5 +
 lib/eal/include/rte_memory.h           | 145 ++++++++++++++++++++++
 7 files changed, 364 insertions(+), 6 deletions(-)

diff --git a/.mailmap b/.mailmap
index 2f089326ff..4c2b2f921d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -291,6 +291,7 @@ Cian Ferriter <cian.ferriter@intel.com>
 Ciara Loftus <ciara.loftus@intel.com>
 Ciara Power <ciara.power@intel.com>
 Claire Murphy <claire.k.murphy@intel.com>
+Cliff Burdick <cburdick@nvidia.com>
 Clemens Famulla-Conrad <cfamullaconrad@suse.com>
 Cody Doucette <doucette@bu.edu>
 Congwen Zhang <zhang.congwen@zte.com.cn>
diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index 15dabee7a1..56457d0382 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added dma-buf-backed external memory support.**
+
+  Added EAL support for registering dma-buf-backed external memory with
+  ``rte_extmem_register_dmabuf``, and enabled mlx5 common code to consume
+  dma-buf mappings for device access.
+
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index c62edf5e55..34ebbdc202 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -45,6 +45,15 @@
 static void *next_baseaddr;
 static uint64_t system_page_sz;
 
+/* Internal storage for dma-buf info, indexed by memseg list index.
+ * This keeps dma-buf metadata out of the public rte_memseg_list structure
+ * to preserve ABI compatibility.
+ */
+static struct {
+	int fd;          /**< dma-buf fd, -1 if not dma-buf backed */
+	uint64_t offset; /**< offset within dma-buf */
+} dmabuf_info[RTE_MAX_MEMSEG_LISTS];
+
 #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5
 void *
 eal_get_virtual_area(void *requested_addr, size_t *size,
@@ -232,6 +241,10 @@ eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz,
 {
 	char name[RTE_FBARRAY_NAME_LEN];
 
+	/* Initialize dma-buf info to "not dma-buf backed" */
+	dmabuf_info[type_msl_idx].fd = -1;
+	dmabuf_info[type_msl_idx].offset = 0;
+
 	snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id,
 		 type_msl_idx);
 
@@ -930,10 +943,113 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
-RTE_EXPORT_SYMBOL(rte_extmem_register)
+/* Internal dma-buf info functions */
 int
-rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
-		unsigned int n_pages, size_t page_sz)
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	dmabuf_info[list_idx].fd = fd;
+	dmabuf_info[list_idx].offset = offset;
+	return 0;
+}
+
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	return dmabuf_info[list_idx].fd;
+}
+
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS || offset == NULL)
+		return -EINVAL;
+
+	*offset = dmabuf_info[list_idx].offset;
+	return 0;
+}
+
+/* Public dma-buf info API functions */
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_fd_unsafe)
+int
+rte_memseg_list_get_dmabuf_fd_unsafe(const struct rte_memseg_list *msl)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return dmabuf_info[msl_idx].fd;
+}
+
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_fd)
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_fd_unsafe(msl);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_offset_unsafe)
+int
+rte_memseg_list_get_dmabuf_offset_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL || offset == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	*offset = dmabuf_info[msl_idx].offset;
+	return 0;
+}
+
+RTE_EXPORT_SYMBOL(rte_memseg_list_get_dmabuf_offset)
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_offset_unsafe(msl, offset);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
+static int
+extmem_register(void *va_addr, size_t len,
+	int dmabuf_fd, uint64_t dmabuf_offset,
+	rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int socket_id, n;
@@ -967,10 +1083,19 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 
 	/* we can create a new memseg */
 	n = len / page_sz;
-	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
+	if (dmabuf_fd < 0) {
+		if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
 			page_sz, "extmem", socket_id) == NULL) {
-		ret = -1;
-		goto unlock;
+			ret = -1;
+			goto unlock;
+		}
+	} else {
+		if (malloc_heap_create_external_seg_dmabuf(va_addr, iova_addrs, n,
+			page_sz, "extmem_dmabuf", socket_id,
+			dmabuf_fd, dmabuf_offset) == NULL) {
+			ret = -1;
+			goto unlock;
+		}
 	}
 
 	/* memseg list successfully created - increment next socket ID */
@@ -980,6 +1105,34 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 	return ret;
 }
 
+RTE_EXPORT_SYMBOL(rte_extmem_register)
+int
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz)
+{
+	return extmem_register(va_addr, len, -1, 0, iova_addrs, n_pages, page_sz);
+}
+
+RTE_EXPORT_SYMBOL(rte_extmem_register_dmabuf)
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	if (dmabuf_fd < 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return extmem_register(va_addr,
+		len,
+		dmabuf_fd,
+		dmabuf_offset,
+		iova_addrs,
+		n_pages,
+		page_sz);
+}
+
 RTE_EXPORT_SYMBOL(rte_extmem_unregister)
 int
 rte_extmem_unregister(void *va_addr, size_t len)
diff --git a/lib/eal/common/eal_memalloc.h b/lib/eal/common/eal_memalloc.h
index 0c267066d9..e7e807ddcb 100644
--- a/lib/eal/common/eal_memalloc.h
+++ b/lib/eal/common/eal_memalloc.h
@@ -90,6 +90,27 @@ eal_memalloc_set_seg_list_fd(int list_idx, int fd);
 int
 eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
 
+/*
+ * Set dma-buf info for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset);
+
+/*
+ * Get dma-buf fd for a memseg list.
+ * Returns fd (>= 0) on success, -1 if not dma-buf backed, -errno on error.
+ */
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx);
+
+/*
+ * Get dma-buf offset for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset);
+
 int
 eal_memalloc_init(void)
 	__rte_requires_shared_capability(rte_mcfg_mem_get_lock());
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 39240c261c..bf986fe654 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -1232,6 +1232,33 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	msl->version = 0;
 	msl->external = 1;
 
+	/* initialize dma-buf info to "not dma-buf backed" */
+	eal_memseg_list_set_dmabuf_info(i, -1, 0);
+
+	return msl;
+}
+
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int msl_idx;
+
+	/* Create the base external segment */
+	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
+			page_sz, seg_name, socket_id);
+	if (msl == NULL)
+		return NULL;
+
+	/* Get memseg list index */
+	msl_idx = msl - mcfg->memsegs;
+
+	/* Set dma-buf info in the internal side-table */
+	eal_memseg_list_set_dmabuf_info(msl_idx, dmabuf_fd, dmabuf_offset);
+
 	return msl;
 }
 
diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
index dfc56d4ae3..87525d1a68 100644
--- a/lib/eal/common/malloc_heap.h
+++ b/lib/eal/common/malloc_heap.h
@@ -51,6 +51,11 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
 		unsigned int socket_id);
 
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset);
+
 struct rte_memseg_list *
 malloc_heap_find_external_seg(void *va_addr, size_t len);
 
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index b6e97ad695..4e92897dd9 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -405,6 +405,98 @@ int
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_fd_unsafe(const struct rte_memseg_list *msl);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_offset_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
 /**
  * Register external memory chunk with DPDK.
  *
@@ -443,6 +535,59 @@ int
 rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register external memory chunk backed by a dma-buf file descriptor and offset.
+ *
+ * This is similar to rte_extmem_register() but additionally stores dma-buf
+ * file descriptor information, allowing drivers to use dma-buf based
+ * memory registration (e.g., ibv_reg_dmabuf_mr for RDMA devices).
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves via rte_dev_dma_map().
+ *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling ``rte_extmem_attach`` in
+ *   each other process.
+ *
+ * @param va_addr
+ *   Start of virtual area to register (mmap'd address of the dma-buf).
+ *   Must be aligned by ``page_sz``.
+ * @param len
+ *   Length of virtual area to register. Must be aligned by ``page_sz``.
+ *   This is independent of dma-buf offset.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf where the registered region starts.
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+__rte_experimental
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Unregister external memory chunk with DPDK.
  *
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 2/2] common/mlx5: support dmabuf
  2026-02-03 22:26 ` [PATCH v2 " Cliff Burdick
  2026-02-03 22:26   ` [PATCH v2 1/2] eal: " Cliff Burdick
@ 2026-02-03 22:26   ` Cliff Burdick
  2026-02-03 23:02   ` [PATCH v3 0/2] " Cliff Burdick
  2 siblings, 0 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 22:26 UTC (permalink / raw)
  To: dev
  Cc: anatoly.burakov, Thomas Monjalon, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad

dmabuf is a modern Linux kernel feature that allows DMA transfers
between two drivers. Common examples of its use are streaming video
devices and NIC-to-GPU transfers. Prior to dmabuf, users had to load
proprietary drivers to expose the DMA mappings; with dmabuf, the
proprietary drivers are no longer required.

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 .mailmap                                      |   2 +-
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 +++++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 +++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  42 ++++++-
 drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 ++-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 11 files changed, 249 insertions(+), 9 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4c2b2f921d..0a8a67098f 100644
--- a/.mailmap
+++ b/.mailmap
@@ -291,8 +291,8 @@ Cian Ferriter <cian.ferriter@intel.com>
 Ciara Loftus <ciara.loftus@intel.com>
 Ciara Power <ciara.power@intel.com>
 Claire Murphy <claire.k.murphy@intel.com>
-Cliff Burdick <cburdick@nvidia.com>
 Clemens Famulla-Conrad <cfamullaconrad@suse.com>
+Cliff Burdick <cburdick@nvidia.com>
 Cody Doucette <doucette@bu.edu>
 Congwen Zhang <zhang.congwen@zte.com.cn>
 Conor Fogarty <conor.fogarty@intel.com>
diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 3767e7a69b..8e83104165 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -203,6 +203,8 @@ has_sym_args = [
             'mlx5dv_dr_domain_allow_duplicate_rules' ],
         [ 'HAVE_MLX5_IBV_REG_MR_IOVA', 'infiniband/verbs.h',
             'ibv_reg_mr_iova' ],
+        [ 'HAVE_IBV_REG_DMABUF_MR', 'infiniband/verbs.h',
+            'ibv_reg_dmabuf_mr' ],
         [ 'HAVE_MLX5_IBV_IMPORT_CTX_PD_AND_MR', 'infiniband/verbs.h',
             'ibv_import_device' ],
         [ 'HAVE_MLX5DV_DR_ACTION_CREATE_DEST_ROOT_TABLE', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index 98260df470..f6d18fd5df 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -129,6 +129,47 @@ mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 	return 0;
 }
 
+/**
+ * Register mr for dma-buf backed memory. Given protection domain pointer,
+ * dma-buf fd, offset and length, register the memory region.
+ *
+ * @param[in] pd
+ *   Pointer to protection domain context.
+ * @param[in] offset
+ *   Offset within the dma-buf.
+ * @param[in] length
+ *   Length of the memory to register.
+ * @param[in] fd
+ *   File descriptor of the dma-buf.
+ * @param[out] pmd_mr
+ *   pmd_mr struct set with lkey, address, length and pointer to mr object
+ *
+ * @return
+ *   0 on successful registration, -1 otherwise
+ */
+RTE_EXPORT_INTERNAL_SYMBOL(mlx5_common_verbs_reg_dmabuf_mr)
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr)
+{
+	struct ibv_mr *ibv_mr;
+	ibv_mr = mlx5_glue->reg_dmabuf_mr(pd, offset, length, iova, fd,
+					  IBV_ACCESS_LOCAL_WRITE |
+					  (haswell_broadwell_cpu ? 0 :
+					  IBV_ACCESS_RELAXED_ORDERING));
+	if (!ibv_mr)
+		return -1;
+
+	*pmd_mr = (struct mlx5_pmd_mr){
+		.lkey = ibv_mr->lkey,
+		.addr = ibv_mr->addr,
+		.len = ibv_mr->length,
+		.obj = (void *)ibv_mr,
+	};
+	return 0;
+}
+
 /**
  * Deregister mr. Given the mlx5 pmd MR - deregister the MR
  *
@@ -151,13 +192,18 @@ mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_common_verbs_reg_mr;
+	*reg_dmabuf_mr_cb = mlx5_common_verbs_reg_dmabuf_mr;
 	*dereg_mr_cb = mlx5_common_verbs_dereg_mr;
 }
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index a91eaa429d..6fac7f2bcd 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -291,6 +291,24 @@ mlx5_glue_reg_mr_iova(struct ibv_pd *pd, void *addr, size_t length,
 #endif
 }
 
+static struct ibv_mr *
+mlx5_glue_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length,
+			uint64_t iova, int fd, int access)
+{
+#ifdef HAVE_IBV_REG_DMABUF_MR
+	return ibv_reg_dmabuf_mr(pd, offset, length, iova, fd, access);
+#else
+	(void)pd;
+	(void)offset;
+	(void)length;
+	(void)iova;
+	(void)fd;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static struct ibv_mr *
 mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
 {
@@ -1619,6 +1637,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue) {
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
 	.reg_mr_iova = mlx5_glue_reg_mr_iova,
+	.reg_dmabuf_mr = mlx5_glue_reg_dmabuf_mr,
 	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
diff --git a/drivers/common/mlx5/linux/mlx5_glue.h b/drivers/common/mlx5/linux/mlx5_glue.h
index 81d6b0aaf9..66216d1194 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -219,6 +219,9 @@ struct mlx5_glue {
 	struct ibv_mr *(*reg_mr_iova)(struct ibv_pd *pd, void *addr,
 				      size_t length, uint64_t iova,
 				      int access);
+	struct ibv_mr *(*reg_dmabuf_mr)(struct ibv_pd *pd, uint64_t offset,
+					size_t length, uint64_t iova,
+					int fd, int access);
 	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 84a93e7dbd..82cf17ca78 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -13,6 +13,7 @@
 #include <rte_class.h>
 #include <rte_malloc.h>
 #include <rte_eal_paging.h>
+#include <rte_memory.h>
 
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
@@ -1125,6 +1126,7 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 	struct mlx5_common_device *dev;
 	struct mlx5_mr_btree *bt;
 	struct mlx5_mr *mr;
+	struct rte_memseg_list *msl;
 
 	dev = to_mlx5_device(rte_dev);
 	if (!dev) {
@@ -1134,8 +1136,44 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 		rte_errno = ENODEV;
 		return -1;
 	}
-	mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
-				SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	/* Check if this is dma-buf backed external memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl != NULL && msl->external) {
+		int dmabuf_fd = rte_memseg_list_get_dmabuf_fd(msl);
+		if (dmabuf_fd >= 0) {
+			uint64_t dmabuf_off;
+			/* Get base offset from memseg list */
+			int ret = rte_memseg_list_get_dmabuf_offset(
+				msl, &dmabuf_off);
+			if (ret < 0) {
+				DRV_LOG(ERR,
+					"Failed to get dma-buf offset for memseg list %p",
+					(void *)msl);
+				return -1;
+			}
+			/* Calculate offset within dma-buf address */
+			dmabuf_off += ((uintptr_t)addr - (uintptr_t)msl->base_va);
+			/* Use dma-buf MR registration */
+			mr = mlx5_create_mr_ext_dmabuf(dev->pd,
+						       (uintptr_t)addr,
+						       len,
+						       SOCKET_ID_ANY,
+						       dmabuf_fd,
+						       dmabuf_off,
+						       dev->mr_scache.reg_dmabuf_mr_cb);
+		} else {
+			/* Use regular MR registration */
+			mr = mlx5_create_mr_ext(dev->pd,
+						(uintptr_t)addr,
+						len,
+						SOCKET_ID_ANY,
+						dev->mr_scache.reg_mr_cb);
+		}
+	} else {
+		/* Use regular MR registration */
+		mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
+					SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	}
 	if (!mr) {
 		DRV_LOG(WARNING, "Device %s unable to DMA map", rte_dev->name);
 		rte_errno = EINVAL;
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 8ed988dec9..8f31eaefe8 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -8,6 +8,7 @@
 #include <rte_eal_memconfig.h>
 #include <rte_eal_paging.h>
 #include <rte_errno.h>
+#include <rte_memory.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_rwlock.h>
@@ -1141,6 +1142,7 @@ mlx5_mr_create_cache(struct mlx5_mr_share_cache *share_cache, int socket)
 {
 	/* Set the reg_mr and dereg_mr callback functions */
 	mlx5_os_set_reg_mr_cb(&share_cache->reg_mr_cb,
+			      &share_cache->reg_dmabuf_mr_cb,
 			      &share_cache->dereg_mr_cb);
 	rte_rwlock_init(&share_cache->rwlock);
 	rte_rwlock_init(&share_cache->mprwlock);
@@ -1221,6 +1223,74 @@ mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 	return mr;
 }
 
+/**
+ * Creates a memory region for dma-buf backed external memory.
+ *
+ * @param pd
+ *   Pointer to pd of a device (net, regex, vdpa,...).
+ * @param addr
+ *   Starting virtual address of memory (mmap'd address).
+ * @param len
+ *   Length of memory segment being mapped.
+ * @param socket_id
+ *   Socket to allocate heap memory for the control structures.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf.
+ * @param reg_dmabuf_mr_cb
+ *   Callback function for dma-buf MR registration.
+ *
+ * @return
+ *   Pointer to MR structure on success, NULL otherwise.
+ */
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb)
+{
+	struct mlx5_mr *mr = NULL;
+
+	if (reg_dmabuf_mr_cb == NULL) {
+		DRV_LOG(WARNING, "dma-buf MR registration not supported");
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+	mr = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO,
+			 RTE_ALIGN_CEIL(sizeof(*mr), RTE_CACHE_LINE_SIZE),
+			 RTE_CACHE_LINE_SIZE, socket_id);
+	if (mr == NULL)
+		return NULL;
+	if (reg_dmabuf_mr_cb(pd, dmabuf_offset, len, addr, dmabuf_fd,
+			     &mr->pmd_mr) < 0) {
+		DRV_LOG(WARNING,
+			"Fail to create dma-buf MR for address (%p) fd=%d",
+			(void *)addr, dmabuf_fd);
+		mlx5_free(mr);
+		return NULL;
+	}
+	mr->msl = NULL; /* Mark it is external memory. */
+	mr->ms_bmp = NULL;
+	mr->ms_n = 1;
+	mr->ms_bmp_n = 1;
+	/*
+	 * For dma-buf MR, the returned addr may be NULL since there's no VA
+	 * in the registration. Store the user-provided addr for cache lookup.
+	 */
+	if (mr->pmd_mr.addr == NULL)
+		mr->pmd_mr.addr = (void *)addr;
+	if (mr->pmd_mr.len == 0)
+		mr->pmd_mr.len = len;
+	DRV_LOG(DEBUG,
+		"MR CREATED (%p) for dma-buf external memory %p (fd=%d):\n"
+		"  [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+		" lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+		(void *)mr, (void *)addr, dmabuf_fd,
+		addr, addr + len, rte_cpu_to_be_32(mr->pmd_mr.lkey),
+		mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+	return mr;
+}
+
 /**
  * Callback for memory free event. Iterate freed memsegs and check whether it
  * belongs to an existing MR. If found, clear the bit from bitmap of MR. As a
@@ -1747,9 +1817,48 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 		struct mlx5_mempool_mr *mr = &new_mpr->mrs[i];
 		const struct mlx5_range *range = &ranges[i];
 		size_t len = range->end - range->start;
+		struct rte_memseg_list *msl;
+		int reg_result;
+
+		/* Check if this is dma-buf backed external memory */
+		msl = rte_mem_virt2memseg_list((void *)range->start);
+		if (msl != NULL && msl->external &&
+		    share_cache->reg_dmabuf_mr_cb != NULL) {
+			int dmabuf_fd = rte_memseg_list_get_dmabuf_fd(msl);
+			if (dmabuf_fd >= 0) {
+				uint64_t dmabuf_off;
+				/* Get base offset from memseg list */
+				ret = rte_memseg_list_get_dmabuf_offset(msl, &dmabuf_off);
+				if (ret < 0) {
+					DRV_LOG(ERR, "Failed to get dma-buf offset for memseg list %p",
+						(void *)msl);
+					goto exit;
+				}
+				/* Calculate offset within dma-buf for this specific range */
+				dmabuf_off += (range->start - (uintptr_t)msl->base_va);
+				/* Use dma-buf MR registration */
+				reg_result = share_cache->reg_dmabuf_mr_cb(pd,
+					dmabuf_off, len, range->start, dmabuf_fd,
+					&mr->pmd_mr);
+				if (reg_result == 0) {
+					/* For dma-buf MR, set addr if not set by driver */
+					if (mr->pmd_mr.addr == NULL)
+						mr->pmd_mr.addr = (void *)range->start;
+					if (mr->pmd_mr.len == 0)
+						mr->pmd_mr.len = len;
+				}
+			} else {
+				/* Use regular MR registration */
+				reg_result = share_cache->reg_mr_cb(pd,
+					(void *)range->start, len, &mr->pmd_mr);
+			}
+		} else {
+			/* Use regular MR registration */
+			reg_result = share_cache->reg_mr_cb(pd,
+				(void *)range->start, len, &mr->pmd_mr);
+		}
 
-		if (share_cache->reg_mr_cb(pd, (void *)range->start, len,
-		    &mr->pmd_mr) < 0) {
+		if (reg_result < 0) {
 			DRV_LOG(ERR,
 				"Failed to create an MR in PD %p for address range "
 				"[0x%" PRIxPTR ", 0x%" PRIxPTR "] (%zu bytes) for mempool %s",
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index cf7c685e9b..3b967b1323 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -35,6 +35,9 @@ struct mlx5_pmd_mr {
  */
 typedef int (*mlx5_reg_mr_t)(void *pd, void *addr, size_t length,
 			     struct mlx5_pmd_mr *pmd_mr);
+typedef int (*mlx5_reg_dmabuf_mr_t)(void *pd, uint64_t offset, size_t length,
+				    uint64_t iova, int fd,
+				    struct mlx5_pmd_mr *pmd_mr);
 typedef void (*mlx5_dereg_mr_t)(struct mlx5_pmd_mr *pmd_mr);
 
 /* Memory Region object. */
@@ -87,6 +90,7 @@ struct __rte_packed_begin mlx5_mr_share_cache {
 	struct mlx5_mr_list mr_free_list; /* Freed MR list. */
 	struct mlx5_mempool_reg_list mempool_reg_list; /* Mempool database. */
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 } __rte_packed_end;
 
@@ -233,6 +237,10 @@ mlx5_mr_lookup_list(struct mlx5_mr_share_cache *share_cache,
 struct mlx5_mr *
 mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 		   mlx5_reg_mr_t reg_mr_cb);
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb);
 void mlx5_mr_free(struct mlx5_mr *mr, mlx5_dereg_mr_t dereg_mr_cb);
 __rte_internal
 uint32_t
@@ -251,12 +259,19 @@ int
 mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 			 struct mlx5_pmd_mr *pmd_mr);
 __rte_internal
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr);
+__rte_internal
 void
 mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 
 __rte_internal
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb);
 
 __rte_internal
 int
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.c b/drivers/common/mlx5/windows/mlx5_common_os.c
index 7fac361460..5e284742ab 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.c
+++ b/drivers/common/mlx5/windows/mlx5_common_os.c
@@ -17,6 +17,7 @@
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_malloc.h"
+#include "mlx5_common_mr.h"
 
 /**
  * Initialization routine for run-time dependency on external lib.
@@ -442,15 +443,20 @@ mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func (NULL on Windows - not supported)
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  *
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_os_reg_mr;
+	*reg_dmabuf_mr_cb = NULL; /* dma-buf not supported on Windows */
 	*dereg_mr_cb = mlx5_os_dereg_mr;
 }
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index f9f127e9e6..b2712c9a8d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -41,6 +41,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 89f32c7722..380689cfeb 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -1186,7 +1186,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
-	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
+	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->reg_dmabuf_mr_cb,
+			&priv->dereg_mr_cb);
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
 	if (mlx5_crypto_is_ipsec_opt(priv)) {
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 0/2] support dmabuf
  2026-02-03 22:26 ` [PATCH v2 " Cliff Burdick
  2026-02-03 22:26   ` [PATCH v2 1/2] eal: " Cliff Burdick
  2026-02-03 22:26   ` [PATCH v2 2/2] common/mlx5: " Cliff Burdick
@ 2026-02-03 23:02   ` Cliff Burdick
  2026-02-03 23:02     ` [PATCH v3 1/2] eal: " Cliff Burdick
                       ` (2 more replies)
  2 siblings, 3 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 23:02 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov

Fixes since v2:
* Fixed missing EXPERIMENTAL macro on new symbols
* Fixed style issue

Add support for the kernel dmabuf feature and integrate it into the mlx5
driver. This feature is needed to support GPUDirect on newer kernels.

Cliff Burdick (2):
  eal: support dmabuf
  common/mlx5: support dmabuf

 .mailmap                                      |   1 +
 doc/guides/rel_notes/release_26_03.rst        |   6 +
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  42 ++++-
 drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 lib/eal/common/eal_common_memory.c            | 165 +++++++++++++++++-
 lib/eal/common/eal_memalloc.h                 |  21 +++
 lib/eal/common/malloc_heap.c                  |  27 +++
 lib/eal/common/malloc_heap.h                  |   5 +
 lib/eal/include/rte_memory.h                  | 145 +++++++++++++++
 17 files changed, 612 insertions(+), 14 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/2] eal: support dmabuf
  2026-02-03 23:02   ` [PATCH v3 0/2] " Cliff Burdick
@ 2026-02-03 23:02     ` Cliff Burdick
  2026-02-03 23:02     ` [PATCH v3 2/2] common/mlx5: " Cliff Burdick
  2026-02-04 15:50     ` [PATCH v4 0/2] " Cliff Burdick
  2 siblings, 0 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 23:02 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov, Thomas Monjalon

dmabuf is a modern Linux kernel feature that allows DMA transfers between
two drivers. Common examples of usage are streaming video devices and
NIC-to-GPU transfers. Prior to dmabuf, users had to load proprietary
drivers to expose the DMA mappings. With dmabuf, the proprietary drivers
are no longer required.

A new API function, rte_extmem_register_dmabuf, is introduced to create
the mapping from a dmabuf file descriptor. dmabuf uses a file descriptor
and an offset that have been pre-opened with the kernel. The kernel uses
the file descriptor to map to a VA pointer. To avoid ABI changes, a
static struct is used inside eal_common_memory.c, and lookups are
done on this struct rather than on the rte_memseg_list.

Ideally we would like to add both the dmabuf file descriptor and offset
to rte_memseg_list, but it's not clear if we can reuse existing fields
when using the dmabuf API.

We could rename the external flag to a more generic "properties" flag
where "external" is the lowest bit, then we can use the second bit to
indicate the presence of dmabuf. In the presence of the flag for
dmabuf we could reuse the base_va address field for the dmabuf offset,
and the socket_id for the file descriptor.

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 .mailmap                               |   1 +
 doc/guides/rel_notes/release_26_03.rst |   6 +
 lib/eal/common/eal_common_memory.c     | 165 ++++++++++++++++++++++++-
 lib/eal/common/eal_memalloc.h          |  21 ++++
 lib/eal/common/malloc_heap.c           |  27 ++++
 lib/eal/common/malloc_heap.h           |   5 +
 lib/eal/include/rte_memory.h           | 145 ++++++++++++++++++++++
 7 files changed, 364 insertions(+), 6 deletions(-)

diff --git a/.mailmap b/.mailmap
index 2f089326ff..4c2b2f921d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -291,6 +291,7 @@ Cian Ferriter <cian.ferriter@intel.com>
 Ciara Loftus <ciara.loftus@intel.com>
 Ciara Power <ciara.power@intel.com>
 Claire Murphy <claire.k.murphy@intel.com>
+Cliff Burdick <cburdick@nvidia.com>
 Clemens Famulla-Conrad <cfamullaconrad@suse.com>
 Cody Doucette <doucette@bu.edu>
 Congwen Zhang <zhang.congwen@zte.com.cn>
diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index 15dabee7a1..56457d0382 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added dma-buf-backed external memory support.**
+
+  Added EAL support for registering dma-buf-backed external memory with
+  ``rte_extmem_register_dmabuf``, and enabled mlx5 common code to consume
+  dma-buf mappings for device access.
+
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index c62edf5e55..7415479fff 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -45,6 +45,15 @@
 static void *next_baseaddr;
 static uint64_t system_page_sz;
 
+/* Internal storage for dma-buf info, indexed by memseg list index.
+ * This keeps dma-buf metadata out of the public rte_memseg_list structure
+ * to preserve ABI compatibility.
+ */
+static struct {
+	int fd;          /**< dma-buf fd, -1 if not dma-buf backed */
+	uint64_t offset; /**< offset within dma-buf */
+} dmabuf_info[RTE_MAX_MEMSEG_LISTS];
+
 #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5
 void *
 eal_get_virtual_area(void *requested_addr, size_t *size,
@@ -232,6 +241,10 @@ eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz,
 {
 	char name[RTE_FBARRAY_NAME_LEN];
 
+	/* Initialize dma-buf info to "not dma-buf backed" */
+	dmabuf_info[type_msl_idx].fd = -1;
+	dmabuf_info[type_msl_idx].offset = 0;
+
 	snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id,
 		 type_msl_idx);
 
@@ -930,10 +943,113 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
-RTE_EXPORT_SYMBOL(rte_extmem_register)
+/* Internal dma-buf info functions */
 int
-rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
-		unsigned int n_pages, size_t page_sz)
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	dmabuf_info[list_idx].fd = fd;
+	dmabuf_info[list_idx].offset = offset;
+	return 0;
+}
+
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	return dmabuf_info[list_idx].fd;
+}
+
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS || offset == NULL)
+		return -EINVAL;
+
+	*offset = dmabuf_info[list_idx].offset;
+	return 0;
+}
+
+/* Public dma-buf info API functions */
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_fd_unsafe)
+int
+rte_memseg_list_get_dmabuf_fd_unsafe(const struct rte_memseg_list *msl)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return dmabuf_info[msl_idx].fd;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_fd)
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_fd_unsafe(msl);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_offset_unsafe)
+int
+rte_memseg_list_get_dmabuf_offset_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL || offset == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	*offset = dmabuf_info[msl_idx].offset;
+	return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_offset)
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_offset_unsafe(msl, offset);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
+static int
+extmem_register(void *va_addr, size_t len,
+	int dmabuf_fd, uint64_t dmabuf_offset,
+	rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int socket_id, n;
@@ -967,10 +1083,19 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 
 	/* we can create a new memseg */
 	n = len / page_sz;
-	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
+	if (dmabuf_fd < 0) {
+		if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
 			page_sz, "extmem", socket_id) == NULL) {
-		ret = -1;
-		goto unlock;
+			ret = -1;
+			goto unlock;
+		}
+	} else {
+		if (malloc_heap_create_external_seg_dmabuf(va_addr, iova_addrs, n,
+			page_sz, "extmem_dmabuf", socket_id,
+			dmabuf_fd, dmabuf_offset) == NULL) {
+			ret = -1;
+			goto unlock;
+		}
 	}
 
 	/* memseg list successfully created - increment next socket ID */
@@ -980,6 +1105,34 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 	return ret;
 }
 
+RTE_EXPORT_SYMBOL(rte_extmem_register)
+int
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz)
+{
+	return extmem_register(va_addr, len, -1, 0, iova_addrs, n_pages, page_sz);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_extmem_register_dmabuf)
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	if (dmabuf_fd < 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return extmem_register(va_addr,
+		len,
+		dmabuf_fd,
+		dmabuf_offset,
+		iova_addrs,
+		n_pages,
+		page_sz);
+}
+
 RTE_EXPORT_SYMBOL(rte_extmem_unregister)
 int
 rte_extmem_unregister(void *va_addr, size_t len)
diff --git a/lib/eal/common/eal_memalloc.h b/lib/eal/common/eal_memalloc.h
index 0c267066d9..e7e807ddcb 100644
--- a/lib/eal/common/eal_memalloc.h
+++ b/lib/eal/common/eal_memalloc.h
@@ -90,6 +90,27 @@ eal_memalloc_set_seg_list_fd(int list_idx, int fd);
 int
 eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
 
+/*
+ * Set dma-buf info for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset);
+
+/*
+ * Get dma-buf fd for a memseg list.
+ * Returns fd (>= 0) on success, -1 if not dma-buf backed, -errno on error.
+ */
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx);
+
+/*
+ * Get dma-buf offset for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset);
+
 int
 eal_memalloc_init(void)
 	__rte_requires_shared_capability(rte_mcfg_mem_get_lock());
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 39240c261c..bf986fe654 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -1232,6 +1232,33 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	msl->version = 0;
 	msl->external = 1;
 
+	/* initialize dma-buf info to "not dma-buf backed" */
+	eal_memseg_list_set_dmabuf_info(i, -1, 0);
+
+	return msl;
+}
+
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int msl_idx;
+
+	/* Create the base external segment */
+	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
+			page_sz, seg_name, socket_id);
+	if (msl == NULL)
+		return NULL;
+
+	/* Get memseg list index */
+	msl_idx = msl - mcfg->memsegs;
+
+	/* Set dma-buf info in the internal side-table */
+	eal_memseg_list_set_dmabuf_info(msl_idx, dmabuf_fd, dmabuf_offset);
+
 	return msl;
 }
 
diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
index dfc56d4ae3..87525d1a68 100644
--- a/lib/eal/common/malloc_heap.h
+++ b/lib/eal/common/malloc_heap.h
@@ -51,6 +51,11 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
 		unsigned int socket_id);
 
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset);
+
 struct rte_memseg_list *
 malloc_heap_find_external_seg(void *va_addr, size_t len);
 
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index b6e97ad695..fffeb8fcf5 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -405,6 +405,98 @@ int
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_fd_unsafe(const struct rte_memseg_list *msl);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_offset_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
 /**
  * Register external memory chunk with DPDK.
  *
@@ -443,6 +535,59 @@ int
 rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register external memory chunk backed by a dma-buf file descriptor and offset.
+ *
+ * This is similar to rte_extmem_register() but additionally stores dma-buf
+ * file descriptor information, allowing drivers to use dma-buf based
+ * memory registration (e.g., ibv_reg_dmabuf_mr for RDMA devices).
+ *
+ * @note Using this API is mutually exclusive with the ``rte_malloc`` family
+ *   of APIs.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves via rte_dev_dma_map().
+ *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling ``rte_extmem_attach`` in
+ *   each other process.
+ *
+ * @param va_addr
+ *   Start of virtual area to register (mmap'd address of the dma-buf).
+ *   Must be aligned by ``page_sz``.
+ * @param len
+ *   Length of virtual area to register. Must be aligned by ``page_sz``.
+ *   This is independent of dma-buf offset.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf where the registered region starts.
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory.
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+__rte_experimental
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Unregister external memory chunk with DPDK.
  *
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 2/2] common/mlx5: support dmabuf
  2026-02-03 23:02   ` [PATCH v3 0/2] " Cliff Burdick
  2026-02-03 23:02     ` [PATCH v3 1/2] eal: " Cliff Burdick
@ 2026-02-03 23:02     ` Cliff Burdick
  2026-02-04 15:50     ` [PATCH v4 0/2] " Cliff Burdick
  2 siblings, 0 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-03 23:02 UTC (permalink / raw)
  To: dev
  Cc: anatoly.burakov, Thomas Monjalon, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad

dmabuf is a modern Linux kernel feature that allows DMA transfers between
two drivers. Common examples of usage are streaming video devices and
NIC-to-GPU transfers. Prior to dmabuf, users had to load proprietary
drivers to expose the DMA mappings. With dmabuf, the proprietary drivers
are no longer required.

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 .mailmap                                      |   2 +-
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 +++++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 +++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  42 ++++++-
 drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 ++-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 11 files changed, 249 insertions(+), 9 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4c2b2f921d..0a8a67098f 100644
--- a/.mailmap
+++ b/.mailmap
@@ -291,8 +291,8 @@ Cian Ferriter <cian.ferriter@intel.com>
 Ciara Loftus <ciara.loftus@intel.com>
 Ciara Power <ciara.power@intel.com>
 Claire Murphy <claire.k.murphy@intel.com>
-Cliff Burdick <cburdick@nvidia.com>
 Clemens Famulla-Conrad <cfamullaconrad@suse.com>
+Cliff Burdick <cburdick@nvidia.com>
 Cody Doucette <doucette@bu.edu>
 Congwen Zhang <zhang.congwen@zte.com.cn>
 Conor Fogarty <conor.fogarty@intel.com>
diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 3767e7a69b..8e83104165 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -203,6 +203,8 @@ has_sym_args = [
             'mlx5dv_dr_domain_allow_duplicate_rules' ],
         [ 'HAVE_MLX5_IBV_REG_MR_IOVA', 'infiniband/verbs.h',
             'ibv_reg_mr_iova' ],
+        [ 'HAVE_IBV_REG_DMABUF_MR', 'infiniband/verbs.h',
+            'ibv_reg_dmabuf_mr' ],
         [ 'HAVE_MLX5_IBV_IMPORT_CTX_PD_AND_MR', 'infiniband/verbs.h',
             'ibv_import_device' ],
         [ 'HAVE_MLX5DV_DR_ACTION_CREATE_DEST_ROOT_TABLE', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index 98260df470..f6d18fd5df 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -129,6 +129,47 @@ mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 	return 0;
 }
 
+/**
+ * Register mr for dma-buf backed memory. Given protection domain pointer,
+ * dma-buf fd, offset and length, register the memory region.
+ *
+ * @param[in] pd
+ *   Pointer to protection domain context.
+ * @param[in] offset
+ *   Offset within the dma-buf.
+ * @param[in] length
+ *   Length of the memory to register.
+ * @param[in] iova
+ *   IOVA (device virtual address) to associate with the memory region.
+ * @param[in] fd
+ *   File descriptor of the dma-buf.
+ * @param[out] pmd_mr
+ *   pmd_mr struct set with lkey, address, length and pointer to mr object
+ *
+ * @return
+ *   0 on successful registration, -1 otherwise
+ */
+RTE_EXPORT_INTERNAL_SYMBOL(mlx5_common_verbs_reg_dmabuf_mr)
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr)
+{
+	struct ibv_mr *ibv_mr;
+	ibv_mr = mlx5_glue->reg_dmabuf_mr(pd, offset, length, iova, fd,
+					  IBV_ACCESS_LOCAL_WRITE |
+					  (haswell_broadwell_cpu ? 0 :
+					  IBV_ACCESS_RELAXED_ORDERING));
+	if (!ibv_mr)
+		return -1;
+
+	*pmd_mr = (struct mlx5_pmd_mr){
+		.lkey = ibv_mr->lkey,
+		.addr = ibv_mr->addr,
+		.len = ibv_mr->length,
+		.obj = (void *)ibv_mr,
+	};
+	return 0;
+}
+
 /**
  * Deregister mr. Given the mlx5 pmd MR - deregister the MR
  *
@@ -151,13 +192,18 @@ mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_common_verbs_reg_mr;
+	*reg_dmabuf_mr_cb = mlx5_common_verbs_reg_dmabuf_mr;
 	*dereg_mr_cb = mlx5_common_verbs_dereg_mr;
 }
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index a91eaa429d..6fac7f2bcd 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -291,6 +291,24 @@ mlx5_glue_reg_mr_iova(struct ibv_pd *pd, void *addr, size_t length,
 #endif
 }
 
+static struct ibv_mr *
+mlx5_glue_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length,
+			uint64_t iova, int fd, int access)
+{
+#ifdef HAVE_IBV_REG_DMABUF_MR
+	return ibv_reg_dmabuf_mr(pd, offset, length, iova, fd, access);
+#else
+	(void)pd;
+	(void)offset;
+	(void)length;
+	(void)iova;
+	(void)fd;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static struct ibv_mr *
 mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
 {
@@ -1619,6 +1637,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue) {
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
 	.reg_mr_iova = mlx5_glue_reg_mr_iova,
+	.reg_dmabuf_mr = mlx5_glue_reg_dmabuf_mr,
 	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
diff --git a/drivers/common/mlx5/linux/mlx5_glue.h b/drivers/common/mlx5/linux/mlx5_glue.h
index 81d6b0aaf9..66216d1194 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -219,6 +219,9 @@ struct mlx5_glue {
 	struct ibv_mr *(*reg_mr_iova)(struct ibv_pd *pd, void *addr,
 				      size_t length, uint64_t iova,
 				      int access);
+	struct ibv_mr *(*reg_dmabuf_mr)(struct ibv_pd *pd, uint64_t offset,
+					size_t length, uint64_t iova,
+					int fd, int access);
 	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 84a93e7dbd..82cf17ca78 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -13,6 +13,7 @@
 #include <rte_class.h>
 #include <rte_malloc.h>
 #include <rte_eal_paging.h>
+#include <rte_memory.h>
 
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
@@ -1125,6 +1126,7 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 	struct mlx5_common_device *dev;
 	struct mlx5_mr_btree *bt;
 	struct mlx5_mr *mr;
+	struct rte_memseg_list *msl;
 
 	dev = to_mlx5_device(rte_dev);
 	if (!dev) {
@@ -1134,8 +1136,44 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 		rte_errno = ENODEV;
 		return -1;
 	}
-	mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
-				SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	/* Check if this is dma-buf backed external memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl != NULL && msl->external) {
+		int dmabuf_fd = rte_memseg_list_get_dmabuf_fd(msl);
+		if (dmabuf_fd >= 0) {
+			uint64_t dmabuf_off;
+			/* Get base offset from memseg list */
+			int ret = rte_memseg_list_get_dmabuf_offset(
+				msl, &dmabuf_off);
+			if (ret < 0) {
+				DRV_LOG(ERR,
+					"Failed to get dma-buf offset for memseg list %p",
+					(void *)msl);
+				return -1;
+			}
+			/* Calculate offset within dma-buf address */
+			dmabuf_off += ((uintptr_t)addr - (uintptr_t)msl->base_va);
+			/* Use dma-buf MR registration */
+			mr = mlx5_create_mr_ext_dmabuf(dev->pd,
+						       (uintptr_t)addr,
+						       len,
+						       SOCKET_ID_ANY,
+						       dmabuf_fd,
+						       dmabuf_off,
+						       dev->mr_scache.reg_dmabuf_mr_cb);
+		} else {
+			/* Use regular MR registration */
+			mr = mlx5_create_mr_ext(dev->pd,
+						(uintptr_t)addr,
+						len,
+						SOCKET_ID_ANY,
+						dev->mr_scache.reg_mr_cb);
+		}
+	} else {
+		/* Use regular MR registration */
+		mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
+					SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	}
 	if (!mr) {
 		DRV_LOG(WARNING, "Device %s unable to DMA map", rte_dev->name);
 		rte_errno = EINVAL;
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 8ed988dec9..8f31eaefe8 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -8,6 +8,7 @@
 #include <rte_eal_memconfig.h>
 #include <rte_eal_paging.h>
 #include <rte_errno.h>
+#include <rte_memory.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_rwlock.h>
@@ -1141,6 +1142,7 @@ mlx5_mr_create_cache(struct mlx5_mr_share_cache *share_cache, int socket)
 {
 	/* Set the reg_mr and dereg_mr callback functions */
 	mlx5_os_set_reg_mr_cb(&share_cache->reg_mr_cb,
+			      &share_cache->reg_dmabuf_mr_cb,
 			      &share_cache->dereg_mr_cb);
 	rte_rwlock_init(&share_cache->rwlock);
 	rte_rwlock_init(&share_cache->mprwlock);
@@ -1221,6 +1223,74 @@ mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 	return mr;
 }
 
+/**
+ * Creates a memory region for dma-buf backed external memory.
+ *
+ * @param pd
+ *   Pointer to pd of a device (net, regex, vdpa,...).
+ * @param addr
+ *   Starting virtual address of memory (mmap'd address).
+ * @param len
+ *   Length of memory segment being mapped.
+ * @param socket_id
+ *   Socket to allocate heap memory for the control structures.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf.
+ * @param reg_dmabuf_mr_cb
+ *   Callback function for dma-buf MR registration.
+ *
+ * @return
+ *   Pointer to MR structure on success, NULL otherwise.
+ */
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb)
+{
+	struct mlx5_mr *mr = NULL;
+
+	if (reg_dmabuf_mr_cb == NULL) {
+		DRV_LOG(WARNING, "dma-buf MR registration not supported");
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+	mr = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO,
+			 RTE_ALIGN_CEIL(sizeof(*mr), RTE_CACHE_LINE_SIZE),
+			 RTE_CACHE_LINE_SIZE, socket_id);
+	if (mr == NULL)
+		return NULL;
+	if (reg_dmabuf_mr_cb(pd, dmabuf_offset, len, addr, dmabuf_fd,
+			     &mr->pmd_mr) < 0) {
+		DRV_LOG(WARNING,
+			"Fail to create dma-buf MR for address (%p) fd=%d",
+			(void *)addr, dmabuf_fd);
+		mlx5_free(mr);
+		return NULL;
+	}
+	mr->msl = NULL; /* Mark it is external memory. */
+	mr->ms_bmp = NULL;
+	mr->ms_n = 1;
+	mr->ms_bmp_n = 1;
+	/*
+	 * For dma-buf MR, the returned addr may be NULL since there's no VA
+	 * in the registration. Store the user-provided addr for cache lookup.
+	 */
+	if (mr->pmd_mr.addr == NULL)
+		mr->pmd_mr.addr = (void *)addr;
+	if (mr->pmd_mr.len == 0)
+		mr->pmd_mr.len = len;
+	DRV_LOG(DEBUG,
+		"MR CREATED (%p) for dma-buf external memory %p (fd=%d):\n"
+		"  [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+		" lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+		(void *)mr, (void *)addr, dmabuf_fd,
+		addr, addr + len, rte_cpu_to_be_32(mr->pmd_mr.lkey),
+		mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+	return mr;
+}
+
 /**
  * Callback for memory free event. Iterate freed memsegs and check whether it
  * belongs to an existing MR. If found, clear the bit from bitmap of MR. As a
@@ -1747,9 +1817,48 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 		struct mlx5_mempool_mr *mr = &new_mpr->mrs[i];
 		const struct mlx5_range *range = &ranges[i];
 		size_t len = range->end - range->start;
+		struct rte_memseg_list *msl;
+		int reg_result;
+
+		/* Check if this is dma-buf backed external memory */
+		msl = rte_mem_virt2memseg_list((void *)range->start);
+		if (msl != NULL && msl->external &&
+		    share_cache->reg_dmabuf_mr_cb != NULL) {
+			int dmabuf_fd = rte_memseg_list_get_dmabuf_fd(msl);
+			if (dmabuf_fd >= 0) {
+				uint64_t dmabuf_off;
+				/* Get base offset from memseg list */
+				ret = rte_memseg_list_get_dmabuf_offset(msl, &dmabuf_off);
+				if (ret < 0) {
+					DRV_LOG(ERR, "Failed to get dma-buf offset for memseg list %p",
+						(void *)msl);
+					goto exit;
+				}
+				/* Calculate offset within dma-buf for this specific range */
+				dmabuf_off += (range->start - (uintptr_t)msl->base_va);
+				/* Use dma-buf MR registration */
+				reg_result = share_cache->reg_dmabuf_mr_cb(pd,
+					dmabuf_off, len, range->start, dmabuf_fd,
+					&mr->pmd_mr);
+				if (reg_result == 0) {
+					/* For dma-buf MR, set addr if not set by driver */
+					if (mr->pmd_mr.addr == NULL)
+						mr->pmd_mr.addr = (void *)range->start;
+					if (mr->pmd_mr.len == 0)
+						mr->pmd_mr.len = len;
+				}
+			} else {
+				/* Use regular MR registration */
+				reg_result = share_cache->reg_mr_cb(pd,
+					(void *)range->start, len, &mr->pmd_mr);
+			}
+		} else {
+			/* Use regular MR registration */
+			reg_result = share_cache->reg_mr_cb(pd,
+				(void *)range->start, len, &mr->pmd_mr);
+		}
 
-		if (share_cache->reg_mr_cb(pd, (void *)range->start, len,
-		    &mr->pmd_mr) < 0) {
+		if (reg_result < 0) {
 			DRV_LOG(ERR,
 				"Failed to create an MR in PD %p for address range "
 				"[0x%" PRIxPTR ", 0x%" PRIxPTR "] (%zu bytes) for mempool %s",
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index cf7c685e9b..3b967b1323 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -35,6 +35,9 @@ struct mlx5_pmd_mr {
  */
 typedef int (*mlx5_reg_mr_t)(void *pd, void *addr, size_t length,
 			     struct mlx5_pmd_mr *pmd_mr);
+typedef int (*mlx5_reg_dmabuf_mr_t)(void *pd, uint64_t offset, size_t length,
+				    uint64_t iova, int fd,
+				    struct mlx5_pmd_mr *pmd_mr);
 typedef void (*mlx5_dereg_mr_t)(struct mlx5_pmd_mr *pmd_mr);
 
 /* Memory Region object. */
@@ -87,6 +90,7 @@ struct __rte_packed_begin mlx5_mr_share_cache {
 	struct mlx5_mr_list mr_free_list; /* Freed MR list. */
 	struct mlx5_mempool_reg_list mempool_reg_list; /* Mempool database. */
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 } __rte_packed_end;
 
@@ -233,6 +237,10 @@ mlx5_mr_lookup_list(struct mlx5_mr_share_cache *share_cache,
 struct mlx5_mr *
 mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 		   mlx5_reg_mr_t reg_mr_cb);
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb);
 void mlx5_mr_free(struct mlx5_mr *mr, mlx5_dereg_mr_t dereg_mr_cb);
 __rte_internal
 uint32_t
@@ -251,12 +259,19 @@ int
 mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 			 struct mlx5_pmd_mr *pmd_mr);
 __rte_internal
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr);
+__rte_internal
 void
 mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 
 __rte_internal
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb);
 
 __rte_internal
 int
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.c b/drivers/common/mlx5/windows/mlx5_common_os.c
index 7fac361460..5e284742ab 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.c
+++ b/drivers/common/mlx5/windows/mlx5_common_os.c
@@ -17,6 +17,7 @@
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_malloc.h"
+#include "mlx5_common_mr.h"
 
 /**
  * Initialization routine for run-time dependency on external lib.
@@ -442,15 +443,20 @@ mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func (NULL on Windows - not supported)
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  *
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_os_reg_mr;
+	*reg_dmabuf_mr_cb = NULL; /* dma-buf not supported on Windows */
 	*dereg_mr_cb = mlx5_os_dereg_mr;
 }
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index f9f127e9e6..b2712c9a8d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -41,6 +41,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 89f32c7722..380689cfeb 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -1186,7 +1186,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
-	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
+	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->reg_dmabuf_mr_cb,
+			&priv->dereg_mr_cb);
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
 	if (mlx5_crypto_is_ipsec_opt(priv)) {
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 0/2] support dmabuf
  2026-02-03 23:02   ` [PATCH v3 0/2] " Cliff Burdick
  2026-02-03 23:02     ` [PATCH v3 1/2] eal: " Cliff Burdick
  2026-02-03 23:02     ` [PATCH v3 2/2] common/mlx5: " Cliff Burdick
@ 2026-02-04 15:50     ` Cliff Burdick
  2026-02-04 15:50       ` [PATCH v4 1/2] eal: " Cliff Burdick
                         ` (3 more replies)
  2 siblings, 4 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-04 15:50 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov

Fixes since v3:
* Fixed version in RTE_EXPORT_EXPERIMENTAL_SYMBOL

Add support for the kernel dmabuf feature and integrate it into the mlx5
driver. This feature is needed to support GPUDirect on newer kernels.

I apologize for all the patches. Still trying to learn how to submit these.

Cliff Burdick (2):
  eal: support dmabuf
  common/mlx5: support dmabuf

 .mailmap                                      |   1 +
 doc/guides/rel_notes/release_26_03.rst        |   6 +
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  42 ++++-
 drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 lib/eal/common/eal_common_memory.c            | 165 +++++++++++++++++-
 lib/eal/common/eal_memalloc.h                 |  21 +++
 lib/eal/common/malloc_heap.c                  |  27 +++
 lib/eal/common/malloc_heap.h                  |   5 +
 lib/eal/include/rte_memory.h                  | 145 +++++++++++++++
 17 files changed, 612 insertions(+), 14 deletions(-)

-- 
2.52.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/2] eal: support dmabuf
  2026-02-04 15:50     ` [PATCH v4 0/2] " Cliff Burdick
@ 2026-02-04 15:50       ` Cliff Burdick
  2026-02-12 13:57         ` Burakov, Anatoly
  2026-02-04 15:50       ` [PATCH v4 2/2] common/mlx5: " Cliff Burdick
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Cliff Burdick @ 2026-02-04 15:50 UTC (permalink / raw)
  To: dev; +Cc: anatoly.burakov, Thomas Monjalon

dmabuf is a modern Linux kernel feature that allows DMA transfers
between two drivers. Common examples of usage are streaming video
devices and NIC-to-GPU transfers. Prior to dmabuf, users had to load
proprietary drivers to expose the DMA mappings. With dmabuf, the
proprietary drivers are no longer required.

A new API function, rte_extmem_register_dmabuf, is introduced to create
the mapping from a dmabuf file descriptor. dmabuf uses a file
descriptor that has been pre-opened with the kernel, together with an
offset into the buffer. The kernel uses the file descriptor to map to
a VA pointer. To avoid ABI changes, a static struct is used inside
eal_common_memory.c, and lookups are done on this struct rather than
on the rte_memseg_list.

Ideally we would like to add both the dmabuf file descriptor and offset
to rte_memseg_list, but it's not clear if we can reuse existing fields
when using the dmabuf API.

We could rename the external flag to a more generic "properties" flag
where "external" is the lowest bit; the second bit would then indicate
the presence of a dmabuf. When the dmabuf bit is set, we could reuse
the base_va address field for the dmabuf offset, and the socket_id for
the file descriptor.

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 .mailmap                               |   1 +
 doc/guides/rel_notes/release_26_03.rst |   6 +
 lib/eal/common/eal_common_memory.c     | 165 ++++++++++++++++++++++++-
 lib/eal/common/eal_memalloc.h          |  21 ++++
 lib/eal/common/malloc_heap.c           |  27 ++++
 lib/eal/common/malloc_heap.h           |   5 +
 lib/eal/include/rte_memory.h           | 145 ++++++++++++++++++++++
 7 files changed, 364 insertions(+), 6 deletions(-)

diff --git a/.mailmap b/.mailmap
index 2f089326ff..4c2b2f921d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -291,6 +291,7 @@ Cian Ferriter <cian.ferriter@intel.com>
 Ciara Loftus <ciara.loftus@intel.com>
 Ciara Power <ciara.power@intel.com>
 Claire Murphy <claire.k.murphy@intel.com>
+Cliff Burdick <cburdick@nvidia.com>
 Clemens Famulla-Conrad <cfamullaconrad@suse.com>
 Cody Doucette <doucette@bu.edu>
 Congwen Zhang <zhang.congwen@zte.com.cn>
diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index 15dabee7a1..56457d0382 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -55,6 +55,12 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added dma-buf-backed external memory support.**
+
+  Added EAL support for registering dma-buf-backed external memory with
+  ``rte_extmem_register_dmabuf``, and enabled mlx5 common code to consume
+  dma-buf mappings for device access.
+
 
 Removed Items
 -------------
diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c
index c62edf5e55..4b8b1c8b59 100644
--- a/lib/eal/common/eal_common_memory.c
+++ b/lib/eal/common/eal_common_memory.c
@@ -45,6 +45,15 @@
 static void *next_baseaddr;
 static uint64_t system_page_sz;
 
+/* Internal storage for dma-buf info, indexed by memseg list index.
+ * This keeps dma-buf metadata out of the public rte_memseg_list structure
+ * to preserve ABI compatibility.
+ */
+static struct {
+	int fd;          /**< dma-buf fd, -1 if not dma-buf backed */
+	uint64_t offset; /**< offset within dma-buf */
+} dmabuf_info[RTE_MAX_MEMSEG_LISTS];
+
 #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5
 void *
 eal_get_virtual_area(void *requested_addr, size_t *size,
@@ -232,6 +241,10 @@ eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz,
 {
 	char name[RTE_FBARRAY_NAME_LEN];
 
+	/* Initialize dma-buf info to "not dma-buf backed" */
+	dmabuf_info[type_msl_idx].fd = -1;
+	dmabuf_info[type_msl_idx].offset = 0;
+
 	snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id,
 		 type_msl_idx);
 
@@ -930,10 +943,113 @@ rte_memseg_get_fd_offset(const struct rte_memseg *ms, size_t *offset)
 	return ret;
 }
 
-RTE_EXPORT_SYMBOL(rte_extmem_register)
+/* Internal dma-buf info functions */
 int
-rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
-		unsigned int n_pages, size_t page_sz)
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	dmabuf_info[list_idx].fd = fd;
+	dmabuf_info[list_idx].offset = offset;
+	return 0;
+}
+
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS)
+		return -EINVAL;
+
+	return dmabuf_info[list_idx].fd;
+}
+
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset)
+{
+	if (list_idx < 0 || list_idx >= RTE_MAX_MEMSEG_LISTS || offset == NULL)
+		return -EINVAL;
+
+	*offset = dmabuf_info[list_idx].offset;
+	return 0;
+}
+
+/* Public dma-buf info API functions */
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_fd_unsafe, 26.03)
+int
+rte_memseg_list_get_dmabuf_fd_unsafe(const struct rte_memseg_list *msl)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return dmabuf_info[msl_idx].fd;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_fd, 26.03)
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_fd_unsafe(msl);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_offset_unsafe, 26.03)
+int
+rte_memseg_list_get_dmabuf_offset_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	int msl_idx;
+
+	if (msl == NULL || offset == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	msl_idx = msl - mcfg->memsegs;
+	if (msl_idx < 0 || msl_idx >= RTE_MAX_MEMSEG_LISTS) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	*offset = dmabuf_info[msl_idx].offset;
+	return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memseg_list_get_dmabuf_offset, 26.03)
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset)
+{
+	int ret;
+
+	rte_mcfg_mem_read_lock();
+	ret = rte_memseg_list_get_dmabuf_offset_unsafe(msl, offset);
+	rte_mcfg_mem_read_unlock();
+
+	return ret;
+}
+
+static int
+extmem_register(void *va_addr, size_t len,
+	int dmabuf_fd, uint64_t dmabuf_offset,
+	rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
 {
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 	unsigned int socket_id, n;
@@ -967,10 +1083,19 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 
 	/* we can create a new memseg */
 	n = len / page_sz;
-	if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
+	if (dmabuf_fd < 0) {
+		if (malloc_heap_create_external_seg(va_addr, iova_addrs, n,
 			page_sz, "extmem", socket_id) == NULL) {
-		ret = -1;
-		goto unlock;
+			ret = -1;
+			goto unlock;
+		}
+	} else {
+		if (malloc_heap_create_external_seg_dmabuf(va_addr, iova_addrs, n,
+			page_sz, "extmem_dmabuf", socket_id,
+			dmabuf_fd, dmabuf_offset) == NULL) {
+			ret = -1;
+			goto unlock;
+		}
 	}
 
 	/* memseg list successfully created - increment next socket ID */
@@ -980,6 +1105,34 @@ rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 	return ret;
 }
 
+RTE_EXPORT_SYMBOL(rte_extmem_register)
+int
+rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz)
+{
+	return rte_extmem_register_dmabuf(va_addr, len, -1, 0, iova_addrs, n_pages, page_sz);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_extmem_register_dmabuf, 26.03)
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
+{
+	if (dmabuf_fd < 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return extmem_register(va_addr,
+		len,
+		dmabuf_fd,
+		dmabuf_offset,
+		iova_addrs,
+		n_pages,
+		page_sz);
+}
+
 RTE_EXPORT_SYMBOL(rte_extmem_unregister)
 int
 rte_extmem_unregister(void *va_addr, size_t len)
diff --git a/lib/eal/common/eal_memalloc.h b/lib/eal/common/eal_memalloc.h
index 0c267066d9..e7e807ddcb 100644
--- a/lib/eal/common/eal_memalloc.h
+++ b/lib/eal/common/eal_memalloc.h
@@ -90,6 +90,27 @@ eal_memalloc_set_seg_list_fd(int list_idx, int fd);
 int
 eal_memalloc_get_seg_fd_offset(int list_idx, int seg_idx, size_t *offset);
 
+/*
+ * Set dma-buf info for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_set_dmabuf_info(int list_idx, int fd, uint64_t offset);
+
+/*
+ * Get dma-buf fd for a memseg list.
+ * Returns fd (>= 0) on success, -1 if not dma-buf backed, -errno on error.
+ */
+int
+eal_memseg_list_get_dmabuf_fd(int list_idx);
+
+/*
+ * Get dma-buf offset for a memseg list.
+ * Returns 0 on success, -errno on failure.
+ */
+int
+eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset);
+
 int
 eal_memalloc_init(void)
 	__rte_requires_shared_capability(rte_mcfg_mem_get_lock());
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index 39240c261c..bf986fe654 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -1232,6 +1232,33 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 	msl->version = 0;
 	msl->external = 1;
 
+	/* initialize dma-buf info to "not dma-buf backed" */
+	eal_memseg_list_set_dmabuf_info(i, -1, 0);
+
+	return msl;
+}
+
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset)
+{
+	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
+	struct rte_memseg_list *msl;
+	int msl_idx;
+
+	/* Create the base external segment */
+	msl = malloc_heap_create_external_seg(va_addr, iova_addrs, n_pages,
+			page_sz, seg_name, socket_id);
+	if (msl == NULL)
+		return NULL;
+
+	/* Get memseg list index */
+	msl_idx = msl - mcfg->memsegs;
+
+	/* Set dma-buf info in the internal side-table */
+	eal_memseg_list_set_dmabuf_info(msl_idx, dmabuf_fd, dmabuf_offset);
+
 	return msl;
 }
 
diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
index dfc56d4ae3..87525d1a68 100644
--- a/lib/eal/common/malloc_heap.h
+++ b/lib/eal/common/malloc_heap.h
@@ -51,6 +51,11 @@ malloc_heap_create_external_seg(void *va_addr, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz, const char *seg_name,
 		unsigned int socket_id);
 
+struct rte_memseg_list *
+malloc_heap_create_external_seg_dmabuf(void *va_addr, rte_iova_t iova_addrs[],
+		unsigned int n_pages, size_t page_sz, const char *seg_name,
+		unsigned int socket_id, int dmabuf_fd, uint64_t dmabuf_offset);
+
 struct rte_memseg_list *
 malloc_heap_find_external_seg(void *va_addr, size_t len);
 
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index b6e97ad695..fffeb8fcf5 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -405,6 +405,98 @@ int
 rte_memseg_get_fd_offset_thread_unsafe(const struct rte_memseg *ms,
 		size_t *offset);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf file descriptor associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf fd.
+ *
+ * @return
+ *   Valid dma-buf file descriptor (>= 0) in case of success.
+ *   -1 if not dma-buf backed or in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_fd_unsafe(const struct rte_memseg_list *msl);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function read-locks the memory hotplug subsystem, and thus cannot
+ *       be used within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get dma-buf offset associated with a memseg list.
+ *
+ * @note This function does not perform any locking, and is only safe to call
+ *       from within memory-related callback functions.
+ *
+ * @param msl
+ *   A pointer to memseg list for which to get dma-buf offset.
+ * @param offset
+ *   A pointer to offset value where the result will be stored.
+ *
+ * @return
+ *   0 on success.
+ *   -1 in case of error, with ``rte_errno`` set to:
+ *     - EINVAL  - ``msl`` pointer was NULL or did not point to a valid memseg list
+ *     - EINVAL  - ``offset`` pointer was NULL
+ */
+__rte_experimental
+int
+rte_memseg_list_get_dmabuf_offset_unsafe(const struct rte_memseg_list *msl,
+		uint64_t *offset);
+
 /**
  * Register external memory chunk with DPDK.
  *
@@ -443,6 +535,59 @@ int
 rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
 		unsigned int n_pages, size_t page_sz);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Register external memory chunk backed by a dma-buf file descriptor and offset.
+ *
+ * This is similar to rte_extmem_register() but additionally stores dma-buf
+ * file descriptor information, allowing drivers to use dma-buf based
+ * memory registration (e.g., ibv_reg_dmabuf_mr for RDMA devices).
+ *
+ * @note Using this API is mutually exclusive with ``rte_malloc`` family of
+ *   API's.
+ *
+ * @note This API will not perform any DMA mapping. It is expected that user
+ *   will do that themselves via rte_dev_dma_map().
+ *
+ * @note Before accessing this memory in other processes, it needs to be
+ *   attached in each of those processes by calling ``rte_extmem_attach`` in
+ *   each other process.
+ *
+ * @param va_addr
+ *   Start of virtual area to register (mmap'd address of the dma-buf).
+ *   Must be aligned by ``page_sz``.
+ * @param len
+ *   Length of virtual area to register. Must be aligned by ``page_sz``.
+ *   This is independent of dma-buf offset.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf where the registered region starts.
+ * @param iova_addrs
+ *   Array of page IOVA addresses corresponding to each page in this memory
+ *   area. Can be NULL, in which case page IOVA addresses will be set to
+ *   RTE_BAD_IOVA.
+ * @param n_pages
+ *   Number of elements in the iova_addrs array. Ignored if ``iova_addrs``
+ *   is NULL.
+ * @param page_sz
+ *   Page size of the underlying memory
+ *
+ * @return
+ *   - 0 on success
+ *   - -1 in case of error, with rte_errno set to one of the following:
+ *     EINVAL - one of the parameters was invalid
+ *     EEXIST - memory chunk is already registered
+ *     ENOSPC - no more space in internal config to store a new memory chunk
+ */
+__rte_experimental
+int
+rte_extmem_register_dmabuf(void *va_addr, size_t len,
+		int dmabuf_fd, uint64_t dmabuf_offset,
+		rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz);
+
 /**
  * Unregister external memory chunk with DPDK.
  *
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/2] common/mlx5: support dmabuf
  2026-02-04 15:50     ` [PATCH v4 0/2] " Cliff Burdick
  2026-02-04 15:50       ` [PATCH v4 1/2] eal: " Cliff Burdick
@ 2026-02-04 15:50       ` Cliff Burdick
  2026-02-05 18:48       ` [PATCH v4 0/2] " Stephen Hemminger
  2026-03-31  3:15       ` Stephen Hemminger
  3 siblings, 0 replies; 27+ messages in thread
From: Cliff Burdick @ 2026-02-04 15:50 UTC (permalink / raw)
  To: dev
  Cc: anatoly.burakov, Thomas Monjalon, Dariusz Sosnowski,
	Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad

dmabuf is a Linux kernel feature that allows DMA buffer sharing between
two drivers. Common examples of usage are streaming video devices and
NIC-to-GPU transfers. Prior to dmabuf, users had to load proprietary
drivers to expose the DMA mappings. With dmabuf, the proprietary drivers
are no longer required.

Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
---
 .mailmap                                      |   2 +-
 drivers/common/mlx5/linux/meson.build         |   2 +
 drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 +++++++-
 drivers/common/mlx5/linux/mlx5_glue.c         |  19 +++
 drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
 drivers/common/mlx5/mlx5_common.c             |  42 ++++++-
 drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++++++++-
 drivers/common/mlx5/mlx5_common_mr.h          |  17 ++-
 drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
 drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
 drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
 11 files changed, 249 insertions(+), 9 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4c2b2f921d..0a8a67098f 100644
--- a/.mailmap
+++ b/.mailmap
@@ -291,8 +291,8 @@ Cian Ferriter <cian.ferriter@intel.com>
 Ciara Loftus <ciara.loftus@intel.com>
 Ciara Power <ciara.power@intel.com>
 Claire Murphy <claire.k.murphy@intel.com>
-Cliff Burdick <cburdick@nvidia.com>
 Clemens Famulla-Conrad <cfamullaconrad@suse.com>
+Cliff Burdick <cburdick@nvidia.com>
 Cody Doucette <doucette@bu.edu>
 Congwen Zhang <zhang.congwen@zte.com.cn>
 Conor Fogarty <conor.fogarty@intel.com>
diff --git a/drivers/common/mlx5/linux/meson.build b/drivers/common/mlx5/linux/meson.build
index 3767e7a69b..8e83104165 100644
--- a/drivers/common/mlx5/linux/meson.build
+++ b/drivers/common/mlx5/linux/meson.build
@@ -203,6 +203,8 @@ has_sym_args = [
             'mlx5dv_dr_domain_allow_duplicate_rules' ],
         [ 'HAVE_MLX5_IBV_REG_MR_IOVA', 'infiniband/verbs.h',
             'ibv_reg_mr_iova' ],
+        [ 'HAVE_IBV_REG_DMABUF_MR', 'infiniband/verbs.h',
+            'ibv_reg_dmabuf_mr' ],
         [ 'HAVE_MLX5_IBV_IMPORT_CTX_PD_AND_MR', 'infiniband/verbs.h',
             'ibv_import_device' ],
         [ 'HAVE_MLX5DV_DR_ACTION_CREATE_DEST_ROOT_TABLE', 'infiniband/mlx5dv.h',
diff --git a/drivers/common/mlx5/linux/mlx5_common_verbs.c b/drivers/common/mlx5/linux/mlx5_common_verbs.c
index 98260df470..f6d18fd5df 100644
--- a/drivers/common/mlx5/linux/mlx5_common_verbs.c
+++ b/drivers/common/mlx5/linux/mlx5_common_verbs.c
@@ -129,6 +129,47 @@ mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 	return 0;
 }
 
+/**
+ * Register mr for dma-buf backed memory. Given protection domain pointer,
+ * dma-buf fd, offset and length, register the memory region.
+ *
+ * @param[in] pd
+ *   Pointer to protection domain context.
+ * @param[in] offset
+ *   Offset within the dma-buf.
+ * @param[in] length
+ *   Length of the memory to register.
+ * @param[in] fd
+ *   File descriptor of the dma-buf.
+ * @param[out] pmd_mr
+ *   pmd_mr struct set with lkey, address, length and pointer to mr object
+ *
+ * @return
+ *   0 on successful registration, -1 otherwise
+ */
+RTE_EXPORT_INTERNAL_SYMBOL(mlx5_common_verbs_reg_dmabuf_mr)
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr)
+{
+	struct ibv_mr *ibv_mr;
+	ibv_mr = mlx5_glue->reg_dmabuf_mr(pd, offset, length, iova, fd,
+					  IBV_ACCESS_LOCAL_WRITE |
+					  (haswell_broadwell_cpu ? 0 :
+					  IBV_ACCESS_RELAXED_ORDERING));
+	if (!ibv_mr)
+		return -1;
+
+	*pmd_mr = (struct mlx5_pmd_mr){
+		.lkey = ibv_mr->lkey,
+		.addr = ibv_mr->addr,
+		.len = ibv_mr->length,
+		.obj = (void *)ibv_mr,
+	};
+	return 0;
+}
+
 /**
  * Deregister mr. Given the mlx5 pmd MR - deregister the MR
  *
@@ -151,13 +192,18 @@ mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_common_verbs_reg_mr;
+	*reg_dmabuf_mr_cb = mlx5_common_verbs_reg_dmabuf_mr;
 	*dereg_mr_cb = mlx5_common_verbs_dereg_mr;
 }
diff --git a/drivers/common/mlx5/linux/mlx5_glue.c b/drivers/common/mlx5/linux/mlx5_glue.c
index a91eaa429d..6fac7f2bcd 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.c
+++ b/drivers/common/mlx5/linux/mlx5_glue.c
@@ -291,6 +291,24 @@ mlx5_glue_reg_mr_iova(struct ibv_pd *pd, void *addr, size_t length,
 #endif
 }
 
+static struct ibv_mr *
+mlx5_glue_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length,
+			uint64_t iova, int fd, int access)
+{
+#ifdef HAVE_IBV_REG_DMABUF_MR
+	return ibv_reg_dmabuf_mr(pd, offset, length, iova, fd, access);
+#else
+	(void)pd;
+	(void)offset;
+	(void)length;
+	(void)iova;
+	(void)fd;
+	(void)access;
+	errno = ENOTSUP;
+	return NULL;
+#endif
+}
+
 static struct ibv_mr *
 mlx5_glue_alloc_null_mr(struct ibv_pd *pd)
 {
@@ -1619,6 +1637,7 @@ const struct mlx5_glue *mlx5_glue = &(const struct mlx5_glue) {
 	.modify_qp = mlx5_glue_modify_qp,
 	.reg_mr = mlx5_glue_reg_mr,
 	.reg_mr_iova = mlx5_glue_reg_mr_iova,
+	.reg_dmabuf_mr = mlx5_glue_reg_dmabuf_mr,
 	.alloc_null_mr = mlx5_glue_alloc_null_mr,
 	.dereg_mr = mlx5_glue_dereg_mr,
 	.create_counter_set = mlx5_glue_create_counter_set,
diff --git a/drivers/common/mlx5/linux/mlx5_glue.h b/drivers/common/mlx5/linux/mlx5_glue.h
index 81d6b0aaf9..66216d1194 100644
--- a/drivers/common/mlx5/linux/mlx5_glue.h
+++ b/drivers/common/mlx5/linux/mlx5_glue.h
@@ -219,6 +219,9 @@ struct mlx5_glue {
 	struct ibv_mr *(*reg_mr_iova)(struct ibv_pd *pd, void *addr,
 				      size_t length, uint64_t iova,
 				      int access);
+	struct ibv_mr *(*reg_dmabuf_mr)(struct ibv_pd *pd, uint64_t offset,
+					size_t length, uint64_t iova,
+					int fd, int access);
 	struct ibv_mr *(*alloc_null_mr)(struct ibv_pd *pd);
 	int (*dereg_mr)(struct ibv_mr *mr);
 	struct ibv_counter_set *(*create_counter_set)
diff --git a/drivers/common/mlx5/mlx5_common.c b/drivers/common/mlx5/mlx5_common.c
index 84a93e7dbd..82cf17ca78 100644
--- a/drivers/common/mlx5/mlx5_common.c
+++ b/drivers/common/mlx5/mlx5_common.c
@@ -13,6 +13,7 @@
 #include <rte_class.h>
 #include <rte_malloc.h>
 #include <rte_eal_paging.h>
+#include <rte_memory.h>
 
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
@@ -1125,6 +1126,7 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 	struct mlx5_common_device *dev;
 	struct mlx5_mr_btree *bt;
 	struct mlx5_mr *mr;
+	struct rte_memseg_list *msl;
 
 	dev = to_mlx5_device(rte_dev);
 	if (!dev) {
@@ -1134,8 +1136,44 @@ mlx5_common_dev_dma_map(struct rte_device *rte_dev, void *addr,
 		rte_errno = ENODEV;
 		return -1;
 	}
-	mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
-				SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	/* Check if this is dma-buf backed external memory */
+	msl = rte_mem_virt2memseg_list(addr);
+	if (msl != NULL && msl->external) {
+		int dmabuf_fd = rte_memseg_list_get_dmabuf_fd(msl);
+		if (dmabuf_fd >= 0) {
+			uint64_t dmabuf_off;
+			/* Get base offset from memseg list */
+			int ret = rte_memseg_list_get_dmabuf_offset(
+				msl, &dmabuf_off);
+			if (ret < 0) {
+				DRV_LOG(ERR,
+					"Failed to get dma-buf offset for memseg list %p",
+					(void *)msl);
+				return -1;
+			}
+			/* Calculate offset within dma-buf address */
+			dmabuf_off += ((uintptr_t)addr - (uintptr_t)msl->base_va);
+			/* Use dma-buf MR registration */
+			mr = mlx5_create_mr_ext_dmabuf(dev->pd,
+						       (uintptr_t)addr,
+						       len,
+						       SOCKET_ID_ANY,
+						       dmabuf_fd,
+						       dmabuf_off,
+						       dev->mr_scache.reg_dmabuf_mr_cb);
+		} else {
+			/* Use regular MR registration */
+			mr = mlx5_create_mr_ext(dev->pd,
+						(uintptr_t)addr,
+						len,
+						SOCKET_ID_ANY,
+						dev->mr_scache.reg_mr_cb);
+		}
+	} else {
+		/* Use regular MR registration */
+		mr = mlx5_create_mr_ext(dev->pd, (uintptr_t)addr, len,
+					SOCKET_ID_ANY, dev->mr_scache.reg_mr_cb);
+	}
 	if (!mr) {
 		DRV_LOG(WARNING, "Device %s unable to DMA map", rte_dev->name);
 		rte_errno = EINVAL;
diff --git a/drivers/common/mlx5/mlx5_common_mr.c b/drivers/common/mlx5/mlx5_common_mr.c
index 8ed988dec9..8f31eaefe8 100644
--- a/drivers/common/mlx5/mlx5_common_mr.c
+++ b/drivers/common/mlx5/mlx5_common_mr.c
@@ -8,6 +8,7 @@
 #include <rte_eal_memconfig.h>
 #include <rte_eal_paging.h>
 #include <rte_errno.h>
+#include <rte_memory.h>
 #include <rte_mempool.h>
 #include <rte_malloc.h>
 #include <rte_rwlock.h>
@@ -1141,6 +1142,7 @@ mlx5_mr_create_cache(struct mlx5_mr_share_cache *share_cache, int socket)
 {
 	/* Set the reg_mr and dereg_mr callback functions */
 	mlx5_os_set_reg_mr_cb(&share_cache->reg_mr_cb,
+			      &share_cache->reg_dmabuf_mr_cb,
 			      &share_cache->dereg_mr_cb);
 	rte_rwlock_init(&share_cache->rwlock);
 	rte_rwlock_init(&share_cache->mprwlock);
@@ -1221,6 +1223,74 @@ mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 	return mr;
 }
 
+/**
+ * Creates a memory region for dma-buf backed external memory.
+ *
+ * @param pd
+ *   Pointer to pd of a device (net, regex, vdpa,...).
+ * @param addr
+ *   Starting virtual address of memory (mmap'd address).
+ * @param len
+ *   Length of memory segment being mapped.
+ * @param socket_id
+ *   Socket to allocate heap memory for the control structures.
+ * @param dmabuf_fd
+ *   File descriptor of the dma-buf.
+ * @param dmabuf_offset
+ *   Offset within the dma-buf.
+ * @param reg_dmabuf_mr_cb
+ *   Callback function for dma-buf MR registration.
+ *
+ * @return
+ *   Pointer to MR structure on success, NULL otherwise.
+ */
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb)
+{
+	struct mlx5_mr *mr = NULL;
+
+	if (reg_dmabuf_mr_cb == NULL) {
+		DRV_LOG(WARNING, "dma-buf MR registration not supported");
+		rte_errno = ENOTSUP;
+		return NULL;
+	}
+	mr = mlx5_malloc(MLX5_MEM_RTE | MLX5_MEM_ZERO,
+			 RTE_ALIGN_CEIL(sizeof(*mr), RTE_CACHE_LINE_SIZE),
+			 RTE_CACHE_LINE_SIZE, socket_id);
+	if (mr == NULL)
+		return NULL;
+	if (reg_dmabuf_mr_cb(pd, dmabuf_offset, len, addr, dmabuf_fd,
+			     &mr->pmd_mr) < 0) {
+		DRV_LOG(WARNING,
+			"Fail to create dma-buf MR for address (%p) fd=%d",
+			(void *)addr, dmabuf_fd);
+		mlx5_free(mr);
+		return NULL;
+	}
+	mr->msl = NULL; /* Mark it is external memory. */
+	mr->ms_bmp = NULL;
+	mr->ms_n = 1;
+	mr->ms_bmp_n = 1;
+	/*
+	 * For dma-buf MR, the returned addr may be NULL since there's no VA
+	 * in the registration. Store the user-provided addr for cache lookup.
+	 */
+	if (mr->pmd_mr.addr == NULL)
+		mr->pmd_mr.addr = (void *)addr;
+	if (mr->pmd_mr.len == 0)
+		mr->pmd_mr.len = len;
+	DRV_LOG(DEBUG,
+		"MR CREATED (%p) for dma-buf external memory %p (fd=%d):\n"
+		"  [0x%" PRIxPTR ", 0x%" PRIxPTR "),"
+		" lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u",
+		(void *)mr, (void *)addr, dmabuf_fd,
+		addr, addr + len, rte_cpu_to_be_32(mr->pmd_mr.lkey),
+		mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n);
+	return mr;
+}
+
 /**
  * Callback for memory free event. Iterate freed memsegs and check whether it
  * belongs to an existing MR. If found, clear the bit from bitmap of MR. As a
@@ -1747,9 +1817,48 @@ mlx5_mr_mempool_register_primary(struct mlx5_mr_share_cache *share_cache,
 		struct mlx5_mempool_mr *mr = &new_mpr->mrs[i];
 		const struct mlx5_range *range = &ranges[i];
 		size_t len = range->end - range->start;
+		struct rte_memseg_list *msl;
+		int reg_result;
+
+		/* Check if this is dma-buf backed external memory */
+		msl = rte_mem_virt2memseg_list((void *)range->start);
+		if (msl != NULL && msl->external &&
+		    share_cache->reg_dmabuf_mr_cb != NULL) {
+			int dmabuf_fd = rte_memseg_list_get_dmabuf_fd(msl);
+			if (dmabuf_fd >= 0) {
+				uint64_t dmabuf_off;
+				/* Get base offset from memseg list */
+				ret = rte_memseg_list_get_dmabuf_offset(msl, &dmabuf_off);
+				if (ret < 0) {
+					DRV_LOG(ERR, "Failed to get dma-buf offset for memseg list %p",
+						(void *)msl);
+					goto exit;
+				}
+				/* Calculate offset within dma-buf for this specific range */
+				dmabuf_off += (range->start - (uintptr_t)msl->base_va);
+				/* Use dma-buf MR registration */
+				reg_result = share_cache->reg_dmabuf_mr_cb(pd,
+					dmabuf_off, len, range->start, dmabuf_fd,
+					&mr->pmd_mr);
+				if (reg_result == 0) {
+					/* For dma-buf MR, set addr if not set by driver */
+					if (mr->pmd_mr.addr == NULL)
+						mr->pmd_mr.addr = (void *)range->start;
+					if (mr->pmd_mr.len == 0)
+						mr->pmd_mr.len = len;
+				}
+			} else {
+				/* Use regular MR registration */
+				reg_result = share_cache->reg_mr_cb(pd,
+					(void *)range->start, len, &mr->pmd_mr);
+			}
+		} else {
+			/* Use regular MR registration */
+			reg_result = share_cache->reg_mr_cb(pd,
+				(void *)range->start, len, &mr->pmd_mr);
+		}
 
-		if (share_cache->reg_mr_cb(pd, (void *)range->start, len,
-		    &mr->pmd_mr) < 0) {
+		if (reg_result < 0) {
 			DRV_LOG(ERR,
 				"Failed to create an MR in PD %p for address range "
 				"[0x%" PRIxPTR ", 0x%" PRIxPTR "] (%zu bytes) for mempool %s",
diff --git a/drivers/common/mlx5/mlx5_common_mr.h b/drivers/common/mlx5/mlx5_common_mr.h
index cf7c685e9b..3b967b1323 100644
--- a/drivers/common/mlx5/mlx5_common_mr.h
+++ b/drivers/common/mlx5/mlx5_common_mr.h
@@ -35,6 +35,9 @@ struct mlx5_pmd_mr {
  */
 typedef int (*mlx5_reg_mr_t)(void *pd, void *addr, size_t length,
 			     struct mlx5_pmd_mr *pmd_mr);
+typedef int (*mlx5_reg_dmabuf_mr_t)(void *pd, uint64_t offset, size_t length,
+				    uint64_t iova, int fd,
+				    struct mlx5_pmd_mr *pmd_mr);
 typedef void (*mlx5_dereg_mr_t)(struct mlx5_pmd_mr *pmd_mr);
 
 /* Memory Region object. */
@@ -87,6 +90,7 @@ struct __rte_packed_begin mlx5_mr_share_cache {
 	struct mlx5_mr_list mr_free_list; /* Freed MR list. */
 	struct mlx5_mempool_reg_list mempool_reg_list; /* Mempool database. */
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 } __rte_packed_end;
 
@@ -233,6 +237,10 @@ mlx5_mr_lookup_list(struct mlx5_mr_share_cache *share_cache,
 struct mlx5_mr *
 mlx5_create_mr_ext(void *pd, uintptr_t addr, size_t len, int socket_id,
 		   mlx5_reg_mr_t reg_mr_cb);
+struct mlx5_mr *
+mlx5_create_mr_ext_dmabuf(void *pd, uintptr_t addr, size_t len, int socket_id,
+			  int dmabuf_fd, uint64_t dmabuf_offset,
+			  mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb);
 void mlx5_mr_free(struct mlx5_mr *mr, mlx5_dereg_mr_t dereg_mr_cb);
 __rte_internal
 uint32_t
@@ -251,12 +259,19 @@ int
 mlx5_common_verbs_reg_mr(void *pd, void *addr, size_t length,
 			 struct mlx5_pmd_mr *pmd_mr);
 __rte_internal
+int
+mlx5_common_verbs_reg_dmabuf_mr(void *pd, uint64_t offset, size_t length,
+				uint64_t iova, int fd,
+				struct mlx5_pmd_mr *pmd_mr);
+__rte_internal
 void
 mlx5_common_verbs_dereg_mr(struct mlx5_pmd_mr *pmd_mr);
 
 __rte_internal
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb);
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb);
 
 __rte_internal
 int
diff --git a/drivers/common/mlx5/windows/mlx5_common_os.c b/drivers/common/mlx5/windows/mlx5_common_os.c
index 7fac361460..5e284742ab 100644
--- a/drivers/common/mlx5/windows/mlx5_common_os.c
+++ b/drivers/common/mlx5/windows/mlx5_common_os.c
@@ -17,6 +17,7 @@
 #include "mlx5_common.h"
 #include "mlx5_common_os.h"
 #include "mlx5_malloc.h"
+#include "mlx5_common_mr.h"
 
 /**
  * Initialization routine for run-time dependency on external lib.
@@ -442,15 +443,20 @@ mlx5_os_dereg_mr(struct mlx5_pmd_mr *pmd_mr)
  *
  * @param[out] reg_mr_cb
  *   Pointer to reg_mr func
+ * @param[out] reg_dmabuf_mr_cb
+ *   Pointer to reg_dmabuf_mr func (NULL on Windows - not supported)
  * @param[out] dereg_mr_cb
  *   Pointer to dereg_mr func
  *
  */
 RTE_EXPORT_INTERNAL_SYMBOL(mlx5_os_set_reg_mr_cb)
 void
-mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb, mlx5_dereg_mr_t *dereg_mr_cb)
+mlx5_os_set_reg_mr_cb(mlx5_reg_mr_t *reg_mr_cb,
+		      mlx5_reg_dmabuf_mr_t *reg_dmabuf_mr_cb,
+		      mlx5_dereg_mr_t *dereg_mr_cb)
 {
 	*reg_mr_cb = mlx5_os_reg_mr;
+	*reg_dmabuf_mr_cb = NULL; /* dma-buf not supported on Windows */
 	*dereg_mr_cb = mlx5_os_dereg_mr;
 }
 
diff --git a/drivers/crypto/mlx5/mlx5_crypto.h b/drivers/crypto/mlx5/mlx5_crypto.h
index f9f127e9e6..b2712c9a8d 100644
--- a/drivers/crypto/mlx5/mlx5_crypto.h
+++ b/drivers/crypto/mlx5/mlx5_crypto.h
@@ -41,6 +41,7 @@ struct mlx5_crypto_priv {
 	struct mlx5_common_device *cdev; /* Backend mlx5 device. */
 	struct rte_cryptodev *crypto_dev;
 	mlx5_reg_mr_t reg_mr_cb; /* Callback to reg_mr func */
+	mlx5_reg_dmabuf_mr_t reg_dmabuf_mr_cb; /* Callback to reg_dmabuf_mr func */
 	mlx5_dereg_mr_t dereg_mr_cb; /* Callback to dereg_mr func */
 	struct mlx5_uar uar; /* User Access Region. */
 	uint32_t max_segs_num; /* Maximum supported data segs. */
diff --git a/drivers/crypto/mlx5/mlx5_crypto_gcm.c b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
index 89f32c7722..380689cfeb 100644
--- a/drivers/crypto/mlx5/mlx5_crypto_gcm.c
+++ b/drivers/crypto/mlx5/mlx5_crypto_gcm.c
@@ -1186,7 +1186,8 @@ mlx5_crypto_gcm_init(struct mlx5_crypto_priv *priv)
 
 	/* Override AES-GCM specified ops. */
 	dev_ops->sym_session_configure = mlx5_crypto_sym_gcm_session_configure;
-	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->dereg_mr_cb);
+	mlx5_os_set_reg_mr_cb(&priv->reg_mr_cb, &priv->reg_dmabuf_mr_cb,
+			&priv->dereg_mr_cb);
 	dev_ops->queue_pair_setup = mlx5_crypto_gcm_qp_setup;
 	dev_ops->queue_pair_release = mlx5_crypto_gcm_qp_release;
 	if (mlx5_crypto_is_ipsec_opt(priv)) {
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/2] support dmabuf
  2026-02-04 15:50     ` [PATCH v4 0/2] " Cliff Burdick
  2026-02-04 15:50       ` [PATCH v4 1/2] eal: " Cliff Burdick
  2026-02-04 15:50       ` [PATCH v4 2/2] common/mlx5: " Cliff Burdick
@ 2026-02-05 18:48       ` Stephen Hemminger
  2026-02-05 20:25         ` Cliff Burdick
  2026-03-31  3:15       ` Stephen Hemminger
  3 siblings, 1 reply; 27+ messages in thread
From: Stephen Hemminger @ 2026-02-05 18:48 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: dev, anatoly.burakov

On Wed, 4 Feb 2026 15:50:07 +0000
Cliff Burdick <cburdick@nvidia.com> wrote:

> Fixes since v3:
> * Fixed version in RTE_EXPORT_EXPERIMENTAL_SYMBOL
> 
> Add support for kernel dmabuf feature and integrate it in the mlx5 driver.
> This feature is needed to support GPUDirect on newer kernels.
> 
> I apologize for all the patches. Still trying to learn how to submit these.
> 
> Cliff Burdick (2):
>   eal: support dmabuf
>   common/mlx5: support dmabuf
> 
>  .mailmap                                      |   1 +
>  doc/guides/rel_notes/release_26_03.rst        |   6 +
>  drivers/common/mlx5/linux/meson.build         |   2 +
>  drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
>  drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
>  drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
>  drivers/common/mlx5/mlx5_common.c             |  42 ++++-
>  drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++-
>  drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
>  drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
>  drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
>  drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
>  lib/eal/common/eal_common_memory.c            | 165 +++++++++++++++++-
>  lib/eal/common/eal_memalloc.h                 |  21 +++
>  lib/eal/common/malloc_heap.c                  |  27 +++
>  lib/eal/common/malloc_heap.h                  |   5 +
>  lib/eal/include/rte_memory.h                  | 145 +++++++++++++++
>  17 files changed, 612 insertions(+), 14 deletions(-)
> 

Any new library like this needs standalone tests so that it gets
covered in CI etc.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 0/2] support dmabuf
  2026-02-05 18:48       ` [PATCH v4 0/2] " Stephen Hemminger
@ 2026-02-05 20:25         ` Cliff Burdick
  2026-02-05 22:50           ` Stephen Hemminger
  0 siblings, 1 reply; 27+ messages in thread
From: Cliff Burdick @ 2026-02-05 20:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev@dpdk.org, anatoly.burakov@intel.com

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, February 5, 2026 10:49 AM
> To: Cliff Burdick <cburdick@nvidia.com>
> Cc: dev@dpdk.org; anatoly.burakov@intel.com
> Subject: Re: [PATCH v4 0/2] support dmabuf
> 
> External email: Use caution opening links or attachments
> 
> On Wed, 4 Feb 2026 15:50:07 +0000
> Cliff Burdick <cburdick@nvidia.com> wrote:
> 
> > Fixes since v3:
> > * Fixed version in RTE_EXPORT_EXPERIMENTAL_SYMBOL
> > 
> > Add support for kernel dmabuf feature and integrate it in the mlx5 driver.
> > This feature is needed to support GPUDirect on newer kernels.
> > 
> > I apologize for all the patches. Still trying to learn how to submit these.
> > 
> > Cliff Burdick (2):
> >   eal: support dmabuf
> >   common/mlx5: support dmabuf
> > 
> >  .mailmap                                      |   1 +
> >  doc/guides/rel_notes/release_26_03.rst        |   6 +
> >  drivers/common/mlx5/linux/meson.build         |   2 +
> >  drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
> >  drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
> >  drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
> >  drivers/common/mlx5/mlx5_common.c             |  42 ++++-
> >  drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++-
> >  drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
> >  drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
> >  drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
> >  drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
> >  lib/eal/common/eal_common_memory.c            | 165 +++++++++++++++++-
> >  lib/eal/common/eal_memalloc.h                 |  21 +++
> >  lib/eal/common/malloc_heap.c                  |  27 +++
> >  lib/eal/common/malloc_heap.h                  |   5 +
> >  lib/eal/include/rte_memory.h                  | 145 +++++++++++++++
> >  17 files changed, 612 insertions(+), 14 deletions(-)
> > 
> Any new library like this needs standalone tests so that it gets covered in CI etc.

I did not see any existing GPU tests, and testing this would require a GPU along with a 5.15+ Linux kernel. Do those systems exist in the test infrastructure?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/2] support dmabuf
  2026-02-05 20:25         ` Cliff Burdick
@ 2026-02-05 22:50           ` Stephen Hemminger
  0 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-02-05 22:50 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: dev@dpdk.org, anatoly.burakov@intel.com

On Thu, 5 Feb 2026 20:25:01 +0000
Cliff Burdick <cburdick@nvidia.com> wrote:

> > Any new library like this needs standalone tests so that it gets covered in CI etc.  
> 
> I did not see any existing GPU tests, and this would require a GPU to test along with a 5.15+ Linux kernel. Do those systems exist in the test infrastructure?

Allowing them in was a mistake.
The problem was that GPU support requires CUDA, but that could have been handled.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/2] eal: support dmabuf
  2026-02-04 15:50       ` [PATCH v4 1/2] eal: " Cliff Burdick
@ 2026-02-12 13:57         ` Burakov, Anatoly
  0 siblings, 0 replies; 27+ messages in thread
From: Burakov, Anatoly @ 2026-02-12 13:57 UTC (permalink / raw)
  To: Cliff Burdick, dev; +Cc: Thomas Monjalon

On 2/4/2026 4:50 PM, Cliff Burdick wrote:
> dmabuf is a modern Linux kernel feature to allow DMA transfers between
> two drivers. Common examples of usage are streaming video devices and
> NIC to GPU transfers. Prior to dmabuf users had to load proprietary
> drivers to expose the DMA mappings. With dmabuf the proprietary drivers
> are no longer required.
> 
> A new api function rte_extmem_register_dmabuf is introduced to create
> the mapping from a dmabuf file descriptor. dmabuf uses a file descriptor
> and an offset that has been pre-opened with the kernel. The kernel uses
> the file descriptor to map to a VA pointer. To avoid ABI changes, a
> static struct is used inside of eal_common_memory.c, and lookups are
> done on this struct rather than from the rte_memseg_list.
> 
> Ideally we would like to add both the dmabuf file descriptor and offset
> to rte_memseg_list, but it's not clear if we can reuse existing fields
> when using the dmabuf API.
> 
> We could rename the external flag to a more generic "properties" flag
> where "external" is the lowest bit, then we can use the second bit to
> indicate the presence of dmabuf. In the presence of the flag for
> dmabuf we could reuse the base_va address field for the dmabuf offset,
> and the socket_id for the file descriptor.
> 
> Signed-off-by: Cliff Burdick <cburdick@nvidia.com>
> ---

Hi,

A few random thoughts about the patchset.

For one, this API is obviously Linux-only. This in itself is not a 
problem (we do have VFIO API...) but I would really like to avoid that 
if possible.

For another, I don't see any support for secondary processes - the 
dmabuf array is process-local, and calling register() from secondary 
process would presumably either fail or create a duplicate segment, 
depending on exactly what you pass into the register call. If this 
scenario isn't supported, it should at least be explicitly disallowed 
and documented to be such.

My biggest concern is that this is creating another type of external 
memory segment and thus segregating the API, but isn't doing it in a way 
that is generic. I can see a valid usecase for this, but what we're 
essentially doing here is storing some metadata together with the 
segment. So, perhaps, this is what we should do? That would seem like a 
cleanest solution for me, and it would extend usefulness of the API to 
other use cases where there may be a requirement to store some 
metadata/fd/whatever with the segment.

You could then build another API on top of this (a library?) that would 
handle things like secondary process synchronization with IPC, so that 
you have all fd's valid in all processes.

Thoughts?
-- 
Thanks,
Anatoly

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 0/2] support dmabuf
  2026-02-04 15:50     ` [PATCH v4 0/2] " Cliff Burdick
                         ` (2 preceding siblings ...)
  2026-02-05 18:48       ` [PATCH v4 0/2] " Stephen Hemminger
@ 2026-03-31  3:15       ` Stephen Hemminger
  3 siblings, 0 replies; 27+ messages in thread
From: Stephen Hemminger @ 2026-03-31  3:15 UTC (permalink / raw)
  To: Cliff Burdick; +Cc: dev, anatoly.burakov

On Wed, 4 Feb 2026 15:50:07 +0000
Cliff Burdick <cburdick@nvidia.com> wrote:

> Fixes since v3:
> * Fixed version in RTE_EXPORT_EXPERIMENTAL_SYMBOL
> 
> Add support for kernel dmabuf feature and integrate it in the mlx5 driver.
> This feature is needed to support GPUDirect on newer kernels.
> 
> I apologize for all the patches. Still trying to learn how to submit these.
> 
> Cliff Burdick (2):
>   eal: support dmabuf
>   common/mlx5: support dmabuf
> 
>  .mailmap                                      |   1 +
>  doc/guides/rel_notes/release_26_03.rst        |   6 +
>  drivers/common/mlx5/linux/meson.build         |   2 +
>  drivers/common/mlx5/linux/mlx5_common_verbs.c |  48 ++++-
>  drivers/common/mlx5/linux/mlx5_glue.c         |  19 ++
>  drivers/common/mlx5/linux/mlx5_glue.h         |   3 +
>  drivers/common/mlx5/mlx5_common.c             |  42 ++++-
>  drivers/common/mlx5/mlx5_common_mr.c          | 113 +++++++++++-
>  drivers/common/mlx5/mlx5_common_mr.h          |  17 +-
>  drivers/common/mlx5/windows/mlx5_common_os.c  |   8 +-
>  drivers/crypto/mlx5/mlx5_crypto.h             |   1 +
>  drivers/crypto/mlx5/mlx5_crypto_gcm.c         |   3 +-
>  lib/eal/common/eal_common_memory.c            | 165 +++++++++++++++++-
>  lib/eal/common/eal_memalloc.h                 |  21 +++
>  lib/eal/common/malloc_heap.c                  |  27 +++
>  lib/eal/common/malloc_heap.h                  |   5 +
>  lib/eal/include/rte_memory.h                  | 145 +++++++++++++++
>  17 files changed, 612 insertions(+), 14 deletions(-)
> 

I don't think anyone looked at the details here.
If they had, they would have seen the same things the AI did.


Review: [PATCH v4 1/2] eal: support dmabuf
        [PATCH v4 2/2] common/mlx5: support dmabuf

Good approach using a side-table to avoid ABI changes to
rte_memseg_list. The dmabuf support for mlx5 is well-structured
with proper compile-time gating via HAVE_IBV_REG_DMABUF_MR.

Patch 1/2 - eal: support dmabuf

Error: rte_extmem_register is broken by rte_extmem_register_dmabuf.

rte_extmem_register now calls rte_extmem_register_dmabuf with
fd=-1. But rte_extmem_register_dmabuf rejects dmabuf_fd < 0:

  int
  rte_extmem_register_dmabuf(void *va_addr, size_t len,
      int dmabuf_fd, uint64_t dmabuf_offset,
      rte_iova_t iova_addrs[], unsigned int n_pages, size_t page_sz)
  {
      if (dmabuf_fd < 0) {
          rte_errno = EINVAL;
          return -1;
      }
      ...

So every call to rte_extmem_register will now fail with EINVAL.
rte_extmem_register should call extmem_register directly (with
fd=-1), not rte_extmem_register_dmabuf:

  int
  rte_extmem_register(void *va_addr, size_t len, rte_iova_t iova_addrs[],
      unsigned int n_pages, size_t page_sz)
  {
      return extmem_register(va_addr, len, -1, 0, iova_addrs,
          n_pages, page_sz);
  }

Error: Input validation from rte_extmem_register is lost.

The original rte_extmem_register validated va_addr, page_sz, len,
alignment, and n_pages before doing anything. The new
extmem_register function has no input validation -- it only has
the locking and memseg creation logic. The parameter checks that
were in the original function (va_addr == NULL, page_sz == 0,
len == 0, power-of-2, alignment) need to be in extmem_register
or duplicated in both rte_extmem_register and
rte_extmem_register_dmabuf.

Error: eal_memseg_list_init uses type_msl_idx to index dmabuf_info
but this is not the same as the memseg list index in mcfg->memsegs.

eal_memseg_list_init initializes dmabuf_info[type_msl_idx], but
type_msl_idx is a per-type index used for naming, not the global
memseg list position. When malloc_heap_create_external_seg_dmabuf
later sets dmabuf_info[msl_idx] where msl_idx = msl - mcfg->memsegs,
these are different index spaces. The init in eal_memseg_list_init
may write to wrong slots, and external segments may use indices
that were never initialized by eal_memseg_list_init. The static
dmabuf_info array is zero-initialized at program start (fd=0,
offset=0), but fd=0 is a valid file descriptor (stdin), so fd
should be initialized to -1 for all entries, or the
eal_memseg_list_init initialization should be removed and
initialization done only in malloc_heap_create_external_seg and
malloc_heap_create_external_seg_dmabuf.

Warning: malloc_heap_create_external_seg now redundantly
initializes dmabuf_info.

malloc_heap_create_external_seg calls
eal_memseg_list_set_dmabuf_info(i, -1, 0) at the end, but
malloc_heap_create_external_seg_dmabuf calls
malloc_heap_create_external_seg (which sets fd = -1) and then
immediately overwrites the entry with the actual fd. This works,
but it is confusing: the initialization in the base function was
added for the non-dmabuf path, and it turns the dmabuf path into
a useless set-then-overwrite.

Warning: dmabuf_info is process-local static storage.

The side-table approach means dmabuf metadata is not shared with
secondary processes via shared memory. The commit message mentions
rte_extmem_attach for cross-process use, but secondary processes
will not have the dmabuf_info populated. If a secondary process
calls rte_memseg_list_get_dmabuf_fd, it will always get -1. This
limitation should be documented in the Doxygen for
rte_extmem_register_dmabuf.

Warning: dmabuf fd lifetime / ownership is not documented.

The API does not specify whether DPDK takes ownership of the
dmabuf_fd (will it close it?) or whether the caller must keep it
open. Since ibv_reg_dmabuf_mr likely holds a reference through
the kernel, the fd can probably be closed after registration,
but this should be explicitly documented in the Doxygen. Also,
rte_extmem_unregister does not clean up the dmabuf_info entry
(reset fd to -1), so stale metadata remains after unregistration.
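The cleanup could be as simple as the following sketch, called from
rte_extmem_unregister (struct and function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

struct dmabuf_entry {
	int fd;
	uint64_t offset;
};

/* Reset the side-table entry on unregistration so a later
 * rte_memseg_list_get_dmabuf_fd lookup cannot return a stale
 * descriptor. fd goes back to -1, not 0, since fd 0 is valid. */
static void
dmabuf_entry_clear(struct dmabuf_entry *e)
{
	e->fd = -1;
	e->offset = 0;
}
```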

Warning: the internal helpers declared in eal_memalloc.h mirror
the public API functions in rte_memory.h but differ in signature
and error convention.

eal_memalloc.h declares:
  int eal_memseg_list_get_dmabuf_fd(int list_idx);
  int eal_memseg_list_get_dmabuf_offset(int list_idx, uint64_t *offset);

rte_memory.h declares:
  int rte_memseg_list_get_dmabuf_fd(const struct rte_memseg_list *msl);
  int rte_memseg_list_get_dmabuf_offset(const struct rte_memseg_list *msl, ...);

These are distinct functions whose names differ only in the eal_
vs rte_ prefix, and both headers document them identically:
  "Get dma-buf fd for a memseg list."
  "Get dma-buf offset for a memseg list."
This is fine functionally, but the internal and public variants
should use the same error convention. The internal
eal_memseg_list_get_dmabuf_fd returns -EINVAL on error, while the
public rte_memseg_list_get_dmabuf_fd_unsafe returns -1 and sets
rte_errno. The public function also does not call the internal
one -- it accesses dmabuf_info directly -- so the declarations in
eal_memalloc.h appear to be unused.

Patch 2/2 - common/mlx5: support dmabuf

Warning: mlx5_os_set_reg_mr_cb signature change is an internal
ABI break.

The function signature changes from 2 to 3 parameters. This is
an internal symbol so it's not a public ABI break, but all
callers must be updated atomically. The Windows and Linux
implementations are both updated, and the crypto caller is
updated, so this appears complete. Verify no other callers exist
outside this patch.

Warning: duplicate dmabuf detection logic in mlx5_common.c and
mlx5_common_mr.c.

The pattern of checking msl->external, getting dmabuf_fd, getting
dmabuf_offset, and calculating the adjusted offset is repeated
nearly identically in mlx5_common_dev_dma_map and
mlx5_mr_mempool_register_primary. This should be factored into a
helper function to avoid the code duplication and ensure consistent
behavior.
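One possible shape for that helper, as a self-contained sketch
(the struct layout and names are placeholders standing in for the
actual mlx5/EAL definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the memseg list fields the duplicated code reads. */
struct msl_stub {
	int external;		/* msl->external */
	uintptr_t base;		/* start VA of the memseg list */
	int dmabuf_fd;		/* -1 when not dmabuf-backed */
	uint64_t dmabuf_offset;
};

/* Returns the dmabuf fd, or -1 if the list is not an external
 * dmabuf-backed list. On success, writes the offset of 'addr'
 * within the dmabuf into *offset. Both mlx5_common_dev_dma_map and
 * mlx5_mr_mempool_register_primary could call one helper like this
 * instead of open-coding the checks. */
static int
msl_get_dmabuf(const struct msl_stub *msl, uintptr_t addr,
	uint64_t *offset)
{
	if (!msl->external || msl->dmabuf_fd < 0)
		return -1;
	*offset = msl->dmabuf_offset + (addr - msl->base);
	return msl->dmabuf_fd;
}
```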

Info: mlx5_common_verbs_reg_dmabuf_mr passes addr as iova.

The call is:
  reg_dmabuf_mr_cb(pd, dmabuf_offset, len, addr, dmabuf_fd, ...)

where addr is the user-space virtual address. For ibv_reg_dmabuf_mr,
the iova parameter is the "IO virtual address the device will use
to access the region." Using the user-space VA as iova means the
MR maps virtual addresses 1:1. This is the common pattern for
DPDK but worth noting that it won't work if the device expects
different IO addresses.

Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2026-03-31  3:16 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-27 17:44 [PATCH 0/2] support dmabuf Cliff Burdick
2026-01-27 17:44 ` [PATCH 1/2] eal: " Cliff Burdick
2026-01-29  1:48   ` Stephen Hemminger
2026-01-29  1:51   ` Stephen Hemminger
2026-01-27 17:44 ` [PATCH 2/2] common/mlx5: " Cliff Burdick
2026-01-27 19:21   ` [REVIEW] " Stephen Hemminger
2026-01-28 14:30     ` David Marchand
2026-01-28 17:10       ` Stephen Hemminger
2026-01-28 17:43       ` Stephen Hemminger
2026-02-03 17:34     ` Cliff Burdick
2026-01-29  1:51   ` [PATCH 2/2] " Stephen Hemminger
2026-01-28  0:04 ` [PATCH 0/2] " Stephen Hemminger
2026-02-03 17:18   ` Cliff Burdick
2026-02-03 22:26 ` [PATCH v2 " Cliff Burdick
2026-02-03 22:26   ` [PATCH v2 1/2] eal: " Cliff Burdick
2026-02-03 22:26   ` [PATCH v2 2/2] common/mlx5: " Cliff Burdick
2026-02-03 23:02   ` [PATCH v3 0/2] " Cliff Burdick
2026-02-03 23:02     ` [PATCH v3 1/2] eal: " Cliff Burdick
2026-02-03 23:02     ` [PATCH v3 2/2] common/mlx5: " Cliff Burdick
2026-02-04 15:50     ` [PATCH v4 0/2] " Cliff Burdick
2026-02-04 15:50       ` [PATCH v4 1/2] eal: " Cliff Burdick
2026-02-12 13:57         ` Burakov, Anatoly
2026-02-04 15:50       ` [PATCH v4 2/2] common/mlx5: " Cliff Burdick
2026-02-05 18:48       ` [PATCH v4 0/2] " Stephen Hemminger
2026-02-05 20:25         ` Cliff Burdick
2026-02-05 22:50           ` Stephen Hemminger
2026-03-31  3:15       ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox