* [PATCH 0/2] RDMA/rxe: Add dma-buf support for Soft-RoCE
@ 2026-03-26 5:27 Zhu Yanjun
2026-03-26 5:27 ` [PATCH 1/2] RDMA/umem: Change for rdma devices has not dma device Zhu Yanjun
2026-03-26 5:27 ` [PATCH 2/2] RDMA/rxe: Add dma-buf support Zhu Yanjun
0 siblings, 2 replies; 4+ messages in thread
From: Zhu Yanjun @ 2026-03-26 5:27 UTC (permalink / raw)
To: jgg, leon, zyjzyj2000, linux-rdma, yanjun.zhu, mie
This patchset introduces dma-buf support for the Soft-RoCE (RXE) driver.
By enabling dma-buf, RXE can now participate in zero-copy data transfers
with other providers (such as GPUs) that export memory via dma-buf fds.
Traditionally, RXE only supported user-space memory regions (UMEM) based
on system RAM. This change extends RXE’s capability to handle peer-to-peer
(P2P) like workflows in a software-defined RDMA environment.
This patchset pass the rdma-core tests with the following link for the
rdma-core:
https://github.com/linux-rdma/rdma-core/pull/1055
Zhu Yanjun (2):
RDMA/umem: Change for rdma devices has not dma device
RDMA/rxe: Add dma-buf support
drivers/infiniband/core/umem_dmabuf.c | 35 ++++++++++-
drivers/infiniband/sw/rxe/rxe.c | 2 +
drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
drivers/infiniband/sw/rxe/rxe_mr.c | 89 ++++++++++++++++++++++++---
drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
drivers/infiniband/sw/rxe/rxe_verbs.c | 40 ++++++++++++
drivers/infiniband/sw/rxe/rxe_verbs.h | 2 +-
include/rdma/ib_umem.h | 1 +
8 files changed, 161 insertions(+), 12 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/2] RDMA/umem: Change for rdma devices has not dma device
2026-03-26 5:27 [PATCH 0/2] RDMA/rxe: Add dma-buf support for Soft-RoCE Zhu Yanjun
@ 2026-03-26 5:27 ` Zhu Yanjun
2026-03-30 13:13 ` Leon Romanovsky
2026-03-26 5:27 ` [PATCH 2/2] RDMA/rxe: Add dma-buf support Zhu Yanjun
1 sibling, 1 reply; 4+ messages in thread
From: Zhu Yanjun @ 2026-03-26 5:27 UTC (permalink / raw)
To: jgg, leon, zyjzyj2000, linux-rdma, yanjun.zhu, mie
Current implementation requires a dma device for RDMA driver to use
dma-buf memory space as RDMA buffer.
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
drivers/infiniband/core/umem_dmabuf.c | 35 ++++++++++++++++++++++++++-
include/rdma/ib_umem.h | 1 +
2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
index d30f24b90bca..65c5f09f380f 100644
--- a/drivers/infiniband/core/umem_dmabuf.c
+++ b/drivers/infiniband/core/umem_dmabuf.c
@@ -142,6 +142,8 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
goto out_release_dmabuf;
}
+ umem_dmabuf->dmabuf = dmabuf;
+
umem = &umem_dmabuf->umem;
umem->ibdev = device;
umem->length = size;
@@ -152,6 +154,24 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
if (!ib_umem_num_pages(umem))
goto out_free_umem;
+ /* Software RDMA drivers has not dma device. Just get dmabuf from fd */
+ if (!device->dma_device) {
+ struct sg_table *sgt;
+
+ dma_resv_lock(dmabuf->resv, NULL);
+ sgt = dmabuf->ops->map_dma_buf(NULL, DMA_BIDIRECTIONAL);
+ dma_resv_unlock(dmabuf->resv);
+ if (IS_ERR(sgt)) {
+ ret = ERR_CAST(sgt);
+ goto out_free_umem;
+ }
+ umem_dmabuf->sgt = sgt;
+ goto done;
+ }
+
+ if (unlikely(!ops || !ops->move_notify))
+ goto out_free_umem;
+
umem_dmabuf->attach = dma_buf_dynamic_attach(
dmabuf,
dma_device,
@@ -161,6 +181,7 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
ret = ERR_CAST(umem_dmabuf->attach);
goto out_free_umem;
}
+done:
return umem_dmabuf;
out_free_umem:
@@ -260,11 +281,23 @@ EXPORT_SYMBOL(ib_umem_dmabuf_revoke);
void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf)
{
- struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
+ struct dma_buf *dmabuf = umem_dmabuf->dmabuf;
+
+ if (!umem_dmabuf->attach) {
+ if (umem_dmabuf->sgt) {
+ dma_resv_lock(dmabuf->resv, NULL);
+ dmabuf->ops->unmap_dma_buf(NULL, umem_dmabuf->sgt,
+ DMA_BIDIRECTIONAL);
+ dma_resv_unlock(dmabuf->resv);
+ }
+ goto free_dmabuf;
+ }
ib_umem_dmabuf_revoke(umem_dmabuf);
dma_buf_detach(dmabuf, umem_dmabuf->attach);
+
+free_dmabuf:
dma_buf_put(dmabuf);
kfree(umem_dmabuf);
}
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 0a8e092c0ea8..6eb5760d5ca3 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -31,6 +31,7 @@ struct ib_umem {
struct ib_umem_dmabuf {
struct ib_umem umem;
struct dma_buf_attachment *attach;
+ struct dma_buf *dmabuf;
struct sg_table *sgt;
struct scatterlist *first_sg;
struct scatterlist *last_sg;
--
2.53.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] RDMA/rxe: Add dma-buf support
2026-03-26 5:27 [PATCH 0/2] RDMA/rxe: Add dma-buf support for Soft-RoCE Zhu Yanjun
2026-03-26 5:27 ` [PATCH 1/2] RDMA/umem: Change for rdma devices has not dma device Zhu Yanjun
@ 2026-03-26 5:27 ` Zhu Yanjun
1 sibling, 0 replies; 4+ messages in thread
From: Zhu Yanjun @ 2026-03-26 5:27 UTC (permalink / raw)
To: jgg, leon, zyjzyj2000, linux-rdma, yanjun.zhu, mie
Implement a ib device operation ‘reg_user_mr_dmabuf’. Generate a
rxe_map from the memory space linked the passed dma-buf.
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
drivers/infiniband/sw/rxe/rxe.c | 2 +
drivers/infiniband/sw/rxe/rxe_loc.h | 2 +
drivers/infiniband/sw/rxe/rxe_mr.c | 89 ++++++++++++++++++++++++---
drivers/infiniband/sw/rxe/rxe_odp.c | 2 +-
drivers/infiniband/sw/rxe/rxe_verbs.c | 40 ++++++++++++
drivers/infiniband/sw/rxe/rxe_verbs.h | 2 +-
6 files changed, 126 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index e891199cbdef..9920ea3104be 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -278,3 +278,5 @@ late_initcall(rxe_module_init);
module_exit(rxe_module_exit);
MODULE_ALIAS_RDMA_LINK("rxe");
+
+MODULE_IMPORT_NS("DMA_BUF");
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 7992290886e1..dc9a56450c82 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -66,6 +66,8 @@ int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length);
int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
unsigned int length, enum rxe_mr_copy_dir dir);
+int rxe_mr_dmabuf_init_user(struct rxe_pd *pd, int fd, u64 start, u64 length,
+ u64 iova, int access, struct rxe_mr *mr);
int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
void *addr, int length, enum rxe_mr_copy_dir dir);
int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index c696ff874980..5c129a488b83 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -5,6 +5,7 @@
*/
#include <linux/libnvdimm.h>
+#include <linux/dma-buf.h>
#include "rxe.h"
#include "rxe_loc.h"
@@ -90,7 +91,7 @@ static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
{
int idx;
- if (mr_page_size(mr) > PAGE_SIZE)
+ if (rxe_mr_page_size(mr) > PAGE_SIZE)
idx = (iova - (mr->ibmr.iova & mr->page_mask)) >> PAGE_SHIFT;
else
idx = (iova >> mr->page_shift) -
@@ -103,15 +104,15 @@ static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
/*
* Convert iova to offset within the page_info entry.
*
- * For mr_page_size > PAGE_SIZE, the offset is within the system page.
- * For mr_page_size <= PAGE_SIZE, the offset is within the MR page size.
+ * For rxe_mr_page_size > PAGE_SIZE, the offset is within the system page.
+ * For rxe_mr_page_size <= PAGE_SIZE, the offset is within the MR page size.
*/
static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
{
- if (mr_page_size(mr) > PAGE_SIZE)
+ if (rxe_mr_page_size(mr) > PAGE_SIZE)
return iova & (PAGE_SIZE - 1);
else
- return iova & (mr_page_size(mr) - 1);
+ return iova & (rxe_mr_page_size(mr) - 1);
}
static bool is_pmem_page(struct page *pg)
@@ -129,7 +130,7 @@ static int rxe_mr_fill_pages_from_sgt(struct rxe_mr *mr, struct sg_table *sgt)
struct page *page;
bool persistent = !!(mr->access & IB_ACCESS_FLUSH_PERSISTENT);
- WARN_ON(mr_page_size(mr) != PAGE_SIZE);
+ WARN_ON(rxe_mr_page_size(mr) != PAGE_SIZE);
__sg_page_iter_start(&sg_iter, sgt->sgl, sgt->orig_nents, 0);
if (!__sg_page_iter_next(&sg_iter))
@@ -224,6 +225,75 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
return err;
}
+static int rxe_map_dmabuf_mr(struct rxe_mr *mr, struct ib_umem_dmabuf *umem_dmabuf)
+{
+ unsigned int page_size = rxe_mr_page_size(mr);
+ struct sg_table *sgt = umem_dmabuf->sgt;
+ struct scatterlist *sg;
+ struct page *page;
+ int i, j, n = 0;
+
+ mr->page_shift = ilog2(page_size);
+ mr->page_mask = ~((u64)page_size - 1);
+ mr->nbuf = 0;
+
+ for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+ page = sg_page(sg);
+ for (j = 0; j < (sg->length >> PAGE_SHIFT); j++) {
+ mr->page_info[n].page = page + j;
+ mr->page_info[n].offset = 0;
+ n++;
+ }
+ }
+
+ mr->nbuf = n;
+ return 0;
+}
+
+int rxe_mr_dmabuf_init_user(struct rxe_pd *pd, int fd, u64 start, u64 length,
+ u64 iova, int access, struct rxe_mr *mr)
+{
+ struct ib_umem_dmabuf *umem_dmabuf;
+ int err;
+
+ umem_dmabuf = ib_umem_dmabuf_get(pd->ibpd.device, start, length, fd,
+ access, NULL);
+ if (IS_ERR(umem_dmabuf)) {
+ err = PTR_ERR(umem_dmabuf);
+ goto err_out;
+ }
+
+ rxe_mr_init(access, mr);
+
+ err = alloc_mr_page_info(mr, ib_umem_num_pages(&umem_dmabuf->umem));
+ if (err) {
+ pr_warn("%s: Unable to allocate memory for map\n", __func__);
+ goto err_release_umem;
+ }
+
+ mr->ibmr.pd = &pd->ibpd;
+ mr->ibmr.iova = iova;
+ mr->umem = &umem_dmabuf->umem;
+ mr->access = access;
+ mr->state = RXE_MR_STATE_VALID;
+ mr->ibmr.type = IB_MR_TYPE_USER;
+
+ err = rxe_map_dmabuf_mr(mr, umem_dmabuf);
+ if (err)
+ goto err_free_mr_map;
+
+ return 0;
+
+err_free_mr_map:
+ free_mr_page_info(mr);
+
+err_release_umem:
+ ib_umem_release(&umem_dmabuf->umem);
+
+err_out:
+ return err;
+}
+
int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr)
{
int err;
@@ -260,7 +330,7 @@ static int rxe_set_page(struct ib_mr *ibmr, u64 dma_addr)
{
struct rxe_mr *mr = to_rmr(ibmr);
bool persistent = !!(mr->access & IB_ACCESS_FLUSH_PERSISTENT);
- u32 i, pages_per_mr = mr_page_size(mr) >> PAGE_SHIFT;
+ u32 i, pages_per_mr = rxe_mr_page_size(mr) >> PAGE_SHIFT;
pages_per_mr = MAX(1, pages_per_mr);
@@ -288,7 +358,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
int sg_nents, unsigned int *sg_offset)
{
struct rxe_mr *mr = to_rmr(ibmr);
- unsigned int page_size = mr_page_size(mr);
+ unsigned int page_size = rxe_mr_page_size(mr);
/*
* Ensure page_size and PAGE_SIZE are compatible for mapping.
@@ -302,7 +372,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
return -EINVAL;
}
- if (mr_page_size(mr) > PAGE_SIZE) {
+ if (rxe_mr_page_size(mr) > PAGE_SIZE) {
/* resize page_info if needed */
u32 map_mr_pages = (page_size >> PAGE_SHIFT) * mr->num_buf;
@@ -809,6 +879,7 @@ void rxe_mr_cleanup(struct rxe_pool_elem *elem)
struct rxe_mr *mr = container_of(elem, typeof(*mr), elem);
rxe_put(mr_pd(mr));
+
ib_umem_release(mr->umem);
if (mr->ibmr.type != IB_MR_TYPE_DMA)
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index bc11b1ec59ac..12c48f0cae47 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -351,7 +351,7 @@ int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
page = hmm_pfn_to_page(umem_odp->map.pfn_list[index]);
bytes = min_t(unsigned int, length,
- mr_page_size(mr) - page_offset);
+ rxe_mr_page_size(mr) - page_offset);
va = kmap_local_page(page);
arch_wb_cache_pmem(va + page_offset, bytes);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index fe41362c5144..1b5381b14d4b 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1358,6 +1358,45 @@ static struct ib_mr *rxe_rereg_user_mr(struct ib_mr *ibmr, int flags,
return NULL;
}
+static struct ib_mr *rxe_reg_user_mr_dmabuf(struct ib_pd *ibpd, u64 start,
+ u64 length, u64 iova, int fd,
+ int access, struct ib_dmah *dmah,
+ struct uverbs_attr_bundle *udata)
+{
+ int err;
+ struct rxe_dev *rxe = to_rdev(ibpd->device);
+ struct rxe_pd *pd = to_rpd(ibpd);
+ struct rxe_mr *mr;
+
+ mr = kzalloc_obj(*mr);
+ if (!mr) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ err = rxe_add_to_pool(&rxe->mr_pool, mr);
+ if (err)
+ goto err_free;
+
+ rxe_get(pd);
+
+ err = rxe_mr_dmabuf_init_user(pd, fd, start, length, iova, access, mr);
+ if (err)
+ goto err3;
+
+ return &mr->ibmr;
+
+err3:
+ rxe_put(pd);
+ rxe_put(mr);
+
+err_free:
+ kfree(mr);
+
+err_out:
+ return ERR_PTR(err);
+}
+
static struct ib_mr *rxe_alloc_mr(struct ib_pd *ibpd, enum ib_mr_type mr_type,
u32 max_num_sg)
{
@@ -1517,6 +1556,7 @@ static const struct ib_device_ops rxe_dev_ops = {
.query_qp = rxe_query_qp,
.query_srq = rxe_query_srq,
.reg_user_mr = rxe_reg_user_mr,
+ .reg_user_mr_dmabuf = rxe_reg_user_mr_dmabuf,
.req_notify_cq = rxe_req_notify_cq,
.rereg_user_mr = rxe_rereg_user_mr,
.resize_cq = rxe_resize_cq,
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index fb149f37e91d..9d77bec0bf3c 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -364,7 +364,7 @@ struct rxe_mr {
struct rxe_mr_page *page_info;
};
-static inline unsigned int mr_page_size(struct rxe_mr *mr)
+static inline unsigned int rxe_mr_page_size(struct rxe_mr *mr)
{
return mr ? mr->ibmr.page_size : PAGE_SIZE;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 1/2] RDMA/umem: Change for rdma devices has not dma device
2026-03-26 5:27 ` [PATCH 1/2] RDMA/umem: Change for rdma devices has not dma device Zhu Yanjun
@ 2026-03-30 13:13 ` Leon Romanovsky
0 siblings, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2026-03-30 13:13 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: jgg, zyjzyj2000, linux-rdma, mie
On Wed, Mar 25, 2026 at 10:27:38PM -0700, Zhu Yanjun wrote:
> Current implementation requires a dma device for RDMA driver to use
> dma-buf memory space as RDMA buffer.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
> ---
> drivers/infiniband/core/umem_dmabuf.c | 35 ++++++++++++++++++++++++++-
> include/rdma/ib_umem.h | 1 +
> 2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c
> index d30f24b90bca..65c5f09f380f 100644
> --- a/drivers/infiniband/core/umem_dmabuf.c
> +++ b/drivers/infiniband/core/umem_dmabuf.c
> @@ -142,6 +142,8 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
> goto out_release_dmabuf;
> }
>
> + umem_dmabuf->dmabuf = dmabuf;
> +
> umem = &umem_dmabuf->umem;
> umem->ibdev = device;
> umem->length = size;
> @@ -152,6 +154,24 @@ ib_umem_dmabuf_get_with_dma_device(struct ib_device *device,
> if (!ib_umem_num_pages(umem))
> goto out_free_umem;
>
> + /* Software RDMA drivers has not dma device. Just get dmabuf from fd */
> + if (!device->dma_device) {
I understand your intent, but RXE cannot support dmabuf, as dmabuf
requires a real DMA‑capable device to be attached. In addition, it is
unclear how you expect PCI peer‑to‑peer operations to work when one
side (RXE) is not a PCI device at all.
Thanks
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-03-30 13:13 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-26 5:27 [PATCH 0/2] RDMA/rxe: Add dma-buf support for Soft-RoCE Zhu Yanjun
2026-03-26 5:27 ` [PATCH 1/2] RDMA/umem: Change for rdma devices has not dma device Zhu Yanjun
2026-03-30 13:13 ` Leon Romanovsky
2026-03-26 5:27 ` [PATCH 2/2] RDMA/rxe: Add dma-buf support Zhu Yanjun
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.