* [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP
@ 2025-03-14 8:10 Daisuke Matsuda
2025-03-14 8:10 ` [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation Daisuke Matsuda
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Daisuke Matsuda @ 2025-03-14 8:10 UTC (permalink / raw)
To: linux-rdma, leon, jgg, zyjzyj2000; +Cc: lizhijian, Daisuke Matsuda
RDMA FLUSH[1] and ATOMIC WRITE[2] were added to rxe, but they cannot
currently run in ODP mode. This series provides the kernel-side enablement.
There are also minor changes in libibverbs and pyverbs, and rdma-core tests
are added so that people can exercise the features.
PR: https://github.com/linux-rdma/rdma-core/pull/1580
PR: https://github.com/linux-rdma/rdma-core/pull/1580
You can try the patches with the tree below:
https://github.com/ddmatsu/linux/tree/odp-extension
Note that the tree is a bit old (6.13-rc1) because there was an issue[3]
in the for-next tree that caused ibv_query_device_ex(), which is used to
query ODP capabilities, to fail. A fix[4] already exists and is expected to
land in the next release. I will update the tree once it is ready.
[1] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@fujitsu.com/
[2] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@fujitsu.com/
[3] [bug report] RDMA/rxe: Failure of ibv_query_device() and ibv_query_device_ex() tests in rdma-core
https://lore.kernel.org/all/1b9d6286-62fc-4b42-b304-0054c4ebee02@linux.dev/T/
[4] [PATCH rdma-rc 1/1] RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests
https://lore.kernel.org/linux-rdma/174102882930.42565.11864314726635251412.b4-ty@kernel.org/T/#t
Daisuke Matsuda (2):
RDMA/rxe: Enable ODP in RDMA FLUSH operation
RDMA/rxe: Enable ODP in ATOMIC WRITE operation
drivers/infiniband/sw/rxe/rxe.c | 2 +
drivers/infiniband/sw/rxe/rxe_loc.h | 12 +++
drivers/infiniband/sw/rxe/rxe_mr.c | 4 -
drivers/infiniband/sw/rxe/rxe_odp.c | 132 ++++++++++++++++++++++++++-
drivers/infiniband/sw/rxe/rxe_resp.c | 18 ++--
include/rdma/ib_verbs.h | 2 +
6 files changed, 155 insertions(+), 15 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation
2025-03-14 8:10 [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Daisuke Matsuda
@ 2025-03-14 8:10 ` Daisuke Matsuda
2025-03-15 19:23 ` Zhu Yanjun
2025-03-17 18:22 ` Leon Romanovsky
2025-03-14 8:10 ` [PATCH for-next v1 2/2] RDMA/rxe: Enable ODP in ATOMIC WRITE operation Daisuke Matsuda
2025-03-15 19:21 ` [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Zhu Yanjun
2 siblings, 2 replies; 9+ messages in thread
From: Daisuke Matsuda @ 2025-03-14 8:10 UTC (permalink / raw)
To: linux-rdma, leon, jgg, zyjzyj2000; +Cc: lizhijian, Daisuke Matsuda
For persistent memories, add rxe_odp_flush_pmem_iova() so that ODP-specific
steps are executed. Otherwise, no additional handling is required.
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
drivers/infiniband/sw/rxe/rxe.c | 1 +
drivers/infiniband/sw/rxe/rxe_loc.h | 7 +++
drivers/infiniband/sw/rxe/rxe_odp.c | 73 ++++++++++++++++++++++++++--
drivers/infiniband/sw/rxe/rxe_resp.c | 13 ++---
include/rdma/ib_verbs.h | 1 +
5 files changed, 85 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 4e56a371deb5..df66f8f9efa1 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -109,6 +109,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+ rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_FLUSH;
}
}
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index feb386d98d1d..0012bebe96ef 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -194,6 +194,8 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
enum rxe_mr_copy_dir dir);
int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
u64 compare, u64 swap_add, u64 *orig_val);
+int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
+ unsigned int length);
#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
static inline int
rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -212,6 +214,11 @@ rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
{
return RESPST_ERR_UNSUPPORTED_OPCODE;
}
+static inline int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
+ unsigned int length)
+{
+ return -EOPNOTSUPP;
+}
#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
#endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 94f7bbe14981..c1671e5efd70 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -4,6 +4,7 @@
*/
#include <linux/hmm.h>
+#include <linux/libnvdimm.h>
#include <rdma/ib_umem_odp.h>
@@ -147,6 +148,16 @@ static inline bool rxe_check_pagefault(struct ib_umem_odp *umem_odp,
return need_fault;
}
+static unsigned long rxe_odp_iova_to_index(struct ib_umem_odp *umem_odp, u64 iova)
+{
+ return (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
+}
+
+static unsigned long rxe_odp_iova_to_page_offset(struct ib_umem_odp *umem_odp, u64 iova)
+{
+ return iova & (BIT(umem_odp->page_shift) - 1);
+}
+
static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u32 flags)
{
struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
@@ -190,8 +201,8 @@ static int __rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
size_t offset;
u8 *user_va;
- idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
- offset = iova & (BIT(umem_odp->page_shift) - 1);
+ idx = rxe_odp_iova_to_index(umem_odp, iova);
+ offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
while (length > 0) {
u8 *src, *dest;
@@ -277,8 +288,8 @@ static int rxe_odp_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
return RESPST_ERR_RKEY_VIOLATION;
}
- idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
- page_offset = iova & (BIT(umem_odp->page_shift) - 1);
+ idx = rxe_odp_iova_to_index(umem_odp, iova);
+ page_offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
page = hmm_pfn_to_page(umem_odp->pfn_list[idx]);
if (!page)
return RESPST_ERR_RKEY_VIOLATION;
@@ -324,3 +335,57 @@ int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
return err;
}
+
+int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
+ unsigned int length)
+{
+ struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+ unsigned int page_offset;
+ unsigned long index;
+ struct page *page;
+ unsigned int bytes;
+ int err;
+ u8 *va;
+
+ /* mr must be valid even if length is zero */
+ if (WARN_ON(!mr))
+ return -EINVAL;
+
+ if (length == 0)
+ return 0;
+
+ err = mr_check_range(mr, iova, length);
+ if (err)
+ return err;
+
+ err = rxe_odp_map_range_and_lock(mr, iova, length,
+ RXE_PAGEFAULT_DEFAULT);
+ if (err)
+ return err;
+
+ while (length > 0) {
+ index = rxe_odp_iova_to_index(umem_odp, iova);
+ page_offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
+
+ page = hmm_pfn_to_page(umem_odp->pfn_list[index]);
+ if (!page) {
+ mutex_unlock(&umem_odp->umem_mutex);
+ return -EFAULT;
+ }
+
+ bytes = min_t(unsigned int, length,
+ mr_page_size(mr) - page_offset);
+
+ va = kmap_local_page(page);
+ arch_wb_cache_pmem(va + page_offset, bytes);
+ kunmap_local(va);
+
+ length -= bytes;
+ iova += bytes;
+ page_offset = 0;
+ }
+
+ mutex_unlock(&umem_odp->umem_mutex);
+
+ return 0;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 54ba9ee1acc5..dd65a8872111 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -649,10 +649,6 @@ static enum resp_states process_flush(struct rxe_qp *qp,
struct rxe_mr *mr = qp->resp.mr;
struct resp_res *res = qp->resp.res;
- /* ODP is not supported right now. WIP. */
- if (mr->umem->is_odp)
- return RESPST_ERR_UNSUPPORTED_OPCODE;
-
/* oA19-14, oA19-15 */
if (res && res->replay)
return RESPST_ACKNOWLEDGE;
@@ -670,8 +666,13 @@ static enum resp_states process_flush(struct rxe_qp *qp,
}
if (res->flush.type & IB_FLUSH_PERSISTENT) {
- if (rxe_flush_pmem_iova(mr, start, length))
- return RESPST_ERR_RKEY_VIOLATION;
+ if (mr->umem->is_odp) {
+ if (rxe_odp_flush_pmem_iova(mr, start, length))
+ return RESPST_ERR_RKEY_VIOLATION;
+ } else {
+ if (rxe_flush_pmem_iova(mr, start, length))
+ return RESPST_ERR_RKEY_VIOLATION;
+ }
/* Make data persistent. */
wmb();
} else if (res->flush.type & IB_FLUSH_GLOBAL) {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9941f4185c79..da07d3e2db1d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -325,6 +325,7 @@ enum ib_odp_transport_cap_bits {
IB_ODP_SUPPORT_READ = 1 << 3,
IB_ODP_SUPPORT_ATOMIC = 1 << 4,
IB_ODP_SUPPORT_SRQ_RECV = 1 << 5,
+ IB_ODP_SUPPORT_FLUSH = 1 << 6,
};
struct ib_odp_caps {
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
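The address arithmetic that the patch above factors into rxe_odp_iova_to_index() and rxe_odp_iova_to_page_offset(), together with the per-page chunking in rxe_odp_flush_pmem_iova()'s loop, can be modeled in a small standalone sketch. The names and the fixed 4 KiB page size below are illustrative only, not the kernel's API:

```c
#include <stdint.h>

/* Illustrative stand-ins for umem_odp->page_shift and ib_umem_start(). */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Index of the page (within the umem's pfn_list) that covers iova. */
static unsigned long demo_iova_to_index(uint64_t umem_start, uint64_t iova)
{
	return (iova - umem_start) >> DEMO_PAGE_SHIFT;
}

/* Byte offset of iova within its page. */
static unsigned long demo_iova_to_page_offset(uint64_t iova)
{
	return iova & (DEMO_PAGE_SIZE - 1);
}

/*
 * Bytes the flush loop can handle in one iteration: clamp the remaining
 * length to the end of the current page, as in
 * min_t(unsigned int, length, mr_page_size(mr) - page_offset).
 */
static unsigned int demo_chunk_bytes(uint64_t iova, unsigned int length)
{
	unsigned long room = DEMO_PAGE_SIZE - demo_iova_to_page_offset(iova);

	return length < room ? length : (unsigned int)room;
}
```

A range that starts mid-page is flushed in a short first chunk up to the page boundary, after which page_offset is zero and whole pages are processed.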
* [PATCH for-next v1 2/2] RDMA/rxe: Enable ODP in ATOMIC WRITE operation
2025-03-14 8:10 [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Daisuke Matsuda
2025-03-14 8:10 ` [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation Daisuke Matsuda
@ 2025-03-14 8:10 ` Daisuke Matsuda
2025-03-15 19:23 ` Zhu Yanjun
2025-03-15 19:21 ` [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Zhu Yanjun
2 siblings, 1 reply; 9+ messages in thread
From: Daisuke Matsuda @ 2025-03-14 8:10 UTC (permalink / raw)
To: linux-rdma, leon, jgg, zyjzyj2000; +Cc: lizhijian, Daisuke Matsuda
Add rxe_odp_do_atomic_write() so that ODP-specific steps are applied to
ATOMIC WRITE requests.
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
drivers/infiniband/sw/rxe/rxe.c | 1 +
drivers/infiniband/sw/rxe/rxe_loc.h | 5 +++
drivers/infiniband/sw/rxe/rxe_mr.c | 4 --
drivers/infiniband/sw/rxe/rxe_odp.c | 59 ++++++++++++++++++++++++++++
drivers/infiniband/sw/rxe/rxe_resp.c | 5 ++-
include/rdma/ib_verbs.h | 1 +
6 files changed, 70 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index df66f8f9efa1..21ce2d876b42 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -110,6 +110,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_FLUSH;
+ rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC_WRITE;
}
}
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 0012bebe96ef..8b1517c0894c 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -196,6 +196,7 @@ int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
u64 compare, u64 swap_add, u64 *orig_val);
int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
unsigned int length);
+int rxe_odp_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value);
#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
static inline int
rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -219,6 +220,10 @@ static inline int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
{
return -EOPNOTSUPP;
}
+static inline int rxe_odp_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
+{
+ return RESPST_ERR_UNSUPPORTED_OPCODE;
+}
#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
#endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 868d2f0b74e9..3aecb5be26d9 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -535,10 +535,6 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
struct page *page;
u64 *va;
- /* ODP is not supported right now. WIP. */
- if (mr->umem->is_odp)
- return RESPST_ERR_UNSUPPORTED_OPCODE;
-
/* See IBA oA19-28 */
if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
rxe_dbg_mr(mr, "mr not in valid state\n");
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index c1671e5efd70..79ef5fe41f8e 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -389,3 +389,62 @@ int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
return 0;
}
+
+#if defined CONFIG_64BIT
+/* only implemented or called for 64 bit architectures */
+int rxe_odp_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
+{
+ struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+ unsigned int page_offset;
+ unsigned long index;
+ struct page *page;
+ int err;
+ u64 *va;
+
+ /* See IBA oA19-28 */
+ if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
+ rxe_dbg_mr(mr, "mr not in valid state\n");
+ return RESPST_ERR_RKEY_VIOLATION;
+ }
+
+ /* See IBA oA19-28 */
+ err = mr_check_range(mr, iova, sizeof(value));
+ if (unlikely(err)) {
+ rxe_dbg_mr(mr, "iova out of range\n");
+ return RESPST_ERR_RKEY_VIOLATION;
+ }
+
+ err = rxe_odp_map_range_and_lock(mr, iova, sizeof(value),
+ RXE_PAGEFAULT_DEFAULT);
+ if (err)
+ return RESPST_ERR_RKEY_VIOLATION;
+
+ page_offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
+ index = rxe_odp_iova_to_index(umem_odp, iova);
+ page = hmm_pfn_to_page(umem_odp->pfn_list[index]);
+ if (!page) {
+ mutex_unlock(&umem_odp->umem_mutex);
+ return RESPST_ERR_RKEY_VIOLATION;
+ }
+ /* See IBA A19.4.2 */
+ if (unlikely(page_offset & 0x7)) {
+ mutex_unlock(&umem_odp->umem_mutex);
+ rxe_dbg_mr(mr, "misaligned address\n");
+ return RESPST_ERR_MISALIGNED_ATOMIC;
+ }
+
+ va = kmap_local_page(page);
+ /* Do atomic write after all prior operations have completed */
+ smp_store_release(&va[page_offset >> 3], value);
+ kunmap_local(va);
+
+ mutex_unlock(&umem_odp->umem_mutex);
+
+ return 0;
+}
+#else
+int rxe_odp_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
+{
+ return RESPST_ERR_UNSUPPORTED_OPCODE;
+}
+#endif
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index dd65a8872111..1505d933c09b 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -754,7 +754,10 @@ static enum resp_states atomic_write_reply(struct rxe_qp *qp,
value = *(u64 *)payload_addr(pkt);
iova = qp->resp.va + qp->resp.offset;
- err = rxe_mr_do_atomic_write(mr, iova, value);
+ if (mr->umem->is_odp)
+ err = rxe_odp_do_atomic_write(mr, iova, value);
+ else
+ err = rxe_mr_do_atomic_write(mr, iova, value);
if (err)
return err;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index da07d3e2db1d..bfa1bff3c720 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -326,6 +326,7 @@ enum ib_odp_transport_cap_bits {
IB_ODP_SUPPORT_ATOMIC = 1 << 4,
IB_ODP_SUPPORT_SRQ_RECV = 1 << 5,
IB_ODP_SUPPORT_FLUSH = 1 << 6,
+ IB_ODP_SUPPORT_ATOMIC_WRITE = 1 << 7,
};
struct ib_odp_caps {
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
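The responder-side behavior of rxe_odp_do_atomic_write() in the patch above — reject a payload whose in-page offset is not 8-byte aligned (IBA A19.4.2), otherwise store the 64-bit value into the mapped page — can be sketched in userspace C. The names are hypothetical, and a plain memcpy stands in for the kernel's smp_store_release() on the kmap'ed page:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of the ATOMIC WRITE store: 'page' stands in for the
 * kmap'ed ODP page, 'page_offset' for rxe_odp_iova_to_page_offset(iova).
 * Returns 0 on success, -1 for a misaligned offset (the kernel returns
 * RESPST_ERR_MISALIGNED_ATOMIC in that case).
 */
static int demo_atomic_write(uint8_t *page, unsigned long page_offset,
			     uint64_t value)
{
	/* See IBA A19.4.2: the 8-byte payload must be naturally aligned. */
	if (page_offset & 0x7)
		return -1;

	/* Kernel code uses smp_store_release(&va[page_offset >> 3], value)
	 * so the store is ordered after all prior operations. */
	memcpy(page + page_offset, &value, sizeof(value));
	return 0;
}
```

The alignment test also explains why the CONFIG_64BIT guard exists: a single untorn 64-bit store is only guaranteed on 64-bit architectures, so the !CONFIG_64BIT stub simply reports the opcode as unsupported.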
* Re: [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP
2025-03-14 8:10 [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Daisuke Matsuda
2025-03-14 8:10 ` [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation Daisuke Matsuda
2025-03-14 8:10 ` [PATCH for-next v1 2/2] RDMA/rxe: Enable ODP in ATOMIC WRITE operation Daisuke Matsuda
@ 2025-03-15 19:21 ` Zhu Yanjun
2025-03-17 5:22 ` Daisuke Matsuda (Fujitsu)
2 siblings, 1 reply; 9+ messages in thread
From: Zhu Yanjun @ 2025-03-15 19:21 UTC (permalink / raw)
To: Daisuke Matsuda, linux-rdma, leon, jgg, zyjzyj2000; +Cc: lizhijian
On 2025/3/14 9:10, Daisuke Matsuda wrote:
> RDMA FLUSH[1] and ATOMIC WRITE[2] were added to rxe, but they cannot run
> in the ODP mode as of now. This series is for the kernel-side enablement.
Today I read these commits carefully. The two commits enable the
ATOMIC WRITE and FLUSH operations with ODP, and the corresponding test
cases are added in rdma-core. I am fine with these two commits.
However, I notice that there are no perftest results for the two
operations. Perftest is a stress-testing tool, so it can exercise the
two commits under some load.
Anyway, I am fine with the two commits. It would be better if the
perftest results were attached.
Zhu Yanjun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation
2025-03-14 8:10 ` [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation Daisuke Matsuda
@ 2025-03-15 19:23 ` Zhu Yanjun
2025-03-17 18:22 ` Leon Romanovsky
1 sibling, 0 replies; 9+ messages in thread
From: Zhu Yanjun @ 2025-03-15 19:23 UTC (permalink / raw)
To: Daisuke Matsuda, linux-rdma, leon, jgg, zyjzyj2000; +Cc: lizhijian
On 2025/3/14 9:10, Daisuke Matsuda wrote:
> For persistent memories, add rxe_odp_flush_pmem_iova() so that ODP specific
> steps are executed. Otherwise, no additional consideration is required.
Thanks a lot. It would be better if the perftest results were also attached.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH for-next v1 2/2] RDMA/rxe: Enable ODP in ATOMIC WRITE operation
2025-03-14 8:10 ` [PATCH for-next v1 2/2] RDMA/rxe: Enable ODP in ATOMIC WRITE operation Daisuke Matsuda
@ 2025-03-15 19:23 ` Zhu Yanjun
0 siblings, 0 replies; 9+ messages in thread
From: Zhu Yanjun @ 2025-03-15 19:23 UTC (permalink / raw)
To: Daisuke Matsuda, linux-rdma, leon, jgg, zyjzyj2000; +Cc: lizhijian
On 2025/3/14 9:10, Daisuke Matsuda wrote:
> Add rxe_odp_do_atomic_write() so that ODP specific steps are applied to
> ATOMIC WRITE requests.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Thanks a lot. It would be better if the perftest results were also attached.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP
2025-03-15 19:21 ` [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Zhu Yanjun
@ 2025-03-17 5:22 ` Daisuke Matsuda (Fujitsu)
0 siblings, 0 replies; 9+ messages in thread
From: Daisuke Matsuda (Fujitsu) @ 2025-03-17 5:22 UTC (permalink / raw)
To: 'Zhu Yanjun', linux-rdma@vger.kernel.org, leon@kernel.org,
jgg@ziepe.ca, zyjzyj2000@gmail.com
Cc: Zhijian Li (Fujitsu)
On Sun, March 16, 2025 4:21 AM Zhu Yanjun wrote:
> On 2025/3/14 9:10, Daisuke Matsuda wrote:
> > RDMA FLUSH[1] and ATOMIC WRITE[2] were added to rxe, but they cannot run
> > in the ODP mode as of now. This series is for the kernel-side enablement.
> >
> > There are also minor changes in libibverbs and pyverbs. The rdma-core tests
> > are also added so that people can test the features.
> > PR: https://github.com/linux-rdma/rdma-core/pull/1580
> >
> > You can try the patches with the tree below:
> > https://github.com/ddmatsu/linux/tree/odp-extension
> >
> > Note that the tree is a bit old (6.13-rc1), because there was an issue[3]
> > in the for-next tree that disabled ibv_query_device_ex(), which is used to
> > query ODP capabilities. However, there is already a fix[4], and it is to be
> > resolved in the next release. I will update the tree once it is ready.
> >
> > [1] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
> > https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@fujitsu.com/
> >
> > [2] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
> > https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@fujitsu.com/
> >
> > [3] [bug report] RDMA/rxe: Failure of ibv_query_device() and ibv_query_device_ex() tests in rdma-core
> > https://lore.kernel.org/all/1b9d6286-62fc-4b42-b304-0054c4ebee02@linux.dev/T/
> >
> > [4] [PATCH rdma-rc 1/1] RDMA/rxe: Fix the failure of ibv_query_device() and ibv_query_device_ex() tests
> > https://lore.kernel.org/linux-rdma/174102882930.42565.11864314726635251412.b4-ty@kernel.org/T/#t
>
> Today I read these commits carefully. The 2 commits introduce
> ATOMIC_WRITE and ATOMIC_FLUSH operations with ODP enabled. In
> rdma-core, the corresponding test cases are also added. I am fine with
> these 2 commits.
>
> But I notice that there are no perftest results for the 2 operations.
> Perftest is a stress-test tool. With it, the 2 commits can be tested
> under some load.
Hi Zhu,
Thank you for the review.
I cannot measure the 2 operations with perftest because the tool does not
support them yet. However, they should ideally be supported before hardware HCAs
start to enable RDMA FLUSH and ATOMIC WRITE. After I complete my work on the rxe
ODP features, I think I can see to it. Using rxe will be very helpful in doing that.
Thanks,
Daisuke Matsuda
>
> Anyway, I am fine with the 2 commits. It would be better if the perftest
> results were attached.
>
> Zhu Yanjun
* Re: [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation
2025-03-14 8:10 ` [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation Daisuke Matsuda
2025-03-15 19:23 ` Zhu Yanjun
@ 2025-03-17 18:22 ` Leon Romanovsky
2025-03-18 9:47 ` Daisuke Matsuda (Fujitsu)
1 sibling, 1 reply; 9+ messages in thread
From: Leon Romanovsky @ 2025-03-17 18:22 UTC (permalink / raw)
To: Daisuke Matsuda; +Cc: linux-rdma, jgg, zyjzyj2000, lizhijian
On Fri, Mar 14, 2025 at 05:10:55PM +0900, Daisuke Matsuda wrote:
> For persistent memories, add rxe_odp_flush_pmem_iova() so that ODP-specific
> steps are executed. Otherwise, no additional consideration is required.
>
> Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> ---
> drivers/infiniband/sw/rxe/rxe.c | 1 +
> drivers/infiniband/sw/rxe/rxe_loc.h | 7 +++
> drivers/infiniband/sw/rxe/rxe_odp.c | 73 ++++++++++++++++++++++++++--
> drivers/infiniband/sw/rxe/rxe_resp.c | 13 ++---
> include/rdma/ib_verbs.h | 1 +
> 5 files changed, 85 insertions(+), 10 deletions(-)
<...>
>
> +static unsigned long rxe_odp_iova_to_index(struct ib_umem_odp *umem_odp, u64 iova)
> +{
> + return (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> +}
> +
> +static unsigned long rxe_odp_iova_to_page_offset(struct ib_umem_odp *umem_odp, u64 iova)
> +{
> + return iova & (BIT(umem_odp->page_shift) - 1);
> +}
> +
> static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u32 flags)
> {
> struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
> @@ -190,8 +201,8 @@ static int __rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
> size_t offset;
> u8 *user_va;
>
> - idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> - offset = iova & (BIT(umem_odp->page_shift) - 1);
> + idx = rxe_odp_iova_to_index(umem_odp, iova);
> + offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
>
> while (length > 0) {
> u8 *src, *dest;
> @@ -277,8 +288,8 @@ static int rxe_odp_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
> return RESPST_ERR_RKEY_VIOLATION;
> }
>
> - idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> - page_offset = iova & (BIT(umem_odp->page_shift) - 1);
> + idx = rxe_odp_iova_to_index(umem_odp, iova);
> + page_offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
> page = hmm_pfn_to_page(umem_odp->pfn_list[idx]);
> if (!page)
> return RESPST_ERR_RKEY_VIOLATION;
> @@ -324,3 +335,57 @@ int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
>
> return err;
> }
> +
> +int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
> + unsigned int length)
> +{
This function looks very similar to the existing rxe_flush_pmem_iova().
Can't you reuse the existing function instead of duplicating it?
Thanks
> + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
> + unsigned int page_offset;
> + unsigned long index;
> + struct page *page;
> + unsigned int bytes;
> + int err;
> + u8 *va;
> +
> + /* mr must be valid even if length is zero */
> + if (WARN_ON(!mr))
> + return -EINVAL;
> +
> + if (length == 0)
> + return 0;
> +
> + err = mr_check_range(mr, iova, length);
> + if (err)
> + return err;
> +
> + err = rxe_odp_map_range_and_lock(mr, iova, length,
> + RXE_PAGEFAULT_DEFAULT);
> + if (err)
> + return err;
> +
> + while (length > 0) {
> + index = rxe_odp_iova_to_index(umem_odp, iova);
> + page_offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
> +
> + page = hmm_pfn_to_page(umem_odp->pfn_list[index]);
> + if (!page) {
> + mutex_unlock(&umem_odp->umem_mutex);
> + return -EFAULT;
> + }
> +
> + bytes = min_t(unsigned int, length,
> + mr_page_size(mr) - page_offset);
> +
> + va = kmap_local_page(page);
> + arch_wb_cache_pmem(va + page_offset, bytes);
> + kunmap_local(va);
> +
> + length -= bytes;
> + iova += bytes;
> + page_offset = 0;
> + }
> +
> + mutex_unlock(&umem_odp->umem_mutex);
> +
> + return 0;
> +}
* RE: [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation
2025-03-17 18:22 ` Leon Romanovsky
@ 2025-03-18 9:47 ` Daisuke Matsuda (Fujitsu)
0 siblings, 0 replies; 9+ messages in thread
From: Daisuke Matsuda (Fujitsu) @ 2025-03-18 9:47 UTC (permalink / raw)
To: 'Leon Romanovsky'
Cc: linux-rdma@vger.kernel.org, jgg@ziepe.ca, zyjzyj2000@gmail.com,
Zhijian Li (Fujitsu)
On Tue, Mar 18, 2025 3:23 AM Leon Romanovsky wrote:
> On Fri, Mar 14, 2025 at 05:10:55PM +0900, Daisuke Matsuda wrote:
> > For persistent memories, add rxe_odp_flush_pmem_iova() so that ODP-specific
> > steps are executed. Otherwise, no additional consideration is required.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
> > ---
> > drivers/infiniband/sw/rxe/rxe.c | 1 +
> > drivers/infiniband/sw/rxe/rxe_loc.h | 7 +++
> > drivers/infiniband/sw/rxe/rxe_odp.c | 73 ++++++++++++++++++++++++++--
> > drivers/infiniband/sw/rxe/rxe_resp.c | 13 ++---
> > include/rdma/ib_verbs.h | 1 +
> > 5 files changed, 85 insertions(+), 10 deletions(-)
>
> <...>
>
> >
> > +static unsigned long rxe_odp_iova_to_index(struct ib_umem_odp *umem_odp, u64 iova)
> > +{
> > + return (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> > +}
> > +
> > +static unsigned long rxe_odp_iova_to_page_offset(struct ib_umem_odp *umem_odp, u64 iova)
> > +{
> > + return iova & (BIT(umem_odp->page_shift) - 1);
> > +}
> > +
> > static int rxe_odp_map_range_and_lock(struct rxe_mr *mr, u64 iova, int length, u32 flags)
> > {
> > struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
> > @@ -190,8 +201,8 @@ static int __rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
> > size_t offset;
> > u8 *user_va;
> >
> > - idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> > - offset = iova & (BIT(umem_odp->page_shift) - 1);
> > + idx = rxe_odp_iova_to_index(umem_odp, iova);
> > + offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
> >
> > while (length > 0) {
> > u8 *src, *dest;
> > @@ -277,8 +288,8 @@ static int rxe_odp_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
> > return RESPST_ERR_RKEY_VIOLATION;
> > }
> >
> > - idx = (iova - ib_umem_start(umem_odp)) >> umem_odp->page_shift;
> > - page_offset = iova & (BIT(umem_odp->page_shift) - 1);
> > + idx = rxe_odp_iova_to_index(umem_odp, iova);
> > + page_offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
> > page = hmm_pfn_to_page(umem_odp->pfn_list[idx]);
> > if (!page)
> > return RESPST_ERR_RKEY_VIOLATION;
> > @@ -324,3 +335,57 @@ int rxe_odp_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
> >
> > return err;
> > }
> > +
> > +int rxe_odp_flush_pmem_iova(struct rxe_mr *mr, u64 iova,
> > + unsigned int length)
> > +{
>
> This function looks very similar to the existing rxe_flush_pmem_iova().
> Can't you reuse the existing function instead of duplicating it?
Hi,
Thank you for the review.
I will send new patches that let these functions share the common elements,
but the while loops must stay separate. They use different page-fetching logic,
so it is not feasible to unify the functions completely.
Thanks,
Daisuke
>
> Thanks
>
> > + struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
> > + unsigned int page_offset;
> > + unsigned long index;
> > + struct page *page;
> > + unsigned int bytes;
> > + int err;
> > + u8 *va;
> > +
> > + /* mr must be valid even if length is zero */
> > + if (WARN_ON(!mr))
> > + return -EINVAL;
> > +
> > + if (length == 0)
> > + return 0;
> > +
> > + err = mr_check_range(mr, iova, length);
> > + if (err)
> > + return err;
> > +
> > + err = rxe_odp_map_range_and_lock(mr, iova, length,
> > + RXE_PAGEFAULT_DEFAULT);
> > + if (err)
> > + return err;
> > +
> > + while (length > 0) {
> > + index = rxe_odp_iova_to_index(umem_odp, iova);
> > + page_offset = rxe_odp_iova_to_page_offset(umem_odp, iova);
> > +
> > + page = hmm_pfn_to_page(umem_odp->pfn_list[index]);
> > + if (!page) {
> > + mutex_unlock(&umem_odp->umem_mutex);
> > + return -EFAULT;
> > + }
> > +
> > + bytes = min_t(unsigned int, length,
> > + mr_page_size(mr) - page_offset);
> > +
> > + va = kmap_local_page(page);
> > + arch_wb_cache_pmem(va + page_offset, bytes);
> > + kunmap_local(va);
> > +
> > + length -= bytes;
> > + iova += bytes;
> > + page_offset = 0;
> > + }
> > +
> > + mutex_unlock(&umem_odp->umem_mutex);
> > +
> > + return 0;
> > +}
end of thread, other threads:[~2025-03-18 9:49 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-14 8:10 [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Daisuke Matsuda
2025-03-14 8:10 ` [PATCH for-next v1 1/2] RDMA/rxe: Enable ODP in RDMA FLUSH operation Daisuke Matsuda
2025-03-15 19:23 ` Zhu Yanjun
2025-03-17 18:22 ` Leon Romanovsky
2025-03-18 9:47 ` Daisuke Matsuda (Fujitsu)
2025-03-14 8:10 ` [PATCH for-next v1 2/2] RDMA/rxe: Enable ODP in ATOMIC WRITE operation Daisuke Matsuda
2025-03-15 19:23 ` Zhu Yanjun
2025-03-15 19:21 ` [PATCH for-next v1 0/2] RDMA/rxe: RDMA FLUSH and ATOMIC WRITE with ODP Zhu Yanjun
2025-03-17 5:22 ` Daisuke Matsuda (Fujitsu)