linux-rdma.vger.kernel.org archive mirror
* [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
@ 2023-11-09  5:44 Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue Daisuke Matsuda
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

This patch series implements the On-Demand Paging feature on the SoftRoCE
(rxe) driver, which so far has been available only in the mlx5 driver[1].

This series depends on the commit 9b4b7c1f9f54 ("RDMA/rxe: Add
workqueue support for rxe tasks"), which replaced the triple tasklets with
a workqueue. That commit has been suspected of introducing the hang in the
blktests srp/002 test[3]. According to the investigation by Bob and Bart,
the hang is likely a timing issue that can occur with both rxe and siw[4],
so I am resuming submission of my ODP patches.

I have omitted some content, such as the motivation behind this series,
from this cover letter. Please see the cover letter of v3 for more
details[2].

[Overview]
When applications register a memory region (MR), RDMA drivers normally pin
its pages so that their physical addresses never change during RDMA
communication. This requires the MR to fit in physical memory and
inevitably leads to memory pressure. On-Demand Paging (ODP), on the other
hand, allows applications to register MRs without pinning pages: they are
paged in when the driver needs them and paged out when the OS reclaims
memory. As a result, it is possible to register a large MR that does not
fit in physical memory without consuming much physical memory.

[How does ODP work?]
"struct ib_umem_odp" is used to manage pages. It is created for each
ODP-enabled MR at registration time. This struct holds a pair of arrays
(dma_list/pfn_list) that serve as a driver page table, storing the DMA
addresses and PFNs of the pages in the MR. Both arrays are updated on
page-in and page-out, which go through the common interfaces in the
ib_uverbs layer.
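
For reference, the relevant fields look roughly as follows (abridged from
include/rdma/ib_umem_odp.h; see the kernel tree for the full definition):

struct ib_umem_odp {
	struct ib_umem umem;
	struct mmu_interval_notifier notifier;	/* for page invalidation */

	/* driver page table: PFNs and the DMA addresses mapped for them */
	unsigned long *pfn_list;
	dma_addr_t *dma_list;

	/* protects pfn_list/dma_list against concurrent invalidation */
	struct mutex umem_mutex;
	/* ... */
};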

Page-in can occur when the requester, responder, or completer accesses an
MR to process RDMA operations. If the pages being accessed are not present
in physical memory, or the requisite permissions are not set on them, a
page fault is triggered to make the pages present with the proper
permissions, and the driver page table is updated at the same time. After
confirming the presence of the pages, the memory access (read, write, or
an atomic operation) is executed.
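
In rxe, the page-in path boils down to the common helper
ib_umem_odp_map_dma_and_lock(). Below is a condensed sketch of
rxe_odp_do_pagefault_and_lock() from patch 5 (the caller releases
umem_mutex once it is done with the pages):

static int odp_page_in(struct rxe_mr *mr, u64 iova, int length, u32 flags)
{
	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
	u64 access_mask = ODP_READ_ALLOWED_BIT;
	int np;

	if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY))
		access_mask |= ODP_WRITE_ALLOWED_BIT;

	/* fault the pages in and fill umem_odp->pfn_list/dma_list;
	 * returns with umem_mutex held on success
	 */
	np = ib_umem_odp_map_dma_and_lock(umem_odp, iova, length,
					  access_mask, true);
	if (np < 0)
		return np;

	/* now update the MR xarray from pfn_list and access the pages */
	return np;
}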

Page-out is triggered by page reclaim or by filesystem events (e.g. a
metadata update of a file that is being used as an MR). When creating an
ODP-enabled MR, the driver registers an MMU notifier callback. When the
kernel issues a page invalidation notification, the callback is invoked to
unmap the DMA addresses and update the driver page table. After that, the
kernel releases the pages.
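
The callback in this series (condensed from rxe_ib_invalidate_range() in
patch 4, which also clamps the range to the umem bounds) looks like this:

static bool invalidate_range(struct mmu_interval_notifier *mni,
			     const struct mmu_notifier_range *range,
			     unsigned long cur_seq)
{
	struct ib_umem_odp *umem_odp =
		container_of(mni, struct ib_umem_odp, notifier);

	mutex_lock(&umem_odp->umem_mutex);
	mmu_interval_set_seq(mni, cur_seq);

	/* clear the MR xarray entries for the range, then unmap the
	 * DMA addresses in umem_odp->dma_list
	 */
	ib_umem_odp_unmap_dma_pages(umem_odp, ib_umem_start(umem_odp),
				    ib_umem_end(umem_odp));

	mutex_unlock(&umem_odp->umem_mutex);
	return true;
}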

[Supported operations]
All traditional operations are supported on RC connections. The new Atomic
Write[5] and RDMA Flush[6] operations are not included in this patchset; I
will post them after this patchset is merged. On UD connections, Send,
Recv, and SRQ-Recv are supported.

[How to test ODP?]
There are only a few resources available for testing. The pyverbs
testcases in rdma-core and perftest[7] are the recommended ones. Besides
those, the ibv_rc_pingpong command can also be used for testing. Note that
you may have to build perftest from the upstream sources because older
versions do not handle the ODP capabilities correctly.
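
For example (assuming the rxe device is named rxe0; the option spellings
below are those of the current upstream tools and may differ in older
releases):

  # pyverbs ODP tests in rdma-core
  ./build/bin/run_tests.py --dev rxe0 tests.test_odp

  # perftest with ODP-registered MRs
  ib_write_bw -d rxe0 --odp             # server
  ib_write_bw -d rxe0 --odp <server>    # client

  # ibv_rc_pingpong with ODP
  ibv_rc_pingpong -d rxe0 -o            # server
  ibv_rc_pingpong -d rxe0 -o <server>   # client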

The ODP tree is available from github:
https://github.com/ddmatsu/linux/tree/odp_v7

[Future work]
My next task is to enable the new Atomic Write[5] and RDMA Flush[6]
operations with ODP. After that, I am going to implement the prefetch
feature, which allows applications to trigger page faults via
ibv_advise_mr(3) to optimize performance; some existing software, such as
librpma[8], uses it. Additionally, I think we can add the implicit ODP
feature in the future.
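
For reference, a prefetch through ibv_advise_mr(3) would look roughly like
this on the application side (a hypothetical sketch; buf/len/mr come from
an ODP-enabled registration, and rxe does not support this verb yet):

	struct ibv_sge sge = {
		.addr   = (uintptr_t)buf,
		.length = len,
		.lkey   = mr->lkey,
	};

	/* ask the driver to fault the pages in before the actual access */
	ret = ibv_advise_mr(pd, IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE,
			    IBV_ADVISE_MR_FLAG_FLUSH, &sge, 1);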

[1] [RFC 00/20] On demand paging
https://www.spinics.net/lists/linux-rdma/msg18906.html

[2] [PATCH for-next v3 0/7] On-Demand Paging on SoftRoCE
https://lore.kernel.org/lkml/cover.1671772917.git.matsuda-daisuke@fujitsu.com/

[3] [bug report] blktests srp/002 hang
https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/

[4] srp/002 hang in blktests
https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/

[5] [PATCH v7 0/8] RDMA/rxe: Add atomic write operation
https://lore.kernel.org/linux-rdma/1669905432-14-1-git-send-email-yangx.jy@fujitsu.com/

[6] [for-next PATCH 00/10] RDMA/rxe: Add RDMA FLUSH operation
https://lore.kernel.org/lkml/20221206130201.30986-1-lizhijian@fujitsu.com/

[7] linux-rdma/perftest: Infiniband Verbs Performance Tests
https://github.com/linux-rdma/perftest

[8] librpma: Remote Persistent Memory Access Library
https://github.com/pmem/rpma

v6->v7:
 1) Rebased to 6.6.0
 2) Disabled using hugepages with ODP
 3) Addressed comments on v6 from Jason and Zhu
   cf. https://lore.kernel.org/lkml/cover.1694153251.git.matsuda-daisuke@fujitsu.com/

v5->v6:
 Fixed the implementation according to Jason's suggestions
   cf. https://lore.kernel.org/all/ZIdFXfDu4IMKE+BQ@nvidia.com/
   cf. https://lore.kernel.org/all/ZIdGU709e1h5h4JJ@nvidia.com/

v4->v5:
 1) Rebased to 6.4.0-rc2+
 2) Changed to schedule all work on the responder and completer to the workqueue

v3->v4:
 1) Re-designed functions that access MRs to use the MR xarray.
 2) Rebased onto the latest jgg-for-next tree.

v2->v3:
 1) Removed a patch that changes the common ib_uverbs layer.
 2) Re-implemented patches for conversion to workqueue.
 3) Fixed compile errors (happened when CONFIG_INFINIBAND_ON_DEMAND_PAGING=n).
 4) Fixed some functions that returned incorrect errors.
 5) Temporarily disabled ODP for RDMA Flush and Atomic Write.

v1->v2:
 1) Fixed a crash issue reported by Haris Iqbal.
 2) Tried to make lock patterns clearer as pointed out by Romanovsky.
 3) Minor clean ups and fixes.

Daisuke Matsuda (7):
  RDMA/rxe: Always defer tasks on responder and completer to workqueue
  RDMA/rxe: Make MR functions accessible from other rxe source code
  RDMA/rxe: Move resp_states definition to rxe_verbs.h
  RDMA/rxe: Add page invalidation support
  RDMA/rxe: Allow registering MRs for On-Demand Paging
  RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
  RDMA/rxe: Add support for the traditional Atomic operations with ODP

 drivers/infiniband/sw/rxe/Makefile          |   2 +
 drivers/infiniband/sw/rxe/rxe.c             |  18 ++
 drivers/infiniband/sw/rxe/rxe.h             |  37 ---
 drivers/infiniband/sw/rxe/rxe_comp.c        |  12 +-
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |   1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |   1 -
 drivers/infiniband/sw/rxe/rxe_loc.h         |  39 +++
 drivers/infiniband/sw/rxe/rxe_mr.c          |  34 ++-
 drivers/infiniband/sw/rxe/rxe_odp.c         | 289 ++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c        |  31 +--
 drivers/infiniband/sw/rxe/rxe_verbs.c       |   5 +-
 drivers/infiniband/sw/rxe/rxe_verbs.h       |  37 +++
 12 files changed, 428 insertions(+), 78 deletions(-)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

-- 
2.39.1



* [PATCH for-next v7 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
@ 2023-11-09  5:44 ` Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 2/7] RDMA/rxe: Make MR functions accessible from other rxe source code Daisuke Matsuda
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

Both the responder and the completer need to sleep to handle page faults
when used with ODP. This can happen when they are about to access user
MRs, so their tasks must run in process context in such cases.

Additionally, the current implementation seldom defers tasks to the
workqueue; instead, it runs them in softirq context via do_task(), which is
called from rxe_resp_queue_pkt() and rxe_comp_queue_pkt() in SOFTIRQ_NET_RX
context and can run for up to RXE_MAX_ITERATIONS (=1024) loop iterations.
The problem is that the task execution appears as anonymous load on the
system and that the loop can throttle other softirqs on the same CPU.

This patch makes the responder and completer code run in process context,
which is required for ODP and also resolves the problem described above.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c        | 12 +-----------
 drivers/infiniband/sw/rxe/rxe_hw_counters.c |  1 -
 drivers/infiniband/sw/rxe/rxe_hw_counters.h |  1 -
 drivers/infiniband/sw/rxe/rxe_resp.c        | 13 +------------
 4 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index d0bdc2d8adc8..bb016a43330d 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -129,18 +129,8 @@ void retransmit_timer(struct timer_list *t)
 
 void rxe_comp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-
 	skb_queue_tail(&qp->resp_pkts, skb);
-
-	must_sched = skb_queue_len(&qp->resp_pkts) > 1;
-	if (must_sched != 0)
-		rxe_counter_inc(SKB_TO_PKT(skb)->rxe, RXE_CNT_COMPLETER_SCHED);
-
-	if (must_sched)
-		rxe_sched_task(&qp->comp.task);
-	else
-		rxe_run_task(&qp->comp.task);
+	rxe_sched_task(&qp->comp.task);
 }
 
 static inline enum comp_state get_wqe(struct rxe_qp *qp,
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.c b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
index a012522b577a..dc23cf3a6967 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.c
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.c
@@ -14,7 +14,6 @@ static const struct rdma_stat_desc rxe_counter_descs[] = {
 	[RXE_CNT_RCV_RNR].name             =  "rcvd_rnr_err",
 	[RXE_CNT_SND_RNR].name             =  "send_rnr_err",
 	[RXE_CNT_RCV_SEQ_ERR].name         =  "rcvd_seq_err",
-	[RXE_CNT_COMPLETER_SCHED].name     =  "ack_deferred",
 	[RXE_CNT_RETRY_EXCEEDED].name      =  "retry_exceeded_err",
 	[RXE_CNT_RNR_RETRY_EXCEEDED].name  =  "retry_rnr_exceeded_err",
 	[RXE_CNT_COMP_RETRY].name          =  "completer_retry_err",
diff --git a/drivers/infiniband/sw/rxe/rxe_hw_counters.h b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
index 71f4d4fa9dc8..303da0e3134a 100644
--- a/drivers/infiniband/sw/rxe/rxe_hw_counters.h
+++ b/drivers/infiniband/sw/rxe/rxe_hw_counters.h
@@ -18,7 +18,6 @@ enum rxe_counters {
 	RXE_CNT_RCV_RNR,
 	RXE_CNT_SND_RNR,
 	RXE_CNT_RCV_SEQ_ERR,
-	RXE_CNT_COMPLETER_SCHED,
 	RXE_CNT_RETRY_EXCEEDED,
 	RXE_CNT_RNR_RETRY_EXCEEDED,
 	RXE_CNT_COMP_RETRY,
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index da470a925efc..969e057bbfd1 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -46,21 +46,10 @@ static char *resp_state_name[] = {
 	[RESPST_EXIT]				= "EXIT",
 };
 
-/* rxe_recv calls here to add a request packet to the input queue */
 void rxe_resp_queue_pkt(struct rxe_qp *qp, struct sk_buff *skb)
 {
-	int must_sched;
-	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
-
 	skb_queue_tail(&qp->req_pkts, skb);
-
-	must_sched = (pkt->opcode == IB_OPCODE_RC_RDMA_READ_REQUEST) ||
-			(skb_queue_len(&qp->req_pkts) > 1);
-
-	if (must_sched)
-		rxe_sched_task(&qp->resp.task);
-	else
-		rxe_run_task(&qp->resp.task);
+	rxe_sched_task(&qp->resp.task);
 }
 
 static inline enum resp_states get_req(struct rxe_qp *qp,
-- 
2.39.1



* [PATCH for-next v7 2/7] RDMA/rxe: Make MR functions accessible from other rxe source code
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue Daisuke Matsuda
@ 2023-11-09  5:44 ` Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 3/7] RDMA/rxe: Move resp_states definition to rxe_verbs.h Daisuke Matsuda
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

Some functions in rxe_mr.c are going to be used in rxe_odp.c, which will
be created in the subsequent patch. Add the declarations of these
functions to rxe_loc.h.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h |  8 ++++++++
 drivers/infiniband/sw/rxe/rxe_mr.c  | 11 +++--------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4d2a8ef52c85..eb867f7d0d36 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -58,6 +58,7 @@ int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
 
 /* rxe_mr.c */
 u8 rxe_get_next_key(u32 last_key);
+void rxe_mr_init(int access, struct rxe_mr *mr);
 void rxe_mr_init_dma(int access, struct rxe_mr *mr);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr);
@@ -69,6 +70,8 @@ int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
 	      void *addr, int length, enum rxe_mr_copy_dir dir);
 int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset);
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir);
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 			u64 compare, u64 swap_add, u64 *orig_val);
 int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value);
@@ -80,6 +83,11 @@ int rxe_invalidate_mr(struct rxe_qp *qp, u32 key);
 int rxe_reg_fast_mr(struct rxe_qp *qp, struct rxe_send_wqe *wqe);
 void rxe_mr_cleanup(struct rxe_pool_elem *elem);
 
+static inline unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
+{
+	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
+}
+
 /* rxe_mw.c */
 int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata);
 int rxe_dealloc_mw(struct ib_mw *ibmw);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..86b1908d304b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -45,7 +45,7 @@ int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length)
 	}
 }
 
-static void rxe_mr_init(int access, struct rxe_mr *mr)
+void rxe_mr_init(int access, struct rxe_mr *mr)
 {
 	u32 key = mr->elem.index << 8 | rxe_get_next_key(-1);
 
@@ -72,11 +72,6 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr)
 	mr->ibmr.type = IB_MR_TYPE_DMA;
 }
 
-static unsigned long rxe_mr_iova_to_index(struct rxe_mr *mr, u64 iova)
-{
-	return (iova >> mr->page_shift) - (mr->ibmr.iova >> mr->page_shift);
-}
-
 static unsigned long rxe_mr_iova_to_page_offset(struct rxe_mr *mr, u64 iova)
 {
 	return iova & (mr_page_size(mr) - 1);
@@ -242,8 +237,8 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
 	return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
 }
 
-static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
-			      unsigned int length, enum rxe_mr_copy_dir dir)
+int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+		       unsigned int length, enum rxe_mr_copy_dir dir)
 {
 	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
 	unsigned long index = rxe_mr_iova_to_index(mr, iova);
-- 
2.39.1



* [PATCH for-next v7 3/7] RDMA/rxe: Move resp_states definition to rxe_verbs.h
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 2/7] RDMA/rxe: Make MR functions accessible from other rxe source code Daisuke Matsuda
@ 2023-11-09  5:44 ` Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 4/7] RDMA/rxe: Add page invalidation support Daisuke Matsuda
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

To use the resp_states values in rxe_loc.h, it is necessary to move the
definition to rxe_verbs.h, where other internal states of this driver are
defined.

Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.h       | 37 ---------------------------
 drivers/infiniband/sw/rxe/rxe_verbs.h | 37 +++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index d33dd6cf83d3..9b4d044a1264 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -100,43 +100,6 @@
 #define rxe_info_mw(mw, fmt, ...) ibdev_info_ratelimited((mw)->ibmw.device, \
 		"mw#%d %s:  " fmt, (mw)->elem.index, __func__, ##__VA_ARGS__)
 
-/* responder states */
-enum resp_states {
-	RESPST_NONE,
-	RESPST_GET_REQ,
-	RESPST_CHK_PSN,
-	RESPST_CHK_OP_SEQ,
-	RESPST_CHK_OP_VALID,
-	RESPST_CHK_RESOURCE,
-	RESPST_CHK_LENGTH,
-	RESPST_CHK_RKEY,
-	RESPST_EXECUTE,
-	RESPST_READ_REPLY,
-	RESPST_ATOMIC_REPLY,
-	RESPST_ATOMIC_WRITE_REPLY,
-	RESPST_PROCESS_FLUSH,
-	RESPST_COMPLETE,
-	RESPST_ACKNOWLEDGE,
-	RESPST_CLEANUP,
-	RESPST_DUPLICATE_REQUEST,
-	RESPST_ERR_MALFORMED_WQE,
-	RESPST_ERR_UNSUPPORTED_OPCODE,
-	RESPST_ERR_MISALIGNED_ATOMIC,
-	RESPST_ERR_PSN_OUT_OF_SEQ,
-	RESPST_ERR_MISSING_OPCODE_FIRST,
-	RESPST_ERR_MISSING_OPCODE_LAST_C,
-	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
-	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
-	RESPST_ERR_RNR,
-	RESPST_ERR_RKEY_VIOLATION,
-	RESPST_ERR_INVALIDATE_RKEY,
-	RESPST_ERR_LENGTH,
-	RESPST_ERR_CQ_OVERFLOW,
-	RESPST_ERROR,
-	RESPST_DONE,
-	RESPST_EXIT,
-};
-
 void rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
 
 int rxe_add(struct rxe_dev *rxe, unsigned int mtu, const char *ibdev_name);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..1058b5de8920 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -127,6 +127,43 @@ struct rxe_comp_info {
 	struct rxe_task		task;
 };
 
+/* responder states */
+enum resp_states {
+	RESPST_NONE,
+	RESPST_GET_REQ,
+	RESPST_CHK_PSN,
+	RESPST_CHK_OP_SEQ,
+	RESPST_CHK_OP_VALID,
+	RESPST_CHK_RESOURCE,
+	RESPST_CHK_LENGTH,
+	RESPST_CHK_RKEY,
+	RESPST_EXECUTE,
+	RESPST_READ_REPLY,
+	RESPST_ATOMIC_REPLY,
+	RESPST_ATOMIC_WRITE_REPLY,
+	RESPST_PROCESS_FLUSH,
+	RESPST_COMPLETE,
+	RESPST_ACKNOWLEDGE,
+	RESPST_CLEANUP,
+	RESPST_DUPLICATE_REQUEST,
+	RESPST_ERR_MALFORMED_WQE,
+	RESPST_ERR_UNSUPPORTED_OPCODE,
+	RESPST_ERR_MISALIGNED_ATOMIC,
+	RESPST_ERR_PSN_OUT_OF_SEQ,
+	RESPST_ERR_MISSING_OPCODE_FIRST,
+	RESPST_ERR_MISSING_OPCODE_LAST_C,
+	RESPST_ERR_MISSING_OPCODE_LAST_D1E,
+	RESPST_ERR_TOO_MANY_RDMA_ATM_REQ,
+	RESPST_ERR_RNR,
+	RESPST_ERR_RKEY_VIOLATION,
+	RESPST_ERR_INVALIDATE_RKEY,
+	RESPST_ERR_LENGTH,
+	RESPST_ERR_CQ_OVERFLOW,
+	RESPST_ERROR,
+	RESPST_DONE,
+	RESPST_EXIT,
+};
+
 enum rdatm_res_state {
 	rdatm_res_state_next,
 	rdatm_res_state_new,
-- 
2.39.1



* [PATCH for-next v7 4/7] RDMA/rxe: Add page invalidation support
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
                   ` (2 preceding siblings ...)
  2023-11-09  5:44 ` [PATCH for-next v7 3/7] RDMA/rxe: Move resp_states definition to rxe_verbs.h Daisuke Matsuda
@ 2023-11-09  5:44 ` Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging Daisuke Matsuda
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

On page invalidation, an MMU notifier callback is invoked to unmap DMA
addresses and update the driver page table (umem_odp->dma_list). It also
sets the corresponding entries in the MR xarray to NULL to prevent any
access. The callback is registered when an ODP-enabled MR is created.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/Makefile  |  2 +
 drivers/infiniband/sw/rxe/rxe_odp.c | 57 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
 create mode 100644 drivers/infiniband/sw/rxe/rxe_odp.c

diff --git a/drivers/infiniband/sw/rxe/Makefile b/drivers/infiniband/sw/rxe/Makefile
index 5395a581f4bb..93134f1d1d0c 100644
--- a/drivers/infiniband/sw/rxe/Makefile
+++ b/drivers/infiniband/sw/rxe/Makefile
@@ -23,3 +23,5 @@ rdma_rxe-y := \
 	rxe_task.o \
 	rxe_net.o \
 	rxe_hw_counters.o
+
+rdma_rxe-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += rxe_odp.o
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
new file mode 100644
index 000000000000..ea55b79be0c6
--- /dev/null
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2022-2023 Fujitsu Ltd. All rights reserved.
+ */
+
+#include <linux/hmm.h>
+
+#include <rdma/ib_umem_odp.h>
+
+#include "rxe.h"
+
+static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
+				unsigned long end)
+{
+	unsigned long upper = rxe_mr_iova_to_index(mr, end - 1);
+	unsigned long lower = rxe_mr_iova_to_index(mr, start);
+	void *entry;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	/* make elements in xarray NULL */
+	xas_lock(&xas);
+	xas_for_each(&xas, entry, upper)
+		xas_store(&xas, NULL);
+	xas_unlock(&xas);
+}
+
+static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
+				    const struct mmu_notifier_range *range,
+				    unsigned long cur_seq)
+{
+	struct ib_umem_odp *umem_odp =
+		container_of(mni, struct ib_umem_odp, notifier);
+	struct rxe_mr *mr = umem_odp->private;
+	unsigned long start, end;
+
+	if (!mmu_notifier_range_blockable(range))
+		return false;
+
+	mutex_lock(&umem_odp->umem_mutex);
+	mmu_interval_set_seq(mni, cur_seq);
+
+	start = max_t(u64, ib_umem_start(umem_odp), range->start);
+	end = min_t(u64, ib_umem_end(umem_odp), range->end);
+
+	rxe_mr_unset_xarray(mr, start, end);
+
+	/* update umem_odp->dma_list */
+	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
+
+	mutex_unlock(&umem_odp->umem_mutex);
+	return true;
+}
+
+const struct mmu_interval_notifier_ops rxe_mn_ops = {
+	.invalidate = rxe_ib_invalidate_range,
+};
-- 
2.39.1



* [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
                   ` (3 preceding siblings ...)
  2023-11-09  5:44 ` [PATCH for-next v7 4/7] RDMA/rxe: Add page invalidation support Daisuke Matsuda
@ 2023-11-09  5:44 ` Daisuke Matsuda
  2023-11-09 18:49   ` kernel test robot
  2023-11-09  5:44 ` [PATCH for-next v7 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP Daisuke Matsuda
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

Allow userspace to register an ODP-enabled MR, in which case the flag
IB_ACCESS_ON_DEMAND is passed to rxe_reg_user_mr(). However, no RDMA
operations are enabled at this point; they will be enabled in the
subsequent two patches.

rxe_odp_do_pagefault_and_lock() is called to initialize an ODP-enabled MR.
When called with the RXE_PAGEFAULT_SNAPSHOT flag, it syncs the process
address space from the CPU page table to the driver page table
(dma_list/pfn_list in umem_odp). Additionally, it can be used to trigger a
page fault when the pages being accessed are not present or do not have
the proper read/write permissions, and possibly to prefetch pages in the
future.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.c       |   7 ++
 drivers/infiniband/sw/rxe/rxe_loc.h   |  14 +++
 drivers/infiniband/sw/rxe/rxe_mr.c    |   9 +-
 drivers/infiniband/sw/rxe/rxe_odp.c   | 122 ++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c  |  15 +++-
 drivers/infiniband/sw/rxe/rxe_verbs.c |   5 +-
 6 files changed, 166 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 54c723a6edda..f2284d27229b 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -73,6 +73,13 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 			rxe->ndev->dev_addr);
 
 	rxe->max_ucontext			= RXE_MAX_UCONTEXT;
+
+	if (IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING)) {
+		rxe->attr.kernel_cap_flags |= IBK_ON_DEMAND_PAGING;
+
+		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
+		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+	}
 }
 
 /* initialize port attributes */
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index eb867f7d0d36..4bda154a0248 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -188,4 +188,18 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 	return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type];
 }
 
+/* rxe_odp.c */
+#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr);
+#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+static inline int
+rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
+		     int access_flags, struct rxe_mr *mr)
+{
+	return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
+
 #endif /* RXE_LOC_H */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 86b1908d304b..384cb4ba1f2d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -318,7 +318,10 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 		return err;
 	}
 
-	return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+	if (mr->umem->is_odp)
+		return -EOPNOTSUPP;
+	else
+		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
 
 /* copy data in or out of a wqe, i.e. sg list
@@ -527,6 +530,10 @@ int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value)
 	struct page *page;
 	u64 *va;
 
+	/* ODP is not supported right now. WIP. */
+	if (mr->umem->is_odp)
+		return RESPST_ERR_UNSUPPORTED_OPCODE;
+
 	/* See IBA oA19-28 */
 	if (unlikely(mr->state != RXE_MR_STATE_VALID)) {
 		rxe_dbg_mr(mr, "mr not in valid state");
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index ea55b79be0c6..c5e24901c141 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -9,6 +9,8 @@
 
 #include "rxe.h"
 
+#define RXE_ODP_WRITABLE_BIT    1UL
+
 static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
 				unsigned long end)
 {
@@ -25,6 +27,29 @@ static void rxe_mr_unset_xarray(struct rxe_mr *mr, unsigned long start,
 	xas_unlock(&xas);
 }
 
+static void rxe_mr_set_xarray(struct rxe_mr *mr, unsigned long start,
+			      unsigned long end, unsigned long *pfn_list)
+{
+	unsigned long upper = rxe_mr_iova_to_index(mr, end - 1);
+	unsigned long lower = rxe_mr_iova_to_index(mr, start);
+	void *page, *entry;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	xas_lock(&xas);
+	while (xas.xa_index <= upper) {
+		if (pfn_list[xas.xa_index] & HMM_PFN_WRITE) {
+			page = xa_tag_pointer(hmm_pfn_to_page(pfn_list[xas.xa_index]),
+					      RXE_ODP_WRITABLE_BIT);
+		} else
+			page = hmm_pfn_to_page(pfn_list[xas.xa_index]);
+
+		xas_store(&xas, page);
+		entry = xas_next(&xas);
+	}
+	xas_unlock(&xas);
+}
+
 static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 				    const struct mmu_notifier_range *range,
 				    unsigned long cur_seq)
@@ -55,3 +80,100 @@ static bool rxe_ib_invalidate_range(struct mmu_interval_notifier *mni,
 const struct mmu_interval_notifier_ops rxe_mn_ops = {
 	.invalidate = rxe_ib_invalidate_range,
 };
+
+#define RXE_PAGEFAULT_RDONLY BIT(1)
+#define RXE_PAGEFAULT_SNAPSHOT BIT(2)
+static int rxe_odp_do_pagefault_and_lock(struct rxe_mr *mr, u64 user_va, int bcnt, u32 flags)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	bool fault = !(flags & RXE_PAGEFAULT_SNAPSHOT);
+	u64 access_mask;
+	int np;
+
+	access_mask = ODP_READ_ALLOWED_BIT;
+	if (umem_odp->umem.writable && !(flags & RXE_PAGEFAULT_RDONLY))
+		access_mask |= ODP_WRITE_ALLOWED_BIT;
+
+	/*
+	 * ib_umem_odp_map_dma_and_lock() locks umem_mutex on success.
+	 * Callers must release the lock later to let invalidation handler
+	 * do its work again.
+	 */
+	np = ib_umem_odp_map_dma_and_lock(umem_odp, user_va, bcnt,
+					  access_mask, fault);
+	if (np < 0)
+		return np;
+
+	/*
+	 * umem_mutex is still locked here, so we can use hmm_pfn_to_page()
+	 * safely to fetch pages in the range.
+	 */
+	rxe_mr_set_xarray(mr, user_va, user_va + bcnt, umem_odp->pfn_list);
+
+	return np;
+}
+
+static int rxe_odp_init_pages(struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	int ret;
+
+	ret = rxe_odp_do_pagefault_and_lock(mr, mr->umem->address,
+					    mr->umem->length,
+					    RXE_PAGEFAULT_SNAPSHOT);
+
+	if (ret >= 0)
+		mutex_unlock(&umem_odp->umem_mutex);
+
+	return ret >= 0 ? 0 : ret;
+}
+
+int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
+			 u64 iova, int access_flags, struct rxe_mr *mr)
+{
+	struct ib_umem_odp *umem_odp;
+	int err;
+
+	if (!IS_ENABLED(CONFIG_INFINIBAND_ON_DEMAND_PAGING))
+		return -EOPNOTSUPP;
+
+	rxe_mr_init(access_flags, mr);
+
+	xa_init(&mr->page_list);
+
+	if (!start && length == U64_MAX) {
+		if (iova != 0)
+			return -EINVAL;
+		if (!(rxe->attr.odp_caps.general_caps & IB_ODP_SUPPORT_IMPLICIT))
+			return -EINVAL;
+
+		/* Never reach here, for implicit ODP is not implemented. */
+	}
+
+	umem_odp = ib_umem_odp_get(&rxe->ib_dev, start, length, access_flags,
+				   &rxe_mn_ops);
+	if (IS_ERR(umem_odp)) {
+		rxe_dbg_mr(mr, "Unable to create umem_odp err = %d\n",
+			   (int)PTR_ERR(umem_odp));
+		return PTR_ERR(umem_odp);
+	}
+
+	umem_odp->private = mr;
+
+	mr->umem = &umem_odp->umem;
+	mr->access = access_flags;
+	mr->ibmr.length = length;
+	mr->ibmr.iova = iova;
+	mr->page_offset = ib_umem_offset(&umem_odp->umem);
+
+	err = rxe_odp_init_pages(mr);
+	if (err) {
+		ib_umem_odp_release(umem_odp);
+		return err;
+	}
+
+	mr->state = RXE_MR_STATE_VALID;
+	mr->ibmr.type = IB_MR_TYPE_USER;
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 969e057bbfd1..9159f1bdfc6f 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -635,6 +635,10 @@ static enum resp_states process_flush(struct rxe_qp *qp,
 	struct rxe_mr *mr = qp->resp.mr;
 	struct resp_res *res = qp->resp.res;
 
+	/* ODP is not supported right now. WIP. */
+	if (mr->umem->is_odp)
+		return RESPST_ERR_UNSUPPORTED_OPCODE;
+
 	/* oA19-14, oA19-15 */
 	if (res && res->replay)
 		return RESPST_ACKNOWLEDGE;
@@ -688,10 +692,13 @@ static enum resp_states atomic_reply(struct rxe_qp *qp,
 	if (!res->replay) {
 		u64 iova = qp->resp.va + qp->resp.offset;
 
-		err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
-					  atmeth_comp(pkt),
-					  atmeth_swap_add(pkt),
-					  &res->atomic.orig_val);
+		if (mr->umem->is_odp)
+			err = RESPST_ERR_UNSUPPORTED_OPCODE;
+		else
+			err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
+						  atmeth_comp(pkt),
+						  atmeth_swap_add(pkt),
+						  &res->atomic.orig_val);
 		if (err)
 			return err;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 48f86839d36a..192ad835c712 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1278,7 +1278,10 @@ static struct ib_mr *rxe_reg_user_mr(struct ib_pd *ibpd, u64 start,
 	mr->ibmr.pd = ibpd;
 	mr->ibmr.device = ibpd->device;
 
-	err = rxe_mr_init_user(rxe, start, length, iova, access, mr);
+	if (access & IB_ACCESS_ON_DEMAND)
+		err = rxe_odp_mr_init_user(rxe, start, length, iova, access, mr);
+	else
+		err = rxe_mr_init_user(rxe, start, length, iova, access, mr);
 	if (err) {
 		rxe_dbg_mr(mr, "reg_user_mr failed, err = %d", err);
 		goto err_cleanup;
-- 
2.39.1



* [PATCH for-next v7 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
                   ` (4 preceding siblings ...)
  2023-11-09  5:44 ` [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging Daisuke Matsuda
@ 2023-11-09  5:44 ` Daisuke Matsuda
  2023-11-09  5:44 ` [PATCH for-next v7 7/7] RDMA/rxe: Add support for the traditional Atomic operations " Daisuke Matsuda
  2023-12-05  0:11 ` [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Jason Gunthorpe
  7 siblings, 0 replies; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

rxe_mr_copy() is used widely to copy data to/from a user MR: the requester
uses it to load the payloads of request packets; the responder uses it to
process Send, Write, and Read operations; and the completer uses it to
copy data from the response packets of Read and Atomic operations to a
user MR.

Allow these operations to be used with ODP by adding a subordinate function
rxe_odp_mr_copy(), which consists of the following steps:
 1. Check page presence and R/W permission.
 2. If OK, just execute data copy to/from the pages and exit.
 3. Otherwise, trigger page fault to map the pages.
 4. Update the MR xarray using PFNs in umem_odp->pfn_list.
 5. Execute data copy to/from the pages.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.c     | 10 ++++
 drivers/infiniband/sw/rxe/rxe_loc.h |  8 +++
 drivers/infiniband/sw/rxe/rxe_mr.c  |  9 +++-
 drivers/infiniband/sw/rxe/rxe_odp.c | 77 +++++++++++++++++++++++++++++
 4 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index f2284d27229b..207a022156f0 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -79,6 +79,16 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 
 		/* IB_ODP_SUPPORT_IMPLICIT is not supported right now. */
 		rxe->attr.odp_caps.general_caps |= IB_ODP_SUPPORT;
+
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.ud_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
+
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SEND;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
 
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 4bda154a0248..eeaeff8a1398 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -192,6 +192,8 @@ static inline unsigned int wr_opcode_mask(int opcode, struct rxe_qp *qp)
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir);
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -199,6 +201,12 @@ rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
+		int length, enum rxe_mr_copy_dir dir)
+{
+	return -EOPNOTSUPP;
+}
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 384cb4ba1f2d..f0ce87c0fc7d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -247,7 +247,12 @@ int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
 	void *va;
 
 	while (length) {
-		page = xa_load(&mr->page_list, index);
+		if (mr->umem->is_odp)
+			page = xa_untag_pointer(xa_load(&mr->page_list,
+							index));
+		else
+			page = xa_load(&mr->page_list, index);
+
 		if (!page)
 			return -EFAULT;
 
@@ -319,7 +324,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 	}
 
 	if (mr->umem->is_odp)
-		return -EOPNOTSUPP;
+		return rxe_odp_mr_copy(mr, iova, addr, length, dir);
 	else
 		return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index c5e24901c141..5aa09b9c1095 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -177,3 +177,80 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 
 	return err;
 }
+
+/* Take xarray spinlock before entry */
+static inline bool rxe_odp_check_pages(struct rxe_mr *mr, u64 iova,
+				       int length, u32 flags)
+{
+	unsigned long upper = rxe_mr_iova_to_index(mr, iova + length - 1);
+	unsigned long lower = rxe_mr_iova_to_index(mr, iova);
+	bool need_fault = false;
+	void *page, *entry;
+	size_t perm = 0;
+
+
+	if (!(flags & RXE_PAGEFAULT_RDONLY))
+		perm = RXE_ODP_WRITABLE_BIT;
+
+	XA_STATE(xas, &mr->page_list, lower);
+
+	while (xas.xa_index <= upper) {
+		page = xas_load(&xas);
+
+		/* Check page presence and write permission */
+		if (!page || (perm && !(xa_pointer_tag(page) & perm))) {
+			need_fault = true;
+			break;
+		}
+		entry = xas_next(&xas);
+	}
+
+	return need_fault;
+}
+
+int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
+		    enum rxe_mr_copy_dir dir)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	u32 flags = 0;
+	int err;
+
+	if (unlikely(!mr->umem->is_odp))
+		return -EOPNOTSUPP;
+
+	switch (dir) {
+	case RXE_TO_MR_OBJ:
+		break;
+
+	case RXE_FROM_MR_OBJ:
+		flags = RXE_PAGEFAULT_RDONLY;
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	spin_lock(&mr->page_list.xa_lock);
+
+	if (rxe_odp_check_pages(mr, iova, length, flags)) {
+		spin_unlock(&mr->page_list.xa_lock);
+
+		/* umem_mutex is locked on success */
+		err = rxe_odp_do_pagefault_and_lock(mr, iova, length, flags);
+		if (err < 0)
+			return err;
+
+		/*
+		 * The spinlock is always locked under mutex_lock except
+		 * for MR initialization. No worry about deadlock.
+		 */
+		spin_lock(&mr->page_list.xa_lock);
+		mutex_unlock(&umem_odp->umem_mutex);
+	}
+
+	err =  rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+
+	spin_unlock(&mr->page_list.xa_lock);
+
+	return err;
+}
-- 
2.39.1



* [PATCH for-next v7 7/7] RDMA/rxe: Add support for the traditional Atomic operations with ODP
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
                   ` (5 preceding siblings ...)
  2023-11-09  5:44 ` [PATCH for-next v7 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP Daisuke Matsuda
@ 2023-11-09  5:44 ` Daisuke Matsuda
  2023-12-05  0:11 ` [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Jason Gunthorpe
  7 siblings, 0 replies; 16+ messages in thread
From: Daisuke Matsuda @ 2023-11-09  5:44 UTC (permalink / raw)
  To: linux-rdma, leon, jgg, zyjzyj2000
  Cc: linux-kernel, rpearsonhpe, yangx.jy, lizhijian, y-goto,
	Daisuke Matsuda

Enable the 'fetch and add' and 'compare and swap' operations to be used
with ODP. This consists of the following steps:
 1. Verify that the page is present with write permission.
 2. If OK, execute the operation and exit.
 3. If not, then trigger page fault to map the page.
 4. Update the entry in the MR xarray.
 5. Execute the operation.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
---
 drivers/infiniband/sw/rxe/rxe.c      |  1 +
 drivers/infiniband/sw/rxe/rxe_loc.h  |  9 ++++++++
 drivers/infiniband/sw/rxe/rxe_mr.c   |  7 +++++-
 drivers/infiniband/sw/rxe/rxe_odp.c  | 33 ++++++++++++++++++++++++++++
 drivers/infiniband/sw/rxe/rxe_resp.c |  5 ++++-
 5 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 207a022156f0..abd3267c2873 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -88,6 +88,7 @@ static void rxe_init_device_param(struct rxe_dev *rxe)
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_RECV;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_WRITE;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_READ;
+		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_ATOMIC;
 		rxe->attr.odp_caps.per_transport_caps.rc_odp_caps |= IB_ODP_SUPPORT_SRQ_RECV;
 	}
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index eeaeff8a1398..0bae9044f362 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -194,6 +194,9 @@ int rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length,
 			 u64 iova, int access_flags, struct rxe_mr *mr);
 int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 		    enum rxe_mr_copy_dir dir);
+int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+			 u64 compare, u64 swap_add, u64 *orig_val);
+
 #else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 static inline int
 rxe_odp_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
@@ -207,6 +210,12 @@ rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 {
 	return -EOPNOTSUPP;
 }
+static inline int
+rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+		     u64 compare, u64 swap_add, u64 *orig_val)
+{
+	return RESPST_ERR_UNSUPPORTED_OPCODE;
+}
 
 #endif /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
 
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f0ce87c0fc7d..0dc452ab772b 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -498,7 +498,12 @@ int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 		}
 		page_offset = rxe_mr_iova_to_page_offset(mr, iova);
 		index = rxe_mr_iova_to_index(mr, iova);
-		page = xa_load(&mr->page_list, index);
+
+		if (mr->umem->is_odp)
+			page = xa_untag_pointer(xa_load(&mr->page_list, index));
+		else
+			page = xa_load(&mr->page_list, index);
+
 		if (!page)
 			return RESPST_ERR_RKEY_VIOLATION;
 	}
diff --git a/drivers/infiniband/sw/rxe/rxe_odp.c b/drivers/infiniband/sw/rxe/rxe_odp.c
index 5aa09b9c1095..45b54ba15210 100644
--- a/drivers/infiniband/sw/rxe/rxe_odp.c
+++ b/drivers/infiniband/sw/rxe/rxe_odp.c
@@ -254,3 +254,36 @@ int rxe_odp_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
 
 	return err;
 }
+
+int rxe_odp_mr_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
+			 u64 compare, u64 swap_add, u64 *orig_val)
+{
+	struct ib_umem_odp *umem_odp = to_ib_umem_odp(mr->umem);
+	int err;
+
+	spin_lock(&mr->page_list.xa_lock);
+
+	/* Atomic operations manipulate a single char. */
+	if (rxe_odp_check_pages(mr, iova, sizeof(char), 0)) {
+		spin_unlock(&mr->page_list.xa_lock);
+
+		/* umem_mutex is locked on success */
+		err = rxe_odp_do_pagefault_and_lock(mr, iova, sizeof(char), 0);
+		if (err < 0)
+			return err;
+
+		/*
+		 * The spinlock is always locked under mutex_lock except
+		 * for MR initialization. No worry about deadlock.
+		 */
+		spin_lock(&mr->page_list.xa_lock);
+		mutex_unlock(&umem_odp->umem_mutex);
+	}
+
+	err = rxe_mr_do_atomic_op(mr, iova, opcode, compare,
+				  swap_add, orig_val);
+
+	spin_unlock(&mr->page_list.xa_lock);
+
+	return err;
+}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 9159f1bdfc6f..af3e669679a0 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -693,7 +693,10 @@ static enum resp_states atomic_reply(struct rxe_qp *qp,
 		u64 iova = qp->resp.va + qp->resp.offset;
 
 		if (mr->umem->is_odp)
-			err = RESPST_ERR_UNSUPPORTED_OPCODE;
+			err = rxe_odp_mr_atomic_op(mr, iova, pkt->opcode,
+						   atmeth_comp(pkt),
+						   atmeth_swap_add(pkt),
+						   &res->atomic.orig_val);
 		else
 			err = rxe_mr_do_atomic_op(mr, iova, pkt->opcode,
 						  atmeth_comp(pkt),
-- 
2.39.1



* Re: [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging
  2023-11-09  5:44 ` [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging Daisuke Matsuda
@ 2023-11-09 18:49   ` kernel test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2023-11-09 18:49 UTC (permalink / raw)
  To: Daisuke Matsuda, linux-rdma, leon, jgg, zyjzyj2000
  Cc: oe-kbuild-all, linux-kernel, rpearsonhpe, yangx.jy, lizhijian,
	y-goto, Daisuke Matsuda

Hi Daisuke,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rdma/for-next]
[also build test WARNING on linus/master v6.6 next-20231109]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Daisuke-Matsuda/RDMA-rxe-Always-defer-tasks-on-responder-and-completer-to-workqueue/20231109-185612
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
patch link:    https://lore.kernel.org/r/5d46bd682aa8e3d5cabc38ca1cd67d2976f2731d.1699503619.git.matsuda-daisuke%40fujitsu.com
patch subject: [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging
config: sparc-allyesconfig (https://download.01.org/0day-ci/archive/20231110/202311100130.efgRVVKL-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231110/202311100130.efgRVVKL-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311100130.efgRVVKL-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/infiniband/sw/rxe/rxe_odp.c: In function 'rxe_mr_set_xarray':
>> drivers/infiniband/sw/rxe/rxe_odp.c:35:22: warning: variable 'entry' set but not used [-Wunused-but-set-variable]
      35 |         void *page, *entry;
         |                      ^~~~~


vim +/entry +35 drivers/infiniband/sw/rxe/rxe_odp.c

    29	
    30	static void rxe_mr_set_xarray(struct rxe_mr *mr, unsigned long start,
    31				      unsigned long end, unsigned long *pfn_list)
    32	{
    33		unsigned long upper = rxe_mr_iova_to_index(mr, end - 1);
    34		unsigned long lower = rxe_mr_iova_to_index(mr, start);
  > 35		void *page, *entry;
    36	
    37		XA_STATE(xas, &mr->page_list, lower);
    38	
    39		xas_lock(&xas);
    40		while (xas.xa_index <= upper) {
    41			if (pfn_list[xas.xa_index] & HMM_PFN_WRITE) {
    42				page = xa_tag_pointer(hmm_pfn_to_page(pfn_list[xas.xa_index]),
    43						      RXE_ODP_WRITABLE_BIT);
    44			} else
    45				page = hmm_pfn_to_page(pfn_list[xas.xa_index]);
    46	
    47			xas_store(&xas, page);
    48			entry = xas_next(&xas);
    49		}
    50		xas_unlock(&xas);
    51	}
    52	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
  2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
                   ` (6 preceding siblings ...)
  2023-11-09  5:44 ` [PATCH for-next v7 7/7] RDMA/rxe: Add support for the traditional Atomic operations " Daisuke Matsuda
@ 2023-12-05  0:11 ` Jason Gunthorpe
  2023-12-05  1:50   ` Zhu Yanjun
  7 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2023-12-05  0:11 UTC (permalink / raw)
  To: Daisuke Matsuda
  Cc: linux-rdma, leon, zyjzyj2000, linux-kernel, rpearsonhpe, yangx.jy,
	lizhijian, y-goto

On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> 
> Daisuke Matsuda (7):
>   RDMA/rxe: Always defer tasks on responder and completer to workqueue
>   RDMA/rxe: Make MR functions accessible from other rxe source code
>   RDMA/rxe: Move resp_states definition to rxe_verbs.h
>   RDMA/rxe: Add page invalidation support
>   RDMA/rxe: Allow registering MRs for On-Demand Paging
>   RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>   RDMA/rxe: Add support for the traditional Atomic operations with ODP

What is the current situation with rxe? I don't recall seeing the bugs
that were reported get fixed?

I'm reluctant to dig a deeper hole until it is done.

Thanks,
Jason


* Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
  2023-12-05  0:11 ` [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Jason Gunthorpe
@ 2023-12-05  1:50   ` Zhu Yanjun
  2023-12-07  6:37     ` Daisuke Matsuda (Fujitsu)
  0 siblings, 1 reply; 16+ messages in thread
From: Zhu Yanjun @ 2023-12-05  1:50 UTC (permalink / raw)
  To: Jason Gunthorpe, Daisuke Matsuda
  Cc: linux-rdma, leon, zyjzyj2000, linux-kernel, rpearsonhpe, yangx.jy,
	lizhijian, y-goto

On 2023/12/5 8:11, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>
>> Daisuke Matsuda (7):
>>    RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>    RDMA/rxe: Make MR functions accessible from other rxe source code
>>    RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>    RDMA/rxe: Add page invalidation support
>>    RDMA/rxe: Allow registering MRs for On-Demand Paging
>>    RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>    RDMA/rxe: Add support for the traditional Atomic operations with ODP
> 
> What is the current situation with rxe? I don't recall seeing the bugs
> that were reported get fixed?

Exactly. A problem is reported in the link 
https://www.spinics.net/lists/linux-rdma/msg120947.html

It seems that a variable 'entry' set but not used 
[-Wunused-but-set-variable]

And ODP is an important feature. Should we suggest adding a test case
for ODP in rdma-core to verify this feature?

Zhu Yanjun

> 
> I'm reluctant to dig a deeper hole until it is done.
> 
> Thanks,
> Jason



* RE: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
  2023-12-05  1:50   ` Zhu Yanjun
@ 2023-12-07  6:37     ` Daisuke Matsuda (Fujitsu)
  2023-12-12 18:07       ` Zhu Yanjun
  2024-01-04 14:56       ` Jason Gunthorpe
  0 siblings, 2 replies; 16+ messages in thread
From: Daisuke Matsuda (Fujitsu) @ 2023-12-07  6:37 UTC (permalink / raw)
  To: 'Zhu Yanjun', Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, leon@kernel.org, zyjzyj2000@gmail.com,
	linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Xiao Yang (Fujitsu), Zhijian Li (Fujitsu),
	Yasunori Gotou (Fujitsu)

On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> 
> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>
> >> Daisuke Matsuda (7):
> >>    RDMA/rxe: Always defer tasks on responder and completer to workqueue
> >>    RDMA/rxe: Make MR functions accessible from other rxe source code
> >>    RDMA/rxe: Move resp_states definition to rxe_verbs.h
> >>    RDMA/rxe: Add page invalidation support
> >>    RDMA/rxe: Allow registering MRs for On-Demand Paging
> >>    RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> >>    RDMA/rxe: Add support for the traditional Atomic operations with ODP
> >
> > What is the current situation with rxe? I don't recall seeing the bugs
> > that were reported get fixed?

Well, I suppose Jason is mentioning "blktests srp/002 hang".
cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/

It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
so the hang does not appear to be specific to rxe.
cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
I think we need to decide whether to continue to block patches to rxe since nobody has successfully fixed the issue.


There is another issue that causes kernel panic.
[bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
Zhijian has submitted patches to fix this, and he got some comments.
It looks he is involved in CXL driver intensively these days.
I guess he is still working on it.

> 
> Exactly. A problem is reported in the link
> https://www.spinics.net/lists/linux-rdma/msg120947.html
> 
> It seems that a variable 'entry' set but not used
> [-Wunused-but-set-variable]

Yeah, I can revise the patch anytime.

> 
> And ODP is an important feature. Should we suggest adding a test case
> for ODP in rdma-core to verify this feature?

Rxe can share the same tests with mlx5.
I added test cases for Write, Read and Atomic operations with ODP,
and we can add more tests if there are any suggestions.
Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks,
Daisuke Matsuda

> 
> Zhu Yanjun
> 
> >
> > I'm reluctant to dig a deeper hole until it is done.
> >
> > Thanks,
> > Jason



* Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
  2023-12-07  6:37     ` Daisuke Matsuda (Fujitsu)
@ 2023-12-12 18:07       ` Zhu Yanjun
  2023-12-14  5:55         ` Daisuke Matsuda (Fujitsu)
  2024-01-04 14:56       ` Jason Gunthorpe
  1 sibling, 1 reply; 16+ messages in thread
From: Zhu Yanjun @ 2023-12-12 18:07 UTC (permalink / raw)
  To: Daisuke Matsuda (Fujitsu), Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, leon@kernel.org, zyjzyj2000@gmail.com,
	linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Xiao Yang (Fujitsu), Zhijian Li (Fujitsu),
	Yasunori Gotou (Fujitsu)

On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>>
>> On 2023/12/5 8:11, Jason Gunthorpe wrote:
>>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>>>
>>>> Daisuke Matsuda (7):
>>>>     RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>>>     RDMA/rxe: Make MR functions accessible from other rxe source code
>>>>     RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>>>     RDMA/rxe: Add page invalidation support
>>>>     RDMA/rxe: Allow registering MRs for On-Demand Paging
>>>>     RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>>>     RDMA/rxe: Add support for the traditional Atomic operations with ODP
>>>
>>> What is the current situation with rxe? I don't recall seeing the bugs
>>> that were reported get fixed?
> 
> Well, I suppose Jason is referring to the "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> 
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang does not appear to be specific to rxe.
> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> I think we need to decide whether to keep blocking rxe patches, given that nobody has fixed the issue yet.
> 
> 
> There is another issue that causes a kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
> 
> https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
> Zhijian has submitted patches to fix this and received some comments.
> It looks like he has been deeply involved in the CXL driver these days.
> I guess he is still working on it.
> 
>>
>> Exactly. A problem was reported at the link below:
>> https://www.spinics.net/lists/linux-rdma/msg120947.html
>>
>> It seems that the variable 'entry' is set but not used
>> [-Wunused-but-set-variable]
> 
> Yeah, I can revise the patch anytime.
> 
>>
>> And ODP is an important feature. Should we suggest adding a test case
>> for ODP in rdma-core to verify this feature?
> 
> Rxe can share the same tests with mlx5.
> I added test cases for Write, Read and Atomic operations with ODP,
> and we can add more tests if there are any suggestions.
> Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py

Thanks a lot.
Do you run blktests after applying your patches to the latest
kernel?

Zhu Yanjun

> 
> Thanks,
> Daisuke Matsuda
> 
>>
>> Zhu Yanjun
>>
>>>
>>> I'm reluctant to dig a deeper hole until it is done.
>>>
>>> Thanks,
>>> Jason
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
  2023-12-12 18:07       ` Zhu Yanjun
@ 2023-12-14  5:55         ` Daisuke Matsuda (Fujitsu)
  2023-12-15  2:46           ` Zhu Yanjun
  0 siblings, 1 reply; 16+ messages in thread
From: Daisuke Matsuda (Fujitsu) @ 2023-12-14  5:55 UTC (permalink / raw)
  To: 'Zhu Yanjun', Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, leon@kernel.org, zyjzyj2000@gmail.com,
	linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Xiao Yang (Fujitsu), Zhijian Li (Fujitsu),
	Yasunori Gotou (Fujitsu)

On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
> On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
> > On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> >>
> >> On 2023/12/5 8:11, Jason Gunthorpe wrote:
> >>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> >>>>
> >>>> Daisuke Matsuda (7):
> >>>>     RDMA/rxe: Always defer tasks on responder and completer to workqueue
> >>>>     RDMA/rxe: Make MR functions accessible from other rxe source code
> >>>>     RDMA/rxe: Move resp_states definition to rxe_verbs.h
> >>>>     RDMA/rxe: Add page invalidation support
> >>>>     RDMA/rxe: Allow registering MRs for On-Demand Paging
> >>>>     RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> >>>>     RDMA/rxe: Add support for the traditional Atomic operations with ODP
> >>>
> >>> What is the current situation with rxe? I don't recall seeing the bugs
> >>> that were reported get fixed?
> >
> > Well, I suppose Jason is referring to the "blktests srp/002 hang".
> > cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> >
> > It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> > so the hang does not appear to be specific to rxe.
> > cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> > I think we need to decide whether to keep blocking rxe patches, given that nobody has fixed the issue yet.
> >
> >
> > There is another issue that causes a kernel panic.
> > [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> > cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
> >
> > https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
> > Zhijian has submitted patches to fix this and received some comments.
> > It looks like he has been deeply involved in the CXL driver these days.
> > I guess he is still working on it.
> >
> >>
> >> Exactly. A problem was reported at the link below:
> >> https://www.spinics.net/lists/linux-rdma/msg120947.html
> >>
> >> It seems that the variable 'entry' is set but not used
> >> [-Wunused-but-set-variable]
> >
> > Yeah, I can revise the patch anytime.
> >
> >>
> >> And ODP is an important feature. Should we suggest adding a test case
> >> for ODP in rdma-core to verify this feature?
> >
> > Rxe can share the same tests with mlx5.
> > I added test cases for Write, Read and Atomic operations with ODP,
> > and we can add more tests if there are any suggestions.
> > Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py
> 
> Thanks a lot.
> Do you run blktests after applying your patches to the latest
> kernel?

I have not done that yet, but I agree I should do it.
I will try to make time for the test before submitting v8.

Thanks,
Daisuke Matsuda


> 
> Zhu Yanjun
> 
> >
> > Thanks,
> > Daisuke Matsuda
> >
> >>
> >> Zhu Yanjun
> >>
> >>>
> >>> I'm reluctant to dig a deeper hole until it is done.
> >>>
> >>> Thanks,
> >>> Jason
> >
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
  2023-12-14  5:55         ` Daisuke Matsuda (Fujitsu)
@ 2023-12-15  2:46           ` Zhu Yanjun
  0 siblings, 0 replies; 16+ messages in thread
From: Zhu Yanjun @ 2023-12-15  2:46 UTC (permalink / raw)
  To: Daisuke Matsuda (Fujitsu), Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, leon@kernel.org, zyjzyj2000@gmail.com,
	linux-kernel@vger.kernel.org, rpearsonhpe@gmail.com,
	Xiao Yang (Fujitsu), Zhijian Li (Fujitsu),
	Yasunori Gotou (Fujitsu)


On 2023/12/14 13:55, Daisuke Matsuda (Fujitsu) wrote:
> On Wed, Dec 13, 2023 3:08 AM Zhu Yanjun wrote:
>> On 2023/12/7 14:37, Daisuke Matsuda (Fujitsu) wrote:
>>> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
>>>> On 2023/12/5 8:11, Jason Gunthorpe wrote:
>>>>> On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
>>>>>> Daisuke Matsuda (7):
>>>>>>      RDMA/rxe: Always defer tasks on responder and completer to workqueue
>>>>>>      RDMA/rxe: Make MR functions accessible from other rxe source code
>>>>>>      RDMA/rxe: Move resp_states definition to rxe_verbs.h
>>>>>>      RDMA/rxe: Add page invalidation support
>>>>>>      RDMA/rxe: Allow registering MRs for On-Demand Paging
>>>>>>      RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
>>>>>>      RDMA/rxe: Add support for the traditional Atomic operations with ODP
>>>>> What is the current situation with rxe? I don't recall seeing the bugs
>>>>> that were reported get fixed?
>>> Well, I suppose Jason is referring to the "blktests srp/002 hang".
>>> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
>>>
>>> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
>>> so the hang does not appear to be specific to rxe.
>>> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
>>> I think we need to decide whether to keep blocking rxe patches, given that nobody has fixed the issue yet.
>>>
>>>
>>> There is another issue that causes a kernel panic.
>>> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
>>> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
>>>
>>> https://patchwork.kernel.org/project/linux-rdma/list/?series=798592&state=*
>>> Zhijian has submitted patches to fix this and received some comments.
>>> It looks like he has been deeply involved in the CXL driver these days.
>>> I guess he is still working on it.
>>>
>>>> Exactly. A problem was reported at the link below:
>>>> https://www.spinics.net/lists/linux-rdma/msg120947.html
>>>>
>>>> It seems that the variable 'entry' is set but not used
>>>> [-Wunused-but-set-variable]
>>> Yeah, I can revise the patch anytime.
>>>
>>>> And ODP is an important feature. Should we suggest adding a test case
>>>> for ODP in rdma-core to verify this feature?
>>> Rxe can share the same tests with mlx5.
>>> I added test cases for Write, Read and Atomic operations with ODP,
>>> and we can add more tests if there are any suggestions.
>>> Cf. https://github.com/linux-rdma/rdma-core/blob/master/tests/test_odp.py
>> Thanks a lot.
>> Do you run blktests after applying your patches to the latest
>> kernel?
> I have not done that yet, but I agree I should do it.
> I will try to make time for the test before submitting v8.

Thanks. I hope blktests works well with your commits.

Zhu Yanjun

>
> Thanks,
> Daisuke Matsuda
>
>
>> Zhu Yanjun
>>
>>> Thanks,
>>> Daisuke Matsuda
>>>
>>>> Zhu Yanjun
>>>>
>>>>> I'm reluctant to dig a deeper hole until it is done.
>>>>>
>>>>> Thanks,
>>>>> Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE
  2023-12-07  6:37     ` Daisuke Matsuda (Fujitsu)
  2023-12-12 18:07       ` Zhu Yanjun
@ 2024-01-04 14:56       ` Jason Gunthorpe
  1 sibling, 0 replies; 16+ messages in thread
From: Jason Gunthorpe @ 2024-01-04 14:56 UTC (permalink / raw)
  To: Daisuke Matsuda (Fujitsu), rpearsonhpe@gmail.com
  Cc: 'Zhu Yanjun', linux-rdma@vger.kernel.org, leon@kernel.org,
	zyjzyj2000@gmail.com, linux-kernel@vger.kernel.org,
	Xiao Yang (Fujitsu), Zhijian Li (Fujitsu),
	Yasunori Gotou (Fujitsu)

On Thu, Dec 07, 2023 at 06:37:13AM +0000, Daisuke Matsuda (Fujitsu) wrote:
> On Tue, Dec 5, 2023 10:51 AM Zhu Yanjun wrote:
> > 
> > On 2023/12/5 8:11, Jason Gunthorpe wrote:
> > > On Thu, Nov 09, 2023 at 02:44:45PM +0900, Daisuke Matsuda wrote:
> > >>
> > >> Daisuke Matsuda (7):
> > >>    RDMA/rxe: Always defer tasks on responder and completer to workqueue
> > >>    RDMA/rxe: Make MR functions accessible from other rxe source code
> > >>    RDMA/rxe: Move resp_states definition to rxe_verbs.h
> > >>    RDMA/rxe: Add page invalidation support
> > >>    RDMA/rxe: Allow registering MRs for On-Demand Paging
> > >>    RDMA/rxe: Add support for Send/Recv/Write/Read with ODP
> > >>    RDMA/rxe: Add support for the traditional Atomic operations with ODP
> > >
> > > What is the current situation with rxe? I don't recall seeing the bugs
> > > that were reported get fixed?
> 
> Well, I suppose Jason is referring to the "blktests srp/002 hang".
> cf. https://lore.kernel.org/linux-rdma/dsg6rd66tyiei32zaxs6ddv5ebefr5vtxjwz6d2ewqrcwisogl@ge7jzan7dg5u/T/
> 
> It is likely to be a timing issue. Bob reported that "siw hangs with the debug kernel",
> so the hang does not appear to be specific to rxe.
> cf. https://lore.kernel.org/all/53ede78a-f73d-44cd-a555-f8ff36bd9c55@acm.org/T/
> I think we need to decide whether to keep blocking rxe patches, given that nobody has fixed the issue yet.

Bob? Is that what we think?

> There is another issue that causes a kernel panic.
> [bug report][bisected] rdma_rxe: blktests srp lead kernel panic with 64k page size
> cf. https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/

This is more understandable, and the fix of matching the MTT size to
PAGE_SIZE seems reasonable to me.
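
For intuition only, here is a tiny self-contained C illustration of that
class of bug (the numbers are illustrative and the mechanism is my
paraphrase; the actual fix is in Zhijian's series linked above):

#include <stdio.h>

int main(void)
{
	unsigned long mr_len = 1UL << 20;	/* a 1 MiB MR */

	/* Entries in a translation table built for 64k PAGE_SIZE ... */
	unsigned long built = mr_len / (64 * 1024);	/* 16 entries */

	/* ... versus indices computed by code assuming 4k pages. */
	unsigned long indexed = mr_len / 4096;		/* up to 256 */

	/* Indexing a 16-entry table with offsets up to 255 runs off
	 * the end of the table, which is the kind of mismatch that
	 * panics 64k-page kernels. */
	printf("built=%lu, indexed=%lu\n", built, indexed);
	return 0;
}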

Jason

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-01-04 14:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-09  5:44 [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Daisuke Matsuda
2023-11-09  5:44 ` [PATCH for-next v7 1/7] RDMA/rxe: Always defer tasks on responder and completer to workqueue Daisuke Matsuda
2023-11-09  5:44 ` [PATCH for-next v7 2/7] RDMA/rxe: Make MR functions accessible from other rxe source code Daisuke Matsuda
2023-11-09  5:44 ` [PATCH for-next v7 3/7] RDMA/rxe: Move resp_states definition to rxe_verbs.h Daisuke Matsuda
2023-11-09  5:44 ` [PATCH for-next v7 4/7] RDMA/rxe: Add page invalidation support Daisuke Matsuda
2023-11-09  5:44 ` [PATCH for-next v7 5/7] RDMA/rxe: Allow registering MRs for On-Demand Paging Daisuke Matsuda
2023-11-09 18:49   ` kernel test robot
2023-11-09  5:44 ` [PATCH for-next v7 6/7] RDMA/rxe: Add support for Send/Recv/Write/Read with ODP Daisuke Matsuda
2023-11-09  5:44 ` [PATCH for-next v7 7/7] RDMA/rxe: Add support for the traditional Atomic operations " Daisuke Matsuda
2023-12-05  0:11 ` [PATCH for-next v7 0/7] On-Demand Paging on SoftRoCE Jason Gunthorpe
2023-12-05  1:50   ` Zhu Yanjun
2023-12-07  6:37     ` Daisuke Matsuda (Fujitsu)
2023-12-12 18:07       ` Zhu Yanjun
2023-12-14  5:55         ` Daisuke Matsuda (Fujitsu)
2023-12-15  2:46           ` Zhu Yanjun
2024-01-04 14:56       ` Jason Gunthorpe

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).