public inbox for linux-rdma@vger.kernel.org
* [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets
@ 2023-07-27 20:01 Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops Bob Pearson
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

This patch set is a revised version of an earlier series that implements
support for nonlinear (fragmented) packets. This avoids extra copies
in both the send and receive paths and gives a significant performance
improvement for large messages such as those used in storage applications.
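
As a purely illustrative sketch (simplified, not code from this series;
only the standard kernel skb helpers are real), the linear path must
memcpy every payload byte out of the memory region into the packet
buffer, while the nonlinear path just attaches the MR's pages to the
skb's fragment list:

#include <linux/skbuff.h>
#include <linux/highmem.h>
#include <linux/mm.h>

/* illustrative only: simplified versions of the two payload paths;
 * 'page' stands for a page already pinned by the memory region
 */
static void example_linear_payload(struct sk_buff *skb, struct page *page,
				   unsigned int off, unsigned int len)
{
	void *va = kmap_local_page(page);

	memcpy(skb_put(skb, len), va + off, len);	/* the copy this series avoids */
	kunmap_local(va);
}

static void example_frag_payload(struct sk_buff *skb, struct page *page,
				 unsigned int off, unsigned int len)
{
	get_page(page);			/* kfree_skb() will put_page() */
	skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, off, len);
	skb->len += len;
	skb->data_len += len;
}

With the second form the payload is read directly from the MR pages when
the packet is transmitted, which is where the gain for large storage
transfers comes from.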

This patch set has been heavily tested at large system scale and
demonstrated a 2X improvement in file system read performance on
a 200 Gb/sec network.

The patch set is rebased to the current for-next branch with the
following previous patch sets applied:
	RDMA/rxe: Fix incomplete state save in rxe_requester
	RDMA/rxe: Misc fixes and cleanups
	Enable rcu locking of verbs objects
	RDMA/rxe: Misc cleanups

Bob Pearson (10):
  RDMA/rxe: Add sg fragment ops
  RDMA/rxe: Extend rxe_mr_copy to support skb frags
  RDMA/rxe: Extend copy_data to support skb frags
  RDMA/rxe: Extend rxe_init_packet() to support frags
  RDMA/rxe: Extend rxe_icrc.c to support frags
  RDMA/rxe: Extend rxe_init_req_packet() for frags
  RDMA/rxe: Extend response packets for frags
  RDMA/rxe: Extend send/write_data_in() for frags
  RDMA/rxe: Extend do_read() in rxe_comp.c for frags
  RDMA/rxe: Enable sg code in rxe

 drivers/infiniband/sw/rxe/rxe.c        |   5 +
 drivers/infiniband/sw/rxe/rxe.h        |   3 +
 drivers/infiniband/sw/rxe/rxe_comp.c   |  46 +++-
 drivers/infiniband/sw/rxe/rxe_icrc.c   |  65 ++++-
 drivers/infiniband/sw/rxe/rxe_loc.h    |  27 +-
 drivers/infiniband/sw/rxe/rxe_mr.c     | 348 +++++++++++++++++++------
 drivers/infiniband/sw/rxe/rxe_net.c    | 109 +++++++-
 drivers/infiniband/sw/rxe/rxe_opcode.c |   2 +
 drivers/infiniband/sw/rxe/rxe_recv.c   |   1 +
 drivers/infiniband/sw/rxe/rxe_req.c    |  88 ++++++-
 drivers/infiniband/sw/rxe/rxe_resp.c   | 172 +++++++-----
 drivers/infiniband/sw/rxe/rxe_verbs.h  |   8 +-
 12 files changed, 672 insertions(+), 202 deletions(-)


base-commit: 693e1cdebb50d2aa67406411ca6d5be195d62771
prerequisite-patch-id: c3994e7a93e37e0ce4f50e0c768f3c1a0059a02f
prerequisite-patch-id: 48e13f6ccb560fdeacbd20aaf6696782c23d1190
prerequisite-patch-id: da75fb8eaa863df840e7b392b5048fcc72b0bef3
prerequisite-patch-id: d0877649e2edaf00585a0a6a80391fe0d7bbc13b
prerequisite-patch-id: 6495b1d1f664f8ab91ed9ef9d2ca5b3b27d7df35
prerequisite-patch-id: a6367b8fedd0d8999139c8b857ebbd3ce5c72245
prerequisite-patch-id: 78c95e90a5e49b15b7af8ef57130739c143e88b5
prerequisite-patch-id: 7c65a01066c0418de6897bc8b5f44d078d21b0ec
prerequisite-patch-id: 8ab09f93c23c7875e56c597e69236c30464723b6
prerequisite-patch-id: ca9d84b34873b49048e42fb4c13a2a097c215c46
prerequisite-patch-id: 0f6a587501c8246e1185dfd0cbf5e2044c5f9b13
prerequisite-patch-id: 5246df93137429916d76e75b9a13a4ad5ceb0bad
prerequisite-patch-id: 41b0e4150794dd914d9fcb4cd106fe4cf4227611
prerequisite-patch-id: 02b08ec037bc35b9c7771640c89c66504cdf38a6
prerequisite-patch-id: dfccc06c16454d7fe8e6fcba064d4e471d314666
prerequisite-patch-id: 7459a6e5cdd46efd53ba27f9b3e9028af6e0863b
prerequisite-patch-id: 36d49f9303f5cb276a5601c1ab568eea6eca7d3a
prerequisite-patch-id: 6359a681e40832694f81ca003c10e5327996bf7d
prerequisite-patch-id: 558175db657f374dbd3e0a57ac4c5fb77a56b6c6
prerequisite-patch-id: d6b811de06c8900be5840dd29715161d26db66cf
-- 
2.39.2



* [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-28  1:07   ` Zhu Yanjun
  2023-07-27 20:01 ` [PATCH for-next v3 02/10] RDMA/rxe: Extend rxe_mr_copy to support skb frags Bob Pearson
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Rename the enum rxe_mr_copy_dir to rxe_mr_copy_op and its values
RXE_TO_MR_OBJ/RXE_FROM_MR_OBJ to RXE_COPY_TO_MR/RXE_COPY_FROM_MR.
This allows adding new fragment operations later.

This is in preparation for supporting fragmented skbs.
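
For reference, the enum after the rename looks as below; the two
RXE_FRAG_* values are not part of this patch but are added by the next
patch in the series:

enum rxe_mr_copy_op {
	RXE_COPY_TO_MR,		/* was RXE_TO_MR_OBJ */
	RXE_COPY_FROM_MR,	/* was RXE_FROM_MR_OBJ */
	/* added by the next patch in this series for skb frags */
	RXE_FRAG_TO_MR,
	RXE_FRAG_FROM_MR,
};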

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c  |  4 ++--
 drivers/infiniband/sw/rxe/rxe_loc.h   |  4 ++--
 drivers/infiniband/sw/rxe/rxe_mr.c    | 22 +++++++++++-----------
 drivers/infiniband/sw/rxe/rxe_req.c   |  2 +-
 drivers/infiniband/sw/rxe/rxe_resp.c  |  6 +++---
 drivers/infiniband/sw/rxe/rxe_verbs.h |  6 +++---
 6 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 5111735aafae..e3f8dfc9b8bf 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -368,7 +368,7 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
 
 	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
 			&wqe->dma, payload_addr(pkt),
-			payload_size(pkt), RXE_TO_MR_OBJ);
+			payload_size(pkt), RXE_COPY_TO_MR);
 	if (ret) {
 		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
@@ -390,7 +390,7 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
 
 	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
 			&wqe->dma, &atomic_orig,
-			sizeof(u64), RXE_TO_MR_OBJ);
+			sizeof(u64), RXE_COPY_TO_MR);
 	if (ret) {
 		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index cf38f4dcff78..532026cdd49e 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -64,9 +64,9 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
 int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length);
 int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
-		unsigned int length, enum rxe_mr_copy_dir dir);
+		unsigned int length, enum rxe_mr_copy_op op);
 int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
-	      void *addr, int length, enum rxe_mr_copy_dir dir);
+	      void *addr, int length, enum rxe_mr_copy_op op);
 int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset);
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index f54042e9aeb2..812c85cad463 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -243,7 +243,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
 }
 
 static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
-			      unsigned int length, enum rxe_mr_copy_dir dir)
+			      unsigned int length, enum rxe_mr_copy_op op)
 {
 	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
 	unsigned long index = rxe_mr_iova_to_index(mr, iova);
@@ -259,7 +259,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
 		bytes = min_t(unsigned int, length,
 				mr_page_size(mr) - page_offset);
 		va = kmap_local_page(page);
-		if (dir == RXE_FROM_MR_OBJ)
+		if (op == RXE_COPY_FROM_MR)
 			memcpy(addr, va + page_offset, bytes);
 		else
 			memcpy(va + page_offset, addr, bytes);
@@ -275,7 +275,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
 }
 
 static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
-			    unsigned int length, enum rxe_mr_copy_dir dir)
+			    unsigned int length, enum rxe_mr_copy_op op)
 {
 	unsigned int page_offset = dma_addr & (PAGE_SIZE - 1);
 	unsigned int bytes;
@@ -288,10 +288,10 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
 				PAGE_SIZE - page_offset);
 		va = kmap_local_page(page);
 
-		if (dir == RXE_TO_MR_OBJ)
-			memcpy(va + page_offset, addr, bytes);
-		else
+		if (op == RXE_COPY_FROM_MR)
 			memcpy(addr, va + page_offset, bytes);
+		else
+			memcpy(va + page_offset, addr, bytes);
 
 		kunmap_local(va);
 		page_offset = 0;
@@ -302,7 +302,7 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
 }
 
 int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
-		unsigned int length, enum rxe_mr_copy_dir dir)
+		unsigned int length, enum rxe_mr_copy_op op)
 {
 	int err;
 
@@ -313,7 +313,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 		return -EINVAL;
 
 	if (mr->ibmr.type == IB_MR_TYPE_DMA) {
-		rxe_mr_copy_dma(mr, iova, addr, length, dir);
+		rxe_mr_copy_dma(mr, iova, addr, length, op);
 		return 0;
 	}
 
@@ -323,7 +323,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 		return err;
 	}
 
-	return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
+	return rxe_mr_copy_xarray(mr, iova, addr, length, op);
 }
 
 /* copy data in or out of a wqe, i.e. sg list
@@ -335,7 +335,7 @@ int copy_data(
 	struct rxe_dma_info	*dma,
 	void			*addr,
 	int			length,
-	enum rxe_mr_copy_dir	dir)
+	enum rxe_mr_copy_op	op)
 {
 	int			bytes;
 	struct rxe_sge		*sge	= &dma->sge[dma->cur_sge];
@@ -395,7 +395,7 @@ int copy_data(
 
 		if (bytes > 0) {
 			iova = sge->addr + offset;
-			err = rxe_mr_copy(mr, iova, addr, bytes, dir);
+			err = rxe_mr_copy(mr, iova, addr, bytes, op);
 			if (err)
 				goto err2;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 51b781ac2844..f3653234cf32 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -327,7 +327,7 @@ static int rxe_init_payload(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 		wqe->dma.sge_offset += payload;
 	} else {
 		err = copy_data(qp->pd, 0, &wqe->dma, payload_addr(pkt),
-				payload, RXE_FROM_MR_OBJ);
+				payload, RXE_COPY_FROM_MR);
 	}
 
 	return err;
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 8a25c56dfd86..596615c515ad 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -565,7 +565,7 @@ static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
 	int err;
 
 	err = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &qp->resp.wqe->dma,
-			data_addr, data_len, RXE_TO_MR_OBJ);
+			data_addr, data_len, RXE_COPY_TO_MR);
 	if (unlikely(err))
 		return (err == -ENOSPC) ? RESPST_ERR_LENGTH
 					: RESPST_ERR_MALFORMED_WQE;
@@ -581,7 +581,7 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
 	int data_len = payload_size(pkt);
 
 	err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
-			  payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
+			  payload_addr(pkt), data_len, RXE_COPY_TO_MR);
 	if (err) {
 		rc = RESPST_ERR_RKEY_VIOLATION;
 		goto out;
@@ -928,7 +928,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	}
 
 	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
-			  payload, RXE_FROM_MR_OBJ);
+			  payload, RXE_COPY_FROM_MR);
 	if (err) {
 		kfree_skb(skb);
 		state = RESPST_ERR_RKEY_VIOLATION;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index ccb9d19ffe8a..d9c44bd30da4 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -275,9 +275,9 @@ enum rxe_mr_state {
 	RXE_MR_STATE_VALID,
 };
 
-enum rxe_mr_copy_dir {
-	RXE_TO_MR_OBJ,
-	RXE_FROM_MR_OBJ,
+enum rxe_mr_copy_op {
+	RXE_COPY_TO_MR,
+	RXE_COPY_FROM_MR,
 };
 
 enum rxe_mr_lookup_type {
-- 
2.39.2



* [PATCH for-next v3 02/10] RDMA/rxe: Extend rxe_mr_copy to support skb frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 03/10] RDMA/rxe: Extend copy_data " Bob Pearson
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

rxe_mr_copy() currently supports copying between an mr and
a contiguous region of kernel memory.

Rename rxe_mr_copy() to rxe_copy_mr_data().
Extend the operations to support copying between an mr and an skb
fragment list. Fix up calls to rxe_mr_copy() to use the new API.
Add two APIs, rxe_add_frag() and rxe_num_mr_frags(), to add a fragment
to an skb and to count the total number of fragments needed.
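
A hypothetical caller (the example_* wrapper and the MAX_SKB_FRAGS check
are illustrative assumptions; the helpers and the op codes come from this
patch) showing the two ways the extended copy routine can be used:

/* hypothetical caller, not part of the patch */
static int example_mr_payload(struct sk_buff *skb, struct rxe_mr *mr,
			      u64 iova, void *addr, unsigned int len,
			      bool use_frags)
{
	if (use_frags) {
		/* one possible use of the counting helper: refuse to
		 * fragment if the frag list would overflow
		 */
		if (rxe_num_mr_frags(mr, iova, len) > MAX_SKB_FRAGS)
			return -EINVAL;
		/* reference the MR pages from the skb, no payload copy */
		return rxe_copy_mr_data(skb, mr, iova, NULL, 0, len,
					RXE_FRAG_FROM_MR);
	}

	/* previous behaviour: copy the data into the packet buffer */
	return rxe_copy_mr_data(skb, mr, iova, addr, 0, len,
				RXE_COPY_FROM_MR);
}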

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h   |  10 +-
 drivers/infiniband/sw/rxe/rxe_mr.c    | 170 +++++++++++++++++++++++---
 drivers/infiniband/sw/rxe/rxe_resp.c  |  14 ++-
 drivers/infiniband/sw/rxe/rxe_verbs.h |   2 +
 4 files changed, 173 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 532026cdd49e..77661e0ccab7 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -62,11 +62,15 @@ void rxe_mr_init_dma(int access, struct rxe_mr *mr);
 int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
 		     int access, struct rxe_mr *mr);
 int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
-int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length);
-int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
-		unsigned int length, enum rxe_mr_copy_op op);
+int rxe_add_frag(struct sk_buff *skb, struct rxe_mr *mr, struct page *page,
+		 unsigned int length, unsigned int offset);
+int rxe_num_mr_frags(struct rxe_mr *mr, u64 iova, unsigned int length);
+int rxe_copy_mr_data(struct sk_buff *skb, struct rxe_mr *mr, u64 iova,
+		     void *addr, unsigned int skb_offset,
+		     unsigned int length, enum rxe_mr_copy_op op);
 int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
 	      void *addr, int length, enum rxe_mr_copy_op op);
+int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length);
 int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset);
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 812c85cad463..2667e8129a1d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -242,7 +242,79 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
 	return ib_sg_to_pages(ibmr, sgl, sg_nents, sg_offset, rxe_set_page);
 }
 
-static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
+/**
+ * rxe_add_frag() - Add a frag to a nonlinear packet
+ * @skb: The packet buffer
+ * @mr: The memory region
+ * @page: The page
+ * @length: Length of fragment
+ * @offset: Offset of fragment in page
+ *
+ * Caller must verify that the fragment is contained in the page.
+ * Caller should verify that the number of fragments does not
+ * exceed MAX_SKB_FRAGS
+ *
+ * Returns: 0 on success else a negative errno
+ */
+int rxe_add_frag(struct sk_buff *skb, struct rxe_mr *mr, struct page *page,
+		 unsigned int length, unsigned int offset)
+{
+	int nr_frags = skb_shinfo(skb)->nr_frags;
+	skb_frag_t *frag = &skb_shinfo(skb)->frags[nr_frags];
+
+	if (nr_frags >= MAX_SKB_FRAGS) {
+		rxe_dbg_mr(mr, "ran out of frags");
+		return -EINVAL;
+	}
+
+	frag->bv_len = length;
+	frag->bv_offset = offset;
+	frag->bv_page = page;
+	/* because kfree_skb will call put_page() */
+	get_page(page);
+	skb_shinfo(skb)->nr_frags++;
+
+	skb->data_len += length;
+	skb->len += length;
+
+	return 0;
+}
+
+/**
+ * rxe_num_mr_frags() - Compute the number of skb frags needed to copy
+ *			length bytes from an mr to an skb frag list.
+ * @mr: mr to copy data from
+ * @iova: iova in memory region as starting point
+ * @length: number of bytes to transfer
+ *
+ * Returns: the number of frags needed
+ */
+int rxe_num_mr_frags(struct rxe_mr *mr, u64 iova, unsigned int length)
+{
+	unsigned int page_size;
+	unsigned int page_offset;
+	unsigned int bytes;
+	int num_frags = 0;
+
+	if (mr->ibmr.type == IB_MR_TYPE_DMA)
+		page_size = PAGE_SIZE;
+	else
+		page_size = mr_page_size(mr);
+	page_offset = iova & (page_size - 1);
+
+	while (length) {
+		bytes = min_t(unsigned int, length,
+				page_size - page_offset);
+		length -= bytes;
+		page_offset = 0;
+		num_frags++;
+	}
+
+	return num_frags;
+}
+
+static int rxe_mr_copy_xarray(struct sk_buff *skb, struct rxe_mr *mr,
+			      u64 iova, void *addr, unsigned int skb_offset,
 			      unsigned int length, enum rxe_mr_copy_op op)
 {
 	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
@@ -250,6 +322,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
 	unsigned int bytes;
 	struct page *page;
 	void *va;
+	int err = 0;
 
 	while (length) {
 		page = xa_load(&mr->page_list, index);
@@ -258,12 +331,32 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
 
 		bytes = min_t(unsigned int, length,
 				mr_page_size(mr) - page_offset);
-		va = kmap_local_page(page);
-		if (op == RXE_COPY_FROM_MR)
+		switch (op) {
+		case RXE_COPY_FROM_MR:
+			va = kmap_local_page(page);
 			memcpy(addr, va + page_offset, bytes);
-		else
+			kunmap_local(va);
+			break;
+		case RXE_COPY_TO_MR:
+			va = kmap_local_page(page);
 			memcpy(va + page_offset, addr, bytes);
-		kunmap_local(va);
+			kunmap_local(va);
+			break;
+		case RXE_FRAG_TO_MR:
+			va = kmap_local_page(page);
+			err = skb_copy_bits(skb, skb_offset,
+					    va + page_offset, bytes);
+			kunmap_local(va);
+			skb_offset += bytes;
+			break;
+		case RXE_FRAG_FROM_MR:
+			err = rxe_add_frag(skb, mr, page, bytes,
+					   page_offset);
+			break;
+		}
+
+		if (err)
+			return err;
 
 		page_offset = 0;
 		addr += bytes;
@@ -274,13 +367,15 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
 	return 0;
 }
 
-static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
-			    unsigned int length, enum rxe_mr_copy_op op)
+static int rxe_mr_copy_dma(struct sk_buff *skb, struct rxe_mr *mr,
+			   u64 dma_addr, void *addr, unsigned int skb_offset,
+			   unsigned int length, enum rxe_mr_copy_op op)
 {
 	unsigned int page_offset = dma_addr & (PAGE_SIZE - 1);
 	unsigned int bytes;
 	struct page *page;
 	u8 *va;
+	int err = 0;
 
 	while (length) {
 		page = ib_virt_dma_to_page(dma_addr);
@@ -288,10 +383,32 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
 				PAGE_SIZE - page_offset);
 		va = kmap_local_page(page);
 
-		if (op == RXE_COPY_FROM_MR)
+		switch (op) {
+		case RXE_COPY_FROM_MR:
+			va = kmap_local_page(page);
 			memcpy(addr, va + page_offset, bytes);
-		else
+			kunmap_local(va);
+			break;
+		case RXE_COPY_TO_MR:
+			va = kmap_local_page(page);
 			memcpy(va + page_offset, addr, bytes);
+			kunmap_local(va);
+			break;
+		case RXE_FRAG_TO_MR:
+			va = kmap_local_page(page);
+			err = skb_copy_bits(skb, skb_offset,
+					    va + page_offset, bytes);
+			kunmap_local(va);
+			skb_offset += bytes;
+			break;
+		case RXE_FRAG_FROM_MR:
+			err = rxe_add_frag(skb, mr, page, bytes,
+					   page_offset);
+			break;
+		}
+
+		if (err)
+			return err;
 
 		kunmap_local(va);
 		page_offset = 0;
@@ -299,10 +416,31 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
 		addr += bytes;
 		length -= bytes;
 	}
+
+	return 0;
 }
 
-int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
-		unsigned int length, enum rxe_mr_copy_op op)
+/**
+ * rxe_copy_mr_data() - transfer data between an MR and a packet
+ * @skb: the packet buffer
+ * @mr: the MR
+ * @iova: the address in the MR
+ * @addr: the address in the packet (TO/FROM MR only)
+ * @length: the length to transfer
+ * @op: copy operation (TO MR, FROM MR or FRAG MR)
+ *
+ * Copy data from a range (addr, addr+length-1) in a packet
+ * to or from a range in an MR object at (iova, iova+length-1).
+ * Or, build a frag list referencing the MR range.
+ *
+ * Caller must verify that the access permissions support the
+ * operation.
+ *
+ * Returns: 0 on success or an error
+ */
+int rxe_copy_mr_data(struct sk_buff *skb, struct rxe_mr *mr, u64 iova,
+		     void *addr, unsigned int skb_offset,
+		     unsigned int length, enum rxe_mr_copy_op op)
 {
 	int err;
 
@@ -313,8 +451,8 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 		return -EINVAL;
 
 	if (mr->ibmr.type == IB_MR_TYPE_DMA) {
-		rxe_mr_copy_dma(mr, iova, addr, length, op);
-		return 0;
+		return rxe_mr_copy_dma(skb, mr, iova, addr, skb_offset,
+				       length, op);
 	}
 
 	err = mr_check_range(mr, iova, length);
@@ -323,7 +461,8 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
 		return err;
 	}
 
-	return rxe_mr_copy_xarray(mr, iova, addr, length, op);
+	return rxe_mr_copy_xarray(skb, mr, iova, addr, skb_offset,
+				  length, op);
 }
 
 /* copy data in or out of a wqe, i.e. sg list
@@ -395,7 +534,8 @@ int copy_data(
 
 		if (bytes > 0) {
 			iova = sge->addr + offset;
-			err = rxe_mr_copy(mr, iova, addr, bytes, op);
+			err = rxe_copy_mr_data(NULL, mr, iova, addr,
+					       0, bytes, op);
 			if (err)
 				goto err2;
 
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 596615c515ad..87d61a462ff5 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -576,12 +576,15 @@ static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
 static enum resp_states write_data_in(struct rxe_qp *qp,
 				      struct rxe_pkt_info *pkt)
 {
+	struct sk_buff *skb = PKT_TO_SKB(pkt);
 	enum resp_states rc = RESPST_NONE;
-	int	err;
 	int data_len = payload_size(pkt);
+	int err;
+	int skb_offset = 0;
 
-	err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
-			  payload_addr(pkt), data_len, RXE_COPY_TO_MR);
+	err = rxe_copy_mr_data(skb, qp->resp.mr, qp->resp.va + qp->resp.offset,
+			  payload_addr(pkt), skb_offset, data_len,
+			  RXE_COPY_TO_MR);
 	if (err) {
 		rc = RESPST_ERR_RKEY_VIOLATION;
 		goto out;
@@ -876,6 +879,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	int err;
 	struct resp_res *res = qp->resp.res;
 	struct rxe_mr *mr;
+	unsigned int skb_offset = 0;
 	u8 *pad_addr;
 
 	if (!res) {
@@ -927,8 +931,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 		goto err_out;
 	}
 
-	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
-			  payload, RXE_COPY_FROM_MR);
+	err = rxe_copy_mr_data(skb, mr, res->read.va, payload_addr(&ack_pkt),
+			       skb_offset, payload, RXE_COPY_FROM_MR);
 	if (err) {
 		kfree_skb(skb);
 		state = RESPST_ERR_RKEY_VIOLATION;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
index d9c44bd30da4..89cf50b938c2 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.h
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
@@ -278,6 +278,8 @@ enum rxe_mr_state {
 enum rxe_mr_copy_op {
 	RXE_COPY_TO_MR,
 	RXE_COPY_FROM_MR,
+	RXE_FRAG_TO_MR,
+	RXE_FRAG_FROM_MR,
 };
 
 enum rxe_mr_lookup_type {
-- 
2.39.2



* [PATCH for-next v3 03/10] RDMA/rxe: Extend copy_data to support skb frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 02/10] RDMA/rxe: Extend rxe_mr_copy to support skb frags Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 04/10] RDMA/rxe: Extend rxe_init_packet() to support frags Bob Pearson
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

copy_data() currently supports copying between an mr and
the scatter-gather list of a wqe.

Rename copy_data() to rxe_copy_dma_data().
Extend the operations to support copying between an sg list and an skb
fragment list. Fix up calls to copy_data() to use the new API.
Add a routine, rxe_num_dma_frags(), to count the number of skb frags
needed by rxe_copy_dma_data().
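
As with the previous patch, a hypothetical send-side caller may make the
new calling convention clearer (the example_* wrapper is an assumption;
the argument pattern follows the later request-path patch in this series):

/* hypothetical caller, not part of the patch */
static int example_fill_from_wqe(struct sk_buff *skb, struct rxe_qp *qp,
				 struct rxe_send_wqe *wqe, void *addr,
				 unsigned int payload, bool use_frags)
{
	enum rxe_mr_copy_op op = use_frags ? RXE_FRAG_FROM_MR
					   : RXE_COPY_FROM_MR;

	/* walk the wqe's sge list for one packet's worth of payload;
	 * either copy it to 'addr' in the linear buffer or add the
	 * underlying MR pages to the skb's frag list
	 */
	return rxe_copy_dma_data(skb, qp->pd, 0, &wqe->dma,
				 use_frags ? NULL : addr,
				 0, payload, op);
}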

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c |  17 ++-
 drivers/infiniband/sw/rxe/rxe_loc.h  |  10 +-
 drivers/infiniband/sw/rxe/rxe_mr.c   | 175 +++++++++++++++++----------
 drivers/infiniband/sw/rxe/rxe_req.c  |  11 +-
 drivers/infiniband/sw/rxe/rxe_resp.c |   7 +-
 5 files changed, 139 insertions(+), 81 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index e3f8dfc9b8bf..670ee08f6f5a 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -364,11 +364,14 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
 				      struct rxe_pkt_info *pkt,
 				      struct rxe_send_wqe *wqe)
 {
+	struct sk_buff *skb = PKT_TO_SKB(pkt);
+	int skb_offset = 0;
 	int ret;
 
-	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
-			&wqe->dma, payload_addr(pkt),
-			payload_size(pkt), RXE_COPY_TO_MR);
+	ret = rxe_copy_dma_data(skb, qp->pd, IB_ACCESS_LOCAL_WRITE,
+				&wqe->dma, payload_addr(pkt),
+				skb_offset, payload_size(pkt),
+				RXE_COPY_TO_MR);
 	if (ret) {
 		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
@@ -384,13 +387,15 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
 					struct rxe_pkt_info *pkt,
 					struct rxe_send_wqe *wqe)
 {
+	struct sk_buff *skb = NULL;
+	int skb_offset = 0;
 	int ret;
 
 	u64 atomic_orig = atmack_orig(pkt);
 
-	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
-			&wqe->dma, &atomic_orig,
-			sizeof(u64), RXE_COPY_TO_MR);
+	ret = rxe_copy_dma_data(skb, qp->pd, IB_ACCESS_LOCAL_WRITE,
+				&wqe->dma, &atomic_orig,
+				skb_offset, sizeof(u64), RXE_COPY_TO_MR);
 	if (ret) {
 		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 77661e0ccab7..fad853199b4d 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -68,15 +68,19 @@ int rxe_num_mr_frags(struct rxe_mr *mr, u64 iova, unsigned int length);
 int rxe_copy_mr_data(struct sk_buff *skb, struct rxe_mr *mr, u64 iova,
 		     void *addr, unsigned int skb_offset,
 		     unsigned int length, enum rxe_mr_copy_op op);
-int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
-	      void *addr, int length, enum rxe_mr_copy_op op);
+int rxe_num_dma_frags(const struct rxe_pd *pd, const struct rxe_dma_info *dma,
+		      unsigned int length);
+int rxe_copy_dma_data(struct sk_buff *skb, struct rxe_pd *pd, int access,
+		      struct rxe_dma_info *dma, void *addr,
+		      unsigned int skb_offset, unsigned int length,
+		      enum rxe_mr_copy_op op);
 int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length);
 int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
 		  int sg_nents, unsigned int *sg_offset);
 int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
 			u64 compare, u64 swap_add, u64 *orig_val);
 int rxe_mr_do_atomic_write(struct rxe_mr *mr, u64 iova, u64 value);
-struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
+struct rxe_mr *lookup_mr(const struct rxe_pd *pd, int access, u32 key,
 			 enum rxe_mr_lookup_type type);
 int mr_check_range(struct rxe_mr *mr, u64 iova, size_t length);
 int advance_dma_data(struct rxe_dma_info *dma, unsigned int length);
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 2667e8129a1d..0ac71238599a 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -313,6 +313,63 @@ int rxe_num_mr_frags(struct rxe_mr *mr, u64 iova, unsigned int length)
 	return num_frags;
 }
 
+/**
+ * rxe_num_dma_frags() - Count the number of skb frags needed to copy
+ *			 length bytes from a dma info struct to an skb
+ * @pd: protection domain used by dma entries
+ * @dma: dma info
+ * @length: number of bytes to copy
+ *
+ * Returns: number of frags needed
+ */
+int rxe_num_dma_frags(const struct rxe_pd *pd, const struct rxe_dma_info *dma,
+		      unsigned int length)
+{
+	unsigned int cur_sge = dma->cur_sge;
+	const struct rxe_sge *sge = &dma->sge[cur_sge];
+	unsigned int offset = dma->sge_offset;
+	struct rxe_mr *mr = NULL;
+	unsigned int bytes;
+	u64 iova;
+	int num_frags = 0;
+
+	if (WARN_ON(length > dma->resid))
+		return 0;
+
+	while (length) {
+		if (offset >= sge->length) {
+			if (mr)
+				rxe_put(mr);
+			sge++;
+			cur_sge++;
+			offset = 0;
+
+			if (WARN_ON(cur_sge >= dma->num_sge))
+				return 0;
+			if (!sge->length)
+				continue;
+		}
+
+		mr = lookup_mr(pd, 0, sge->lkey, RXE_LOOKUP_LOCAL);
+		if (WARN_ON(!mr))
+			return 0;
+
+		bytes = min_t(unsigned int, length,
+				sge->length - offset);
+		if (bytes) {
+			iova = sge->addr + offset;
+			num_frags += rxe_num_mr_frags(mr, iova, length);
+			offset += bytes;
+			length -= bytes;
+		}
+	}
+
+	if (mr)
+		rxe_put(mr);
+
+	return num_frags;
+}
+
 static int rxe_mr_copy_xarray(struct sk_buff *skb, struct rxe_mr *mr,
 			      u64 iova, void *addr, unsigned int skb_offset,
 			      unsigned int length, enum rxe_mr_copy_op op)
@@ -465,99 +522,85 @@ int rxe_copy_mr_data(struct sk_buff *skb, struct rxe_mr *mr, u64 iova,
 				  length, op);
 }
 
-/* copy data in or out of a wqe, i.e. sg list
- * under the control of a dma descriptor
+/**
+ * rxe_copy_dma_data() - transfer data between a packet and a wqe
+ * @skb: packet buffer (FRAG MR only)
+ * @pd: PD which MRs must match
+ * @access: access permission for MRs in sge (TO MR only)
+ * @dma: dma info from a wqe
+ * @addr: payload address in packet (TO/FROM MR only)
+ * @skb_offset: offset of data in skb (RXE_FRAG_TO_MR only)
+ * @length: payload length
+ * @op: copy operation (RXE_COPY_TO/FROM_MR or RXE_FRAG_TO/FROM_MR)
+ *
+ * Iterate over scatter/gather list in dma info starting from the
+ * current location until the payload length is used up and for each
+ * entry copy or build a frag list referencing the MR obtained from
+ * the lkey in the sge. This routine is called once for each packet
+ * sent or received to/from the wqe.
+ *
+ * Returns: 0 on success or an error
  */
-int copy_data(
-	struct rxe_pd		*pd,
-	int			access,
-	struct rxe_dma_info	*dma,
-	void			*addr,
-	int			length,
-	enum rxe_mr_copy_op	op)
+int rxe_copy_dma_data(struct sk_buff *skb, struct rxe_pd *pd, int access,
+		      struct rxe_dma_info *dma, void *addr,
+		      unsigned int skb_offset, unsigned int length,
+		      enum rxe_mr_copy_op op)
 {
-	int			bytes;
-	struct rxe_sge		*sge	= &dma->sge[dma->cur_sge];
-	int			offset	= dma->sge_offset;
-	int			resid	= dma->resid;
-	struct rxe_mr		*mr	= NULL;
-	u64			iova;
-	int			err;
+	struct rxe_sge *sge = &dma->sge[dma->cur_sge];
+	unsigned int offset = dma->sge_offset;
+	unsigned int resid = dma->resid;
+	struct rxe_mr *mr = NULL;
+	unsigned int bytes;
+	u64 iova;
+	int err = 0;
 
 	if (length == 0)
 		return 0;
 
-	if (length > resid) {
-		err = -EINVAL;
-		goto err2;
-	}
-
-	if (sge->length && (offset < sge->length)) {
-		mr = lookup_mr(pd, access, sge->lkey, RXE_LOOKUP_LOCAL);
-		if (!mr) {
-			err = -EINVAL;
-			goto err1;
-		}
-	}
-
-	while (length > 0) {
-		bytes = length;
+	if (length > resid)
+		return -EINVAL;
 
+	while (length) {
 		if (offset >= sge->length) {
-			if (mr) {
+			if (mr)
 				rxe_put(mr);
-				mr = NULL;
-			}
+
 			sge++;
 			dma->cur_sge++;
 			offset = 0;
 
-			if (dma->cur_sge >= dma->num_sge) {
-				err = -ENOSPC;
-				goto err2;
-			}
-
-			if (sge->length) {
-				mr = lookup_mr(pd, access, sge->lkey,
-					       RXE_LOOKUP_LOCAL);
-				if (!mr) {
-					err = -EINVAL;
-					goto err1;
-				}
-			} else {
+			if (dma->cur_sge >= dma->num_sge)
+				return -EINVAL;
+			if (!sge->length)
 				continue;
-			}
 		}
 
-		if (bytes > sge->length - offset)
-			bytes = sge->length - offset;
+		mr = lookup_mr(pd, access, sge->lkey, RXE_LOOKUP_LOCAL);
+		if (!mr)
+			return -EINVAL;
 
+		bytes = min_t(int, length, sge->length - offset);
 		if (bytes > 0) {
 			iova = sge->addr + offset;
-			err = rxe_copy_mr_data(NULL, mr, iova, addr,
-					       0, bytes, op);
+			err = rxe_copy_mr_data(skb, mr, iova, addr,
+					       skb_offset, bytes, op);
 			if (err)
-				goto err2;
+				goto err_put;
 
-			offset	+= bytes;
-			resid	-= bytes;
-			length	-= bytes;
-			addr	+= bytes;
+			addr += bytes;
+			offset += bytes;
+			skb_offset += bytes;
+			resid -= bytes;
+			length -= bytes;
 		}
 	}
 
 	dma->sge_offset = offset;
-	dma->resid	= resid;
-
-	if (mr)
-		rxe_put(mr);
-
-	return 0;
+	dma->resid = resid;
 
-err2:
+err_put:
 	if (mr)
 		rxe_put(mr);
-err1:
 	return err;
 }
 
@@ -753,7 +796,7 @@ int advance_dma_data(struct rxe_dma_info *dma, unsigned int length)
 	return 0;
 }
 
-struct rxe_mr *lookup_mr(struct rxe_pd *pd, int access, u32 key,
+struct rxe_mr *lookup_mr(const struct rxe_pd *pd, int access, u32 key,
 			 enum rxe_mr_lookup_type type)
 {
 	struct rxe_mr *mr;
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index f3653234cf32..525e704c12c2 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -315,8 +315,10 @@ static void rxe_init_roce_hdrs(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 }
 
 static int rxe_init_payload(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
-			    struct rxe_pkt_info *pkt, u32 payload)
+			    struct rxe_pkt_info *pkt, u32 payload,
+			    struct sk_buff *skb)
 {
+	int skb_offset = 0;
 	void *data;
 	int err = 0;
 
@@ -326,8 +328,9 @@ static int rxe_init_payload(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 		wqe->dma.resid -= payload;
 		wqe->dma.sge_offset += payload;
 	} else {
-		err = copy_data(qp->pd, 0, &wqe->dma, payload_addr(pkt),
-				payload, RXE_COPY_FROM_MR);
+		err = rxe_copy_dma_data(skb, qp->pd, 0, &wqe->dma,
+					payload_addr(pkt), skb_offset,
+					payload, RXE_COPY_FROM_MR);
 	}
 
 	return err;
@@ -379,7 +382,7 @@ static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 
 	/* init payload if any */
 	if (pkt->mask & RXE_WRITE_OR_SEND_MASK) {
-		err = rxe_init_payload(qp, wqe, pkt, payload);
+		err = rxe_init_payload(qp, wqe, pkt, payload, skb);
 		if (unlikely(err))
 			goto err_out;
 	} else if (pkt->mask & RXE_FLUSH_MASK) {
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 87d61a462ff5..a6c1d67ad943 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -562,10 +562,13 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
 				     int data_len)
 {
+	struct sk_buff *skb = NULL;
+	int skb_offset = 0;
 	int err;
 
-	err = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &qp->resp.wqe->dma,
-			data_addr, data_len, RXE_COPY_TO_MR);
+	err = rxe_copy_dma_data(skb, qp->pd, IB_ACCESS_LOCAL_WRITE,
+				&qp->resp.wqe->dma, data_addr,
+				skb_offset, data_len, RXE_COPY_TO_MR);
 	if (unlikely(err))
 		return (err == -ENOSPC) ? RESPST_ERR_LENGTH
 					: RESPST_ERR_MALFORMED_WQE;
-- 
2.39.2



* [PATCH for-next v3 04/10] RDMA/rxe: Extend rxe_init_packet() to support frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (2 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 03/10] RDMA/rxe: Extend copy_data " Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c " Bob Pearson
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Add a subroutine rxe_can_use_sg() to determine if a packet is
a candidate for a fragmented skb. Add a global variable rxe_use_sg
to control whether to support nonlinear skbs. Modify rxe_init_packet()
to test if the packet should use a fragmented skb. Fix up calls to
rxe_init_packet() to use the new API but disable creating nonlinear
skbs for now.
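
Condensed to a single predicate, the policy this patch introduces looks
roughly like the sketch below (illustrative only; the names are the ones
added by this patch, and the module parameter would presumably be set at
load time, e.g. rdma_rxe use_sg=1):

/* condensed sketch of the decision made in rxe_init_packet() */
static bool example_want_frags(struct rxe_dev *rxe, int skb_size)
{
	/* fragment only if the user enabled it, the netdev can do
	 * scatter/gather I/O, and the packet is larger than the
	 * minimal header-only buffer
	 */
	return rxe_use_sg &&
	       (rxe->ndev->features & NETIF_F_SG) &&
	       skb_size > RXE_MIN_SKB_SIZE;
}

When the predicate holds, rxe_init_packet() allocates only
RXE_MIN_SKB_SIZE bytes, reserves just the RoCE headers with skb_put(),
and leaves the payload to be attached later as page fragments.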

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c      |  5 +++
 drivers/infiniband/sw/rxe/rxe.h      |  3 ++
 drivers/infiniband/sw/rxe/rxe_loc.h  |  4 +-
 drivers/infiniband/sw/rxe/rxe_net.c  | 58 ++++++++++++++++++++++++++--
 drivers/infiniband/sw/rxe/rxe_req.c  |  4 +-
 drivers/infiniband/sw/rxe/rxe_resp.c |  4 +-
 6 files changed, 66 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 6b55c595f8f8..800e8c0d437d 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -13,6 +13,11 @@ MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
 MODULE_DESCRIPTION("Soft RDMA transport");
 MODULE_LICENSE("Dual BSD/GPL");
 
+/* if true allow using fragmented skbs */
+bool rxe_use_sg;
+module_param_named(use_sg, rxe_use_sg, bool, 0444);
+MODULE_PARM_DESC(use_sg, "Support skb frags; default false");
+
 /* free resources for a rxe device all objects created for this device must
  * have been destroyed
  */
diff --git a/drivers/infiniband/sw/rxe/rxe.h b/drivers/infiniband/sw/rxe/rxe.h
index 077e3ad8f39a..b334eda62c36 100644
--- a/drivers/infiniband/sw/rxe/rxe.h
+++ b/drivers/infiniband/sw/rxe/rxe.h
@@ -30,6 +30,9 @@
 #include "rxe_verbs.h"
 #include "rxe_loc.h"
 
+/* if true allow using fragmented skbs */
+extern bool rxe_use_sg;
+
 /*
  * Version 1 and Version 2 are identical on 64 bit machines, but on 32 bit
  * machines Version 2 has a different struct layout.
diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index fad853199b4d..96b1fb79610a 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -97,8 +97,8 @@ struct rxe_mw *rxe_lookup_mw(struct rxe_qp *qp, int access, u32 rkey);
 void rxe_mw_cleanup(struct rxe_pool_elem *elem);
 
 /* rxe_net.c */
-struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
-				struct rxe_pkt_info *pkt);
+struct sk_buff *rxe_init_packet(struct rxe_qp *qp, struct rxe_av *av,
+				struct rxe_pkt_info *pkt, bool *is_frag);
 int rxe_prepare(struct rxe_av *av, struct rxe_pkt_info *pkt,
 		struct sk_buff *skb);
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index 94e347a7f386..c44ef39010f1 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -510,12 +510,47 @@ int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
 	return err;
 }
 
-struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
-				struct rxe_pkt_info *pkt)
+/**
+ * rxe_can_use_sg() - determine if packet is a candidate for fragmenting
+ * @qp: the queue pair
+ * @pkt: packet info
+ *
+ * Limit to packets with:
+ *	rxe_use_sg set
+ *	ndev supports SG
+ *
+ * Returns: true if the conditions are met else false
+ */
+static bool rxe_can_use_sg(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
+{
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
+
+	return (rxe_use_sg && (rxe->ndev->features & NETIF_F_SG));
+}
+
+/* must be big enough to hold MAC+IPV6+UDP+ROCE headers */
+#define RXE_MIN_SKB_SIZE (256)
+
+/**
+ * rxe_init_packet() - allocate and initialize a new skb
+ * @qp: the queue pair
+ * @av: remote address vector
+ * @pkt: packet info
+ * @frag: optional return flag for a fragmented skb;
+ *	  if frag == NULL a fragmented skb will not be used,
+ *	  otherwise on return *frag is set to true if the
+ *	  packet will be fragmented and false otherwise
+ *
+ * Returns: an skb on success else NULL
+ */
+struct sk_buff *rxe_init_packet(struct rxe_qp *qp, struct rxe_av *av,
+				struct rxe_pkt_info *pkt, bool *frag)
 {
+	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	unsigned int hdr_len;
 	struct sk_buff *skb = NULL;
 	struct net_device *ndev = rxe->ndev;
+	int skb_size;
 
 	if (av->network_type == RXE_NETWORK_TYPE_IPV4)
 		hdr_len = ETH_HLEN + sizeof(struct udphdr) +
@@ -524,8 +559,18 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 		hdr_len = ETH_HLEN + sizeof(struct udphdr) +
 			sizeof(struct ipv6hdr);
 
-	skb = alloc_skb(pkt->paylen + hdr_len + LL_RESERVED_SPACE(ndev),
-			GFP_ATOMIC);
+	skb_size = LL_RESERVED_SPACE(ndev) + hdr_len + pkt->paylen;
+	if (frag) {
+		if (rxe_can_use_sg(qp, pkt) &&
+		    (skb_size > RXE_MIN_SKB_SIZE)) {
+			skb_size = RXE_MIN_SKB_SIZE;
+			*frag = true;
+		} else {
+			*frag = false;
+		}
+	}
+
+	skb = alloc_skb(skb_size, GFP_ATOMIC);
 	if (unlikely(!skb))
 		goto out;
 
@@ -539,6 +584,11 @@ struct sk_buff *rxe_init_packet(struct rxe_dev *rxe, struct rxe_av *av,
 	else
 		skb->protocol = htons(ETH_P_IPV6);
 
+	if (frag && *frag)
+		pkt->hdr = skb_put(skb, rxe_opcode[pkt->opcode].length);
+	else
+		pkt->hdr = skb_put(skb, pkt->paylen);
+
 out:
 	return skb;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 525e704c12c2..491360fef346 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -369,14 +369,12 @@ static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 			pkt->pad + RXE_ICRC_SIZE;
 
 	/* init skb */
-	skb = rxe_init_packet(rxe, av, pkt);
+	skb = rxe_init_packet(qp, av, pkt, NULL);
 	if (unlikely(!skb)) {
 		err = -ENOMEM;
 		goto err_out;
 	}
 
-	pkt->hdr = skb_put(skb, pkt->paylen);
-
 	/* init roce headers */
 	rxe_init_roce_hdrs(qp, wqe, pkt);
 
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index a6c1d67ad943..254f2eab8d20 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -788,12 +788,10 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	ack->paylen = rxe_opcode[opcode].length + payload +
 			ack->pad + RXE_ICRC_SIZE;
 
-	skb = rxe_init_packet(rxe, &qp->pri_av, ack);
+	skb = rxe_init_packet(qp, &qp->pri_av, ack, NULL);
 	if (!skb)
 		return NULL;
 
-	ack->hdr = skb_put(skb, ack->paylen);
-
 	bth_init(ack, opcode, 0, 0, ack->pad, IB_DEFAULT_PKEY_FULL,
 		 qp->attr.dest_qp_num, 0, psn);
 
-- 
2.39.2



* [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c to support frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (3 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 04/10] RDMA/rxe: Extend rxe_init_packet() to support frags Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-28 14:20   ` Zhu Yanjun
  2023-07-27 20:01 ` [PATCH for-next v3 06/10] RDMA/rxe: Extend rxe_init_req_packet() for frags Bob Pearson
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Extend the subroutines rxe_icrc_generate() and rxe_icrc_check()
to support skb frags.
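
Conceptually the change amounts to walking the payload in two steps
(sketch only, not the patch code; rxe_crc32() is the existing static
helper in rxe_icrc.c and the skb_frag_* accessors are standard kernel
helpers):

/* conceptual sketch of covering a nonlinear payload with the ICRC */
static __be32 example_icrc_payload(struct rxe_dev *rxe, struct sk_buff *skb,
				   __be32 crc, u8 *linear, int linear_len)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);
	int i;

	/* payload bytes that still live in the linear buffer */
	if (linear_len > 0)
		crc = rxe_crc32(rxe, crc, linear, linear_len);

	/* then every page fragment in order */
	for (i = 0; i < shinfo->nr_frags; i++) {
		skb_frag_t *frag = &shinfo->frags[i];
		u8 *addr = page_to_virt(skb_frag_page(frag)) +
			   skb_frag_off(frag);

		crc = rxe_crc32(rxe, crc, addr, skb_frag_size(frag));
	}

	return crc;
}

The ICRC field itself ends up at the tail of the last fragment (or of the
linear buffer when there are no frags), which is why rxe_icrc_payload()
also returns a pointer to it.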

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_icrc.c | 65 ++++++++++++++++++++++++----
 drivers/infiniband/sw/rxe/rxe_net.c  | 51 +++++++++++++++++-----
 drivers/infiniband/sw/rxe/rxe_recv.c |  1 +
 3 files changed, 98 insertions(+), 19 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_icrc.c b/drivers/infiniband/sw/rxe/rxe_icrc.c
index c9aa0995e900..393391863350 100644
--- a/drivers/infiniband/sw/rxe/rxe_icrc.c
+++ b/drivers/infiniband/sw/rxe/rxe_icrc.c
@@ -63,7 +63,7 @@ static __be32 rxe_crc32(struct rxe_dev *rxe, __be32 crc, void *next, size_t len)
 
 /**
  * rxe_icrc_hdr() - Compute the partial ICRC for the network and transport
- *		  headers of a packet.
+ *		    headers of a packet.
  * @skb: packet buffer
  * @pkt: packet information
  *
@@ -129,6 +129,56 @@ static __be32 rxe_icrc_hdr(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 	return crc;
 }
 
+/**
+ * rxe_icrc_payload() - Compute the ICRC for a packet payload and also
+ *			compute the address of the icrc in the packet.
+ * @skb: packet buffer
+ * @pkt: packet information
+ * @icrc: current icrc i.e. including headers
+ * @icrcp: returned pointer to icrc in skb
+ *
+ * Return: the partial ICRC extended to cover the packet payload
+ */
+static __be32 rxe_icrc_payload(struct sk_buff *skb, struct rxe_pkt_info *pkt,
+			       __be32 icrc, __be32 **icrcp)
+{
+	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	skb_frag_t *frag;
+	u8 *addr;
+	int hdr_len;
+	int len;
+	int i;
+
+	/* handle any payload left in the linear buffer */
+	hdr_len = rxe_opcode[pkt->opcode].length;
+	addr = pkt->hdr + hdr_len;
+	len = skb_tail_pointer(skb) - skb_transport_header(skb)
+		- sizeof(struct udphdr) - hdr_len;
+	if (!shinfo->nr_frags) {
+		len -= RXE_ICRC_SIZE;
+		*icrcp = (__be32 *)(addr + len);
+	}
+	if (len > 0)
+		icrc = rxe_crc32(pkt->rxe, icrc, payload_addr(pkt), len);
+	WARN_ON(len < 0);
+
+	/* handle any payload in frags */
+	for (i = 0; i < shinfo->nr_frags; i++) {
+		frag = &shinfo->frags[i];
+		addr = page_to_virt(frag->bv_page) + frag->bv_offset;
+		len = frag->bv_len;
+		if (i == shinfo->nr_frags - 1) {
+			len -= RXE_ICRC_SIZE;
+			*icrcp = (__be32 *)(addr + len);
+		}
+		if (len > 0)
+			icrc = rxe_crc32(pkt->rxe, icrc, addr, len);
+		WARN_ON(len < 0);
+	}
+
+	return icrc;
+}
+
 /**
  * rxe_icrc_check() - Compute ICRC for a packet and compare to the ICRC
  *		      delivered in the packet.
@@ -143,13 +193,11 @@ int rxe_icrc_check(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 	__be32 pkt_icrc;
 	__be32 icrc;
 
-	icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
-	pkt_icrc = *icrcp;
-
 	icrc = rxe_icrc_hdr(skb, pkt);
-	icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
-				payload_size(pkt) + pkt->pad);
+	icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
+
 	icrc = ~icrc;
+	pkt_icrc = *icrcp;
 
 	if (unlikely(icrc != pkt_icrc))
 		return -EINVAL;
@@ -167,9 +215,8 @@ void rxe_icrc_generate(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 	__be32 *icrcp;
 	__be32 icrc;
 
-	icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
 	icrc = rxe_icrc_hdr(skb, pkt);
-	icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
-				payload_size(pkt) + pkt->pad);
+	icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
+
 	*icrcp = ~icrc;
 }
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index c44ef39010f1..c43f9dd3ae6e 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -148,33 +148,53 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	struct udphdr *udph;
 	struct rxe_dev *rxe;
 	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
+	u8 opcode;
+	u8 buf[1];
+	u8 *p;
 
 	/* takes a reference on rxe->ib_dev
 	 * drop when skb is freed
 	 */
 	rxe = get_rxe_from_skb(skb);
 	if (!rxe)
-		goto drop;
+		goto err_drop;
 
-	if (skb_linearize(skb)) {
-		ib_device_put(&rxe->ib_dev);
-		goto drop;
+	/* Get bth opcode out of skb, it may be in a fragment */
+	p = skb_header_pointer(skb, sizeof(struct udphdr), 1, buf);
+	if (!p)
+		goto err_device_put;
+	opcode = *p;
+
+	/* If using fragmented skbs make sure roce headers
+	 * are in linear buffer else make skb linear
+	 */
+	if (rxe_use_sg && skb_is_nonlinear(skb)) {
+		int delta = rxe_opcode[opcode].length -
+			(skb_headlen(skb) - sizeof(struct udphdr));
+
+		if (delta > 0 && !__pskb_pull_tail(skb, delta))
+			goto err_device_put;
+	} else {
+		if (skb_linearize(skb))
+			goto err_device_put;
 	}
 
 	udph = udp_hdr(skb);
 	pkt->rxe = rxe;
 	pkt->port_num = 1;
 	pkt->hdr = (u8 *)(udph + 1);
-	pkt->mask = RXE_GRH_MASK;
+	pkt->mask = rxe_opcode[opcode].mask | RXE_GRH_MASK;
 	pkt->paylen = be16_to_cpu(udph->len) - sizeof(*udph);
 
-	/* remove udp header */
 	skb_pull(skb, sizeof(struct udphdr));
 
 	rxe_rcv(skb);
 
 	return 0;
-drop:
+
+err_device_put:
+	ib_device_put(&rxe->ib_dev);
+err_drop:
 	kfree_skb(skb);
 
 	return 0;
@@ -446,24 +466,35 @@ static int rxe_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
  */
 static int rxe_loopback(struct sk_buff *skb, struct rxe_pkt_info *pkt)
 {
-	memcpy(SKB_TO_PKT(skb), pkt, sizeof(*pkt));
+	struct rxe_pkt_info *newpkt;
+	int err;
 
+	/* make loopback line up with rxe_udp_encap_recv */
 	if (skb->protocol == htons(ETH_P_IP))
 		skb_pull(skb, sizeof(struct iphdr));
 	else
 		skb_pull(skb, sizeof(struct ipv6hdr));
+	skb_reset_transport_header(skb);
+
+	newpkt = SKB_TO_PKT(skb);
+	memcpy(newpkt, pkt, sizeof(*newpkt));
+	newpkt->hdr = skb_transport_header(skb) + sizeof(struct udphdr);
 
 	if (WARN_ON(!ib_device_try_get(&pkt->rxe->ib_dev))) {
 		kfree_skb(skb);
-		return -EIO;
+		err = -EINVAL;
+		goto drop;
 	}
 
 	/* remove udp header */
 	skb_pull(skb, sizeof(struct udphdr));
 
 	rxe_rcv(skb);
-
 	return 0;
+
+drop:
+	kfree_skb(skb);
+	return err;
 }
 
 int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
index f912a913f89a..940197199252 100644
--- a/drivers/infiniband/sw/rxe/rxe_recv.c
+++ b/drivers/infiniband/sw/rxe/rxe_recv.c
@@ -338,6 +338,7 @@ void rxe_rcv(struct sk_buff *skb)
 	if (unlikely(err))
 		goto drop;
 
+	/* skb->data points at UDP header */
 	err = rxe_icrc_check(skb, pkt);
 	if (unlikely(err))
 		goto drop;
-- 
2.39.2



* [PATCH for-next v3 06/10] RDMA/rxe: Extend rxe_init_req_packet() for frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (4 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c " Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 07/10] RDMA/rxe: Extend response packets " Bob Pearson
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Add code to rxe_init_req_packet() to allocate space for the
pad and icrc if the skb is fragmented.
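
For a concrete, purely illustrative example of where the space goes,
take an RC RDMA WRITE Only packet carrying a 1024-byte payload (BTH is
12 bytes and RETH is 16 bytes per the IBA header formats, RXE_ICRC_SIZE
is 4):

	rxe_opcode[opcode].length = 12 (BTH) + 16 (RETH) = 28
	payload                   = 1024
	pkt->pad                  = 0   (1024 is already a multiple of 4)
	pkt->paylen               = 28 + 1024 + 0 + 4 = 1056

With a linear skb all 1056 bytes sit in the linear buffer and the
pad/ICRC area is simply zeroed behind the payload. With a fragmented skb
only the 28 header bytes are in the linear buffer; the payload bytes are
MR page frags and rxe_prepare_pad_icrc() adds one final frag, carved out
of the skb's remaining tailroom, to hold the pad and the ICRC.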

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_loc.h    |  5 ++
 drivers/infiniband/sw/rxe/rxe_mr.c     |  5 +-
 drivers/infiniband/sw/rxe/rxe_opcode.c |  2 +
 drivers/infiniband/sw/rxe/rxe_req.c    | 83 ++++++++++++++++++++++----
 4 files changed, 84 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
index 96b1fb79610a..40624de62288 100644
--- a/drivers/infiniband/sw/rxe/rxe_loc.h
+++ b/drivers/infiniband/sw/rxe/rxe_loc.h
@@ -177,7 +177,12 @@ void rxe_srq_cleanup(struct rxe_pool_elem *elem);
 void rxe_dealloc(struct ib_device *ib_dev);
 
 int rxe_completer(struct rxe_qp *qp);
+
+/* rxe_req.c */
+int rxe_prepare_pad_icrc(struct rxe_pkt_info *pkt, struct sk_buff *skb,
+			 int payload, bool frag);
 int rxe_requester(struct rxe_qp *qp);
+
 int rxe_responder(struct rxe_qp *qp);
 
 /* rxe_icrc.c */
diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
index 0ac71238599a..5178775f2d4e 100644
--- a/drivers/infiniband/sw/rxe/rxe_mr.c
+++ b/drivers/infiniband/sw/rxe/rxe_mr.c
@@ -263,7 +263,10 @@ int rxe_add_frag(struct sk_buff *skb, struct rxe_mr *mr, struct page *page,
 	skb_frag_t *frag = &skb_shinfo(skb)->frags[nr_frags];
 
 	if (nr_frags >= MAX_SKB_FRAGS) {
-		rxe_dbg_mr(mr, "ran out of frags");
+		if (mr)
+			rxe_dbg_mr(mr, "ran out of frags");
+		else
+			rxe_dbg("ran out of frags");
 		return -EINVAL;
 	}
 
diff --git a/drivers/infiniband/sw/rxe/rxe_opcode.c b/drivers/infiniband/sw/rxe/rxe_opcode.c
index f358b732a751..a72e5fd4f571 100644
--- a/drivers/infiniband/sw/rxe/rxe_opcode.c
+++ b/drivers/infiniband/sw/rxe/rxe_opcode.c
@@ -399,6 +399,8 @@ struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE] = {
 			[RXE_BTH]	= 0,
 			[RXE_FETH]	= RXE_BTH_BYTES,
 			[RXE_RETH]	= RXE_BTH_BYTES + RXE_FETH_BYTES,
+			[RXE_PAYLOAD]	= RXE_BTH_BYTES + RXE_FETH_BYTES +
+					  RXE_RETH_BYTES,
 		}
 	},
 	[IB_OPCODE_RC_ATOMIC_WRITE]                        = {
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 491360fef346..cf34d1a58f85 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -316,26 +316,83 @@ static void rxe_init_roce_hdrs(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 
 static int rxe_init_payload(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
 			    struct rxe_pkt_info *pkt, u32 payload,
-			    struct sk_buff *skb)
+			    struct sk_buff *skb, bool frag)
 {
+	int len = skb_tailroom(skb);
+	int tot_len = payload + pkt->pad + RXE_ICRC_SIZE;
+	int access = 0;
 	int skb_offset = 0;
+	int op;
+	void *addr;
 	void *data;
 	int err = 0;
 
 	if (wqe->wr.send_flags & IB_SEND_INLINE) {
+		if (WARN_ON(frag)) {
+			rxe_err_qp(qp, "inline data for fragmented skb not supported");
+			return -EINVAL;
+		}
+		if (len < tot_len) {
+			rxe_err_qp(qp, "skb too small");
+			return -EINVAL;
+		}
 		data = &wqe->dma.inline_data[wqe->dma.sge_offset];
 		memcpy(payload_addr(pkt), data, payload);
 		wqe->dma.resid -= payload;
 		wqe->dma.sge_offset += payload;
 	} else {
-		err = rxe_copy_dma_data(skb, qp->pd, 0, &wqe->dma,
-					payload_addr(pkt), skb_offset,
-					payload, RXE_COPY_FROM_MR);
+		op = frag ? RXE_FRAG_FROM_MR : RXE_COPY_FROM_MR;
+		addr = frag ? NULL : payload_addr(pkt);
+		err = rxe_copy_dma_data(skb, qp->pd, access, &wqe->dma,
+					addr, skb_offset, payload, op);
 	}
 
 	return err;
 }
 
+/**
+ * rxe_prepare_pad_icrc() - Alloc space if fragmented and init pad and icrc
+ * @pkt: packet info
+ * @skb: packet buffer
+ * @payload: roce payload
+ * @frag: true if skb is fragmented
+ *
+ * Returns: 0 on success else an error
+ */
+int rxe_prepare_pad_icrc(struct rxe_pkt_info *pkt, struct sk_buff *skb,
+			 int payload, bool frag)
+{
+	unsigned int length = RXE_ICRC_SIZE + pkt->pad;
+	unsigned int offset;
+	struct page *page;
+	u64 iova;
+	u8 *addr;
+
+	if (frag) {
+		addr = skb_end_pointer(skb) - length;
+		iova = (uintptr_t)addr;
+		page = virt_to_page(iova);
+		offset = iova & (PAGE_SIZE - 1);
+
+		/* make sure we have enough room and that the
+		 * frag does not cross a page boundary; this
+		 * should never happen
+		 */
+		if (WARN_ON(((skb->end - skb->tail) <= length) ||
+			((offset + length) > PAGE_SIZE)))
+			return -ENOMEM;
+
+		memset(addr, 0, length);
+
+		return rxe_add_frag(skb, NULL, page, length, offset);
+	}
+
+	addr = payload_addr(pkt) + payload;
+	memset(addr, 0, length);
+
+	return 0;
+}
+
 static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 					   struct rxe_send_wqe *wqe,
 					   int opcode, u32 payload,
@@ -345,7 +402,7 @@ static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 	struct sk_buff *skb = NULL;
 	struct rxe_av *av;
 	struct rxe_ah *ah = NULL;
-	u8 *pad_addr;
+	bool frag = false;
 	int err;
 
 	pkt->rxe = rxe;
@@ -380,9 +437,13 @@ static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 
 	/* init payload if any */
 	if (pkt->mask & RXE_WRITE_OR_SEND_MASK) {
-		err = rxe_init_payload(qp, wqe, pkt, payload, skb);
-		if (unlikely(err))
+		err = rxe_init_payload(qp, wqe, pkt, payload,
+				       skb, frag);
+		if (unlikely(err)) {
+			rxe_dbg_qp(qp, "rxe_init_payload failed, err = %d",
+				   err);
 			goto err_out;
+		}
 	} else if (pkt->mask & RXE_FLUSH_MASK) {
 		/* oA19-2: shall have no payload. */
 		wqe->dma.resid = 0;
@@ -394,9 +455,11 @@ static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 	}
 
 	/* init pad and icrc */
-	if (pkt->pad) {
-		pad_addr = payload_addr(pkt) + payload;
-		memset(pad_addr, 0, pkt->pad);
+	err = rxe_prepare_pad_icrc(pkt, skb, payload, frag);
+	if (unlikely(err)) {
+		rxe_dbg_qp(qp, "rxe_prepare_pad_icrc failed, err = %d",
+			   err);
+		goto err_out;
 	}
 
 	/* init IP and UDP network headers */
-- 
2.39.2



* [PATCH for-next v3 07/10] RDMA/rxe: Extend response packets for frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (5 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 06/10] RDMA/rxe: Extend rxe_init_req_packet() for frags Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 08/10] RDMA/rxe: Extend send/write_data_in() " Bob Pearson
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Extend prepare_ack_packet(), read_reply() and send_common_ack() in
rxe_resp.c to support fragmented skbs.  Adjust calls to these routines
for the changed API.
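
The resulting call sequence, roughly, is (names taken from the diff
below; this is only a sketch with error handling abbreviated, not new
code):

	skb = prepare_ack_packet(qp, &ack_pkt, opcode, payload, psn,
				 syndrome, &frag);
	if (!skb)
		return -ENOMEM;
	/* read_reply() copies the payload here before the pad/icrc step */
	/* callers now zero the pad and reserve the icrc themselves ... */
	err = rxe_prepare_pad_icrc(&ack_pkt, skb, payload, frag);
	/* ... and build the network headers before transmitting */
	if (!err)
		err = rxe_prepare(&qp->pri_av, &ack_pkt, skb);
	if (err) {
		kfree_skb(skb);
		return err;
	}
	err = rxe_xmit_packet(qp, &ack_pkt, skb);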

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 59 ++++++++++++++++++----------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index 254f2eab8d20..dc62e11dc448 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -765,14 +765,11 @@ static enum resp_states atomic_write_reply(struct rxe_qp *qp,
 
 static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 					  struct rxe_pkt_info *ack,
-					  int opcode,
-					  int payload,
-					  u32 psn,
-					  u8 syndrome)
+					  int opcode, int payload, u32 psn,
+					  u8 syndrome, bool *fragp)
 {
 	struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
 	struct sk_buff *skb;
-	int err;
 
 	ack->rxe = rxe;
 	ack->qp = qp;
@@ -788,7 +785,7 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	ack->paylen = rxe_opcode[opcode].length + payload +
 			ack->pad + RXE_ICRC_SIZE;
 
-	skb = rxe_init_packet(qp, &qp->pri_av, ack, NULL);
+	skb = rxe_init_packet(qp, &qp->pri_av, ack, fragp);
 	if (!skb)
 		return NULL;
 
@@ -803,12 +800,6 @@ static struct sk_buff *prepare_ack_packet(struct rxe_qp *qp,
 	if (ack->mask & RXE_ATMACK_MASK)
 		atmack_set_orig(ack, qp->resp.res->atomic.orig_val);
 
-	err = rxe_prepare(&qp->pri_av, ack, skb);
-	if (err) {
-		kfree_skb(skb);
-		return NULL;
-	}
-
 	return skb;
 }
 
@@ -881,7 +872,8 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	struct resp_res *res = qp->resp.res;
 	struct rxe_mr *mr;
 	unsigned int skb_offset = 0;
-	u8 *pad_addr;
+	enum rxe_mr_copy_op op;
+	bool frag;
 
 	if (!res) {
 		res = rxe_prepare_res(qp, req_pkt, RXE_READ_MASK);
@@ -898,8 +890,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 			qp->resp.mr = NULL;
 		} else {
 			mr = rxe_recheck_mr(qp, res->read.rkey);
-			if (!mr)
-				return RESPST_ERR_RKEY_VIOLATION;
+			if (!mr) {
+				state = RESPST_ERR_RKEY_VIOLATION;
+				goto err_out;
+			}
 		}
 
 		if (res->read.resid <= mtu)
@@ -926,23 +920,33 @@ static enum resp_states read_reply(struct rxe_qp *qp,
 	payload = min_t(int, res->read.resid, mtu);
 
 	skb = prepare_ack_packet(qp, &ack_pkt, opcode, payload,
-				 res->cur_psn, AETH_ACK_UNLIMITED);
+				 res->cur_psn, AETH_ACK_UNLIMITED, &frag);
 	if (!skb) {
 		state = RESPST_ERR_RNR;
 		goto err_out;
 	}
 
+	op = frag ? RXE_FRAG_FROM_MR : RXE_COPY_FROM_MR;
 	err = rxe_copy_mr_data(skb, mr, res->read.va, payload_addr(&ack_pkt),
-			       skb_offset, payload, RXE_COPY_FROM_MR);
+			       skb_offset, payload, op);
 	if (err) {
 		kfree_skb(skb);
 		state = RESPST_ERR_RKEY_VIOLATION;
 		goto err_out;
 	}
 
-	if (ack_pkt.pad) {
-		pad_addr = payload_addr(&ack_pkt) + payload;
-		memset(pad_addr, 0, ack_pkt.pad);
+	err = rxe_prepare_pad_icrc(&ack_pkt, skb, payload, frag);
+	if (err) {
+		kfree_skb(skb);
+		state = RESPST_ERR_RNR;
+		goto err_out;
+	}
+
+	err = rxe_prepare(&qp->pri_av, &ack_pkt, skb);
+	if (err) {
+		kfree_skb(skb);
+		state = RESPST_ERR_RNR;
+		goto err_out;
 	}
 
 	/* rxe_xmit_packet always consumes the skb */
@@ -1177,10 +1181,23 @@ static int send_common_ack(struct rxe_qp *qp, u8 syndrome, u32 psn,
 	struct rxe_pkt_info ack_pkt;
 	struct sk_buff *skb;
 
-	skb = prepare_ack_packet(qp, &ack_pkt, opcode, 0, psn, syndrome);
+	skb = prepare_ack_packet(qp, &ack_pkt, opcode, 0, psn,
+				 syndrome, NULL);
 	if (!skb)
 		return -ENOMEM;
 
+	err = rxe_prepare_pad_icrc(&ack_pkt, skb, 0, false);
+	if (err) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	err = rxe_prepare(&qp->pri_av, &ack_pkt, skb);
+	if (err) {
+		kfree_skb(skb);
+		return err;
+	}
+
 	err = rxe_xmit_packet(qp, &ack_pkt, skb);
 	if (err)
 		rxe_dbg_qp(qp, "Failed sending %s\n", msg);
-- 
2.39.2



* [PATCH for-next v3 08/10] RDMA/rxe: Extend send/write_data_in() for frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (6 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 07/10] RDMA/rxe: Extend response packets " Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 09/10] RDMA/rxe: Extend do_read() in rxe_comp.c " Bob Pearson
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Extend send_data_in() and write_data_in() in rxe_resp.c to
support fragmented received skbs.

This is in preparation for using fragmented skbs.
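
Both helpers now pick the copy operation and payload offset the same
way before calling the copy routine (sketch only, using names from the
diff below):

	/* use the frag path only when the skb is nonlinear */
	op = skb_is_nonlinear(skb) ? RXE_FRAG_TO_MR : RXE_COPY_TO_MR;
	/* payload starts at this offset from skb->data (the bth header) */
	skb_offset = rxe_opcode[pkt->opcode].length;

send_data_in() then passes these to rxe_copy_dma_data() and
write_data_in() to rxe_copy_mr_data().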

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_resp.c | 102 +++++++++++++++++----------
 1 file changed, 64 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index dc62e11dc448..c7153e376987 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -559,45 +559,88 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
 	return state;
 }
 
-static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
-				     int data_len)
+/**
+ * rxe_send_data_in() - Copy payload data into receive buffer
+ * @qp: The queue pair
+ * @pkt: Request packet info
+ *
+ * Copy the packet payload into the receive buffer at the current offset.
+ * If a UD message also copy the IP header into the receive buffer.
+ *
+ * Returns: 0 if successful else an error resp_states value.
+ */
+static enum resp_states rxe_send_data_in(struct rxe_qp *qp,
+					 struct rxe_pkt_info *pkt)
 {
-	struct sk_buff *skb = NULL;
+	struct sk_buff *skb = PKT_TO_SKB(pkt);
+	u8 *data_addr = payload_addr(pkt);
+	int data_len = payload_size(pkt);
+	union rdma_network_hdr hdr;
+	enum rxe_mr_copy_op op;
 	int skb_offset = 0;
 	int err;
 
+	/* Per IBA for UD packets copy the IP header into the receive buffer */
+	if (qp_type(qp) == IB_QPT_UD || qp_type(qp) == IB_QPT_GSI) {
+		if (skb->protocol == htons(ETH_P_IP)) {
+			memset(&hdr.reserved, 0, sizeof(hdr.reserved));
+			memcpy(&hdr.roce4grh, ip_hdr(skb), sizeof(hdr.roce4grh));
+		} else {
+			memcpy(&hdr.ibgrh, ipv6_hdr(skb), sizeof(hdr));
+		}
+		err = rxe_copy_dma_data(skb, qp->pd, IB_ACCESS_LOCAL_WRITE,
+					&qp->resp.wqe->dma, &hdr, skb_offset,
+					sizeof(hdr), RXE_COPY_TO_MR);
+		if (err)
+			goto err_out;
+	}
+
+	op = skb_is_nonlinear(skb) ? RXE_FRAG_TO_MR : RXE_COPY_TO_MR;
+	/* offset to payload from skb->data (= &bth header) */
+	skb_offset = rxe_opcode[pkt->opcode].length;
 	err = rxe_copy_dma_data(skb, qp->pd, IB_ACCESS_LOCAL_WRITE,
 				&qp->resp.wqe->dma, data_addr,
-				skb_offset, data_len, RXE_COPY_TO_MR);
-	if (unlikely(err))
-		return (err == -ENOSPC) ? RESPST_ERR_LENGTH
-					: RESPST_ERR_MALFORMED_WQE;
+				skb_offset, data_len, op);
+	if (err)
+		goto err_out;
 
 	return RESPST_NONE;
+
+err_out:
+	return (err == -ENOSPC) ? RESPST_ERR_LENGTH
+				: RESPST_ERR_MALFORMED_WQE;
 }
 
-static enum resp_states write_data_in(struct rxe_qp *qp,
-				      struct rxe_pkt_info *pkt)
+/**
+ * rxe_write_data_in() - Copy payload data to iova
+ * @qp: The queue pair
+ * @pkt: Request packet info
+ *
+ * Copy the packet payload to current iova and update iova.
+ *
+ * Returns: 0 if successful else an error resp_states value.
+ */
+static enum resp_states rxe_write_data_in(struct rxe_qp *qp,
+					  struct rxe_pkt_info *pkt)
 {
 	struct sk_buff *skb = PKT_TO_SKB(pkt);
-	enum resp_states rc = RESPST_NONE;
+	u8 *data_addr = payload_addr(pkt);
 	int data_len = payload_size(pkt);
+	enum rxe_mr_copy_op op;
+	int skb_offset;
 	int err;
-	int skb_offset = 0;
 
+	op = skb_is_nonlinear(skb) ? RXE_FRAG_TO_MR : RXE_COPY_TO_MR;
+	skb_offset = rxe_opcode[pkt->opcode].length;
 	err = rxe_copy_mr_data(skb, qp->resp.mr, qp->resp.va + qp->resp.offset,
-			  payload_addr(pkt), skb_offset, data_len,
-			  RXE_COPY_TO_MR);
-	if (err) {
-		rc = RESPST_ERR_RKEY_VIOLATION;
-		goto out;
-	}
+			  data_addr, skb_offset, data_len, op);
+	if (err)
+		return RESPST_ERR_RKEY_VIOLATION;
 
 	qp->resp.va += data_len;
 	qp->resp.resid -= data_len;
 
-out:
-	return rc;
+	return RESPST_NONE;
 }
 
 static struct resp_res *rxe_prepare_res(struct rxe_qp *qp,
@@ -991,30 +1034,13 @@ static int invalidate_rkey(struct rxe_qp *qp, u32 rkey)
 static enum resp_states execute(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
 {
 	enum resp_states err;
-	struct sk_buff *skb = PKT_TO_SKB(pkt);
-	union rdma_network_hdr hdr;
 
 	if (pkt->mask & RXE_SEND_MASK) {
-		if (qp_type(qp) == IB_QPT_UD ||
-		    qp_type(qp) == IB_QPT_GSI) {
-			if (skb->protocol == htons(ETH_P_IP)) {
-				memset(&hdr.reserved, 0,
-						sizeof(hdr.reserved));
-				memcpy(&hdr.roce4grh, ip_hdr(skb),
-						sizeof(hdr.roce4grh));
-				err = send_data_in(qp, &hdr, sizeof(hdr));
-			} else {
-				err = send_data_in(qp, ipv6_hdr(skb),
-						sizeof(hdr));
-			}
-			if (err)
-				return err;
-		}
-		err = send_data_in(qp, payload_addr(pkt), payload_size(pkt));
+		err = rxe_send_data_in(qp, pkt);
 		if (err)
 			return err;
 	} else if (pkt->mask & RXE_WRITE_MASK) {
-		err = write_data_in(qp, pkt);
+		err = rxe_write_data_in(qp, pkt);
 		if (err)
 			return err;
 	} else if (pkt->mask & RXE_READ_MASK) {
-- 
2.39.2



* [PATCH for-next v3 09/10] RDMA/rxe: Extend do_read() in rxe_comp.c for frags
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (7 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 08/10] RDMA/rxe: Extend send/write_data_in() " Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-07-27 20:01 ` [PATCH for-next v3 10/10] RDMA/rxe: Enable sg code in rxe Bob Pearson
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Extend do_read() in rxe_comp.c to support fragmented skbs.

Rename it to rxe_do_read() and adjust its caller to the changed API.

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 39 ++++++++++++++++++----------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 670ee08f6f5a..ecaaed15c4eb 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -360,22 +360,35 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 	return COMPST_ERROR;
 }
 
-static inline enum comp_state do_read(struct rxe_qp *qp,
-				      struct rxe_pkt_info *pkt,
-				      struct rxe_send_wqe *wqe)
+/**
+ * rxe_do_read() - Process read reply packet
+ * @qp: The queue pair
+ * @pkt: Packet info
+ * @wqe: The current work request
+ *
+ * Copy payload from incoming read reply packet into current
+ * iova.
+ *
+ * Returns: 0 on success else an error comp_state
+ */
+static inline enum comp_state rxe_do_read(struct rxe_qp *qp,
+					  struct rxe_pkt_info *pkt,
+					  struct rxe_send_wqe *wqe)
 {
 	struct sk_buff *skb = PKT_TO_SKB(pkt);
-	int skb_offset = 0;
-	int ret;
+	u8 *data_addr = payload_addr(pkt);
+	int data_len = payload_size(pkt);
+	enum rxe_mr_copy_op op;
+	int skb_offset;
+	int err;
 
-	ret = rxe_copy_dma_data(skb, qp->pd, IB_ACCESS_LOCAL_WRITE,
-				&wqe->dma, payload_addr(pkt),
-				skb_offset, payload_size(pkt),
-				RXE_COPY_TO_MR);
-	if (ret) {
-		wqe->status = IB_WC_LOC_PROT_ERR;
+	op = skb_is_nonlinear(skb) ? RXE_FRAG_TO_MR : RXE_COPY_TO_MR;
+	skb_offset = rxe_opcode[pkt->opcode].length;
+	err = rxe_copy_dma_data(skb, qp->pd, IB_ACCESS_LOCAL_WRITE,
+				&wqe->dma, data_addr,
+				skb_offset, data_len, op);
+	if (err)
 		return COMPST_ERROR;
-	}
 
 	if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
 		return COMPST_COMP_ACK;
@@ -704,7 +717,7 @@ int rxe_completer(struct rxe_qp *qp)
 			break;
 
 		case COMPST_READ:
-			state = do_read(qp, pkt, wqe);
+			state = rxe_do_read(qp, pkt, wqe);
 			break;
 
 		case COMPST_ATOMIC:
-- 
2.39.2



* [PATCH for-next v3 10/10] RDMA/rxe: Enable sg code in rxe
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (8 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 09/10] RDMA/rxe: Extend do_read() in rxe_comp.c " Bob Pearson
@ 2023-07-27 20:01 ` Bob Pearson
  2023-08-15 19:07   ` Jason Gunthorpe
  2023-07-28  0:40 ` [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Zhu Yanjun
  2023-08-15 19:08 ` Jason Gunthorpe
  11 siblings, 1 reply; 20+ messages in thread
From: Bob Pearson @ 2023-07-27 20:01 UTC (permalink / raw)
  To: jgg, zyjzyj2000, linux-rdma, jhack; +Cc: Bob Pearson

Enable the sg (fragmented skb) code in rxe: default the use_sg module
parameter to true and have rxe_init_req_packet() request a fragmented
skb from rxe_init_packet().
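
With this applied, the previous behavior can presumably still be
selected at module load time, e.g. with something like
"modprobe rdma_rxe use_sg=0" (the parameter name comes from the diff
below; the rdma_rxe module name is assumed here).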

Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe.c     | 4 ++--
 drivers/infiniband/sw/rxe/rxe_req.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
index 800e8c0d437d..b52dd1704e74 100644
--- a/drivers/infiniband/sw/rxe/rxe.c
+++ b/drivers/infiniband/sw/rxe/rxe.c
@@ -14,9 +14,9 @@ MODULE_DESCRIPTION("Soft RDMA transport");
 MODULE_LICENSE("Dual BSD/GPL");
 
 /* if true allow using fragmented skbs */
-bool rxe_use_sg;
+bool rxe_use_sg = true;
 module_param_named(use_sg, rxe_use_sg, bool, 0444);
-MODULE_PARM_DESC(use_sg, "Support skb frags; default false");
+MODULE_PARM_DESC(use_sg, "Support skb frags; default true");
 
 /* free resources for a rxe device all objects created for this device must
  * have been destroyed
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index cf34d1a58f85..d00c24e1a569 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -402,7 +402,7 @@ static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 	struct sk_buff *skb = NULL;
 	struct rxe_av *av;
 	struct rxe_ah *ah = NULL;
-	bool frag = false;
+	bool frag;
 	int err;
 
 	pkt->rxe = rxe;
@@ -426,7 +426,7 @@ static struct sk_buff *rxe_init_req_packet(struct rxe_qp *qp,
 			pkt->pad + RXE_ICRC_SIZE;
 
 	/* init skb */
-	skb = rxe_init_packet(qp, av, pkt, NULL);
+	skb = rxe_init_packet(qp, av, pkt, &frag);
 	if (unlikely(!skb)) {
 		err = -ENOMEM;
 		goto err_out;
-- 
2.39.2



* Re: [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (9 preceding siblings ...)
  2023-07-27 20:01 ` [PATCH for-next v3 10/10] RDMA/rxe: Enable sg code in rxe Bob Pearson
@ 2023-07-28  0:40 ` Zhu Yanjun
  2023-07-28  1:54   ` Bob Pearson
  2023-08-15 19:08 ` Jason Gunthorpe
  11 siblings, 1 reply; 20+ messages in thread
From: Zhu Yanjun @ 2023-07-28  0:40 UTC (permalink / raw)
  To: Bob Pearson, jgg, zyjzyj2000, linux-rdma, jhack

On 2023/7/28 4:01, Bob Pearson wrote:
> This patch set is a revised version of an older set which implements
> support for nonlinear or fragmented packets. This avoids extra copies
> in both the send and receive paths and gives significant performance
> improvement for large messages such as are used in storage applications.
> 
> This patch set has been heavily tested at large system scale and
> demonstrated a 2X improvement in file system read performance on
> a 200 Gb/sec network.
> 
> The patch set is rebased to the current for-next branch with the
> following previous patch sets applied:
> 	RDMA/rxe: Fix incomplete state save in rxe_requester
> 	RDMA/rxe: Misc fixes and cleanups
> 	Enable rcu locking of verbs objects
> 	RDMA/rxe: Misc cleanups
> 
> Bob Pearson (10):
>    RDMA/rxe: Add sg fragment ops
>    RDMA/rxe: Extend rxe_mr_copy to support skb frags
>    RDMA/rxe: Extend copy_data to support skb frags
>    RDMA/rxe: Extend rxe_init_packet() to support frags
>    RDMA/rxe: Extend rxe_icrc.c to support frags
>    RDMA/rxe: Extend rxe_init_req_packet() for frags
>    RDMA/rxe: Extend response packets for frags
>    RDMA/rxe: Extend send/write_data_in() for frags
>    RDMA/rxe: Extend do_read() in rxe_comp.c for frags
>    RDMA/rxe: Enable sg code in rxe
> 
>   drivers/infiniband/sw/rxe/rxe.c        |   5 +
>   drivers/infiniband/sw/rxe/rxe.h        |   3 +
>   drivers/infiniband/sw/rxe/rxe_comp.c   |  46 +++-
>   drivers/infiniband/sw/rxe/rxe_icrc.c   |  65 ++++-
>   drivers/infiniband/sw/rxe/rxe_loc.h    |  27 +-
>   drivers/infiniband/sw/rxe/rxe_mr.c     | 348 +++++++++++++++++++------
>   drivers/infiniband/sw/rxe/rxe_net.c    | 109 +++++++-
>   drivers/infiniband/sw/rxe/rxe_opcode.c |   2 +
>   drivers/infiniband/sw/rxe/rxe_recv.c   |   1 +
>   drivers/infiniband/sw/rxe/rxe_req.c    |  88 ++++++-
>   drivers/infiniband/sw/rxe/rxe_resp.c   | 172 +++++++-----
>   drivers/infiniband/sw/rxe/rxe_verbs.h  |   8 +-
>   12 files changed, 672 insertions(+), 202 deletions(-)
> 
> 

What are the following? Is this a new format in the Linux kernel community?

Zhu Yanjun

> base-commit: 693e1cdebb50d2aa67406411ca6d5be195d62771
> prerequisite-patch-id: c3994e7a93e37e0ce4f50e0c768f3c1a0059a02f
> prerequisite-patch-id: 48e13f6ccb560fdeacbd20aaf6696782c23d1190
> prerequisite-patch-id: da75fb8eaa863df840e7b392b5048fcc72b0bef3
> prerequisite-patch-id: d0877649e2edaf00585a0a6a80391fe0d7bbc13b
> prerequisite-patch-id: 6495b1d1f664f8ab91ed9ef9d2ca5b3b27d7df35
> prerequisite-patch-id: a6367b8fedd0d8999139c8b857ebbd3ce5c72245
> prerequisite-patch-id: 78c95e90a5e49b15b7af8ef57130739c143e88b5
> prerequisite-patch-id: 7c65a01066c0418de6897bc8b5f44d078d21b0ec
> prerequisite-patch-id: 8ab09f93c23c7875e56c597e69236c30464723b6
> prerequisite-patch-id: ca9d84b34873b49048e42fb4c13a2a097c215c46
> prerequisite-patch-id: 0f6a587501c8246e1185dfd0cbf5e2044c5f9b13
> prerequisite-patch-id: 5246df93137429916d76e75b9a13a4ad5ceb0bad
> prerequisite-patch-id: 41b0e4150794dd914d9fcb4cd106fe4cf4227611
> prerequisite-patch-id: 02b08ec037bc35b9c7771640c89c66504cdf38a6
> prerequisite-patch-id: dfccc06c16454d7fe8e6fcba064d4e471d314666
> prerequisite-patch-id: 7459a6e5cdd46efd53ba27f9b3e9028af6e0863b
> prerequisite-patch-id: 36d49f9303f5cb276a5601c1ab568eea6eca7d3a
> prerequisite-patch-id: 6359a681e40832694f81ca003c10e5327996bf7d
> prerequisite-patch-id: 558175db657f374dbd3e0a57ac4c5fb77a56b6c6
> prerequisite-patch-id: d6b811de06c8900be5840dd29715161d26db66cf



* Re: [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops
  2023-07-27 20:01 ` [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops Bob Pearson
@ 2023-07-28  1:07   ` Zhu Yanjun
  2023-07-28  1:49     ` Bob Pearson
  0 siblings, 1 reply; 20+ messages in thread
From: Zhu Yanjun @ 2023-07-28  1:07 UTC (permalink / raw)
  To: Bob Pearson, jgg, zyjzyj2000, linux-rdma, jhack

On 2023/7/28 4:01, Bob Pearson wrote:
> Rename rxe_mr_copy_dir to rxe_mr_copy_op. This allows
> adding new fragment operations later.

This commit only renames an enum from rxe_mr_copy_dir to
rxe_mr_copy_op. There is no bug fix or performance improvement, so I am
not sure whether it is worthwhile. Please have Jason and Leon check it as well.

Zhu Yanjun

> 
> This is in preparation for supporting fragmented skbs.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_comp.c  |  4 ++--
>   drivers/infiniband/sw/rxe/rxe_loc.h   |  4 ++--
>   drivers/infiniband/sw/rxe/rxe_mr.c    | 22 +++++++++++-----------
>   drivers/infiniband/sw/rxe/rxe_req.c   |  2 +-
>   drivers/infiniband/sw/rxe/rxe_resp.c  |  6 +++---
>   drivers/infiniband/sw/rxe/rxe_verbs.h |  6 +++---
>   6 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 5111735aafae..e3f8dfc9b8bf 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -368,7 +368,7 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
>   
>   	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>   			&wqe->dma, payload_addr(pkt),
> -			payload_size(pkt), RXE_TO_MR_OBJ);
> +			payload_size(pkt), RXE_COPY_TO_MR);
>   	if (ret) {
>   		wqe->status = IB_WC_LOC_PROT_ERR;
>   		return COMPST_ERROR;
> @@ -390,7 +390,7 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
>   
>   	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>   			&wqe->dma, &atomic_orig,
> -			sizeof(u64), RXE_TO_MR_OBJ);
> +			sizeof(u64), RXE_COPY_TO_MR);
>   	if (ret) {
>   		wqe->status = IB_WC_LOC_PROT_ERR;
>   		return COMPST_ERROR;
> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
> index cf38f4dcff78..532026cdd49e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
> @@ -64,9 +64,9 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>   int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
>   int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length);
>   int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
> -		unsigned int length, enum rxe_mr_copy_dir dir);
> +		unsigned int length, enum rxe_mr_copy_op op);
>   int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
> -	      void *addr, int length, enum rxe_mr_copy_dir dir);
> +	      void *addr, int length, enum rxe_mr_copy_op op);
>   int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>   		  int sg_nents, unsigned int *sg_offset);
>   int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index f54042e9aeb2..812c85cad463 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -243,7 +243,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>   }
>   
>   static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
> -			      unsigned int length, enum rxe_mr_copy_dir dir)
> +			      unsigned int length, enum rxe_mr_copy_op op)
>   {
>   	unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
>   	unsigned long index = rxe_mr_iova_to_index(mr, iova);
> @@ -259,7 +259,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
>   		bytes = min_t(unsigned int, length,
>   				mr_page_size(mr) - page_offset);
>   		va = kmap_local_page(page);
> -		if (dir == RXE_FROM_MR_OBJ)
> +		if (op == RXE_COPY_FROM_MR)
>   			memcpy(addr, va + page_offset, bytes);
>   		else
>   			memcpy(va + page_offset, addr, bytes);
> @@ -275,7 +275,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
>   }
>   
>   static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
> -			    unsigned int length, enum rxe_mr_copy_dir dir)
> +			    unsigned int length, enum rxe_mr_copy_op op)
>   {
>   	unsigned int page_offset = dma_addr & (PAGE_SIZE - 1);
>   	unsigned int bytes;
> @@ -288,10 +288,10 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
>   				PAGE_SIZE - page_offset);
>   		va = kmap_local_page(page);
>   
> -		if (dir == RXE_TO_MR_OBJ)
> -			memcpy(va + page_offset, addr, bytes);
> -		else
> +		if (op == RXE_COPY_FROM_MR)
>   			memcpy(addr, va + page_offset, bytes);
> +		else
> +			memcpy(va + page_offset, addr, bytes);
>   
>   		kunmap_local(va);
>   		page_offset = 0;
> @@ -302,7 +302,7 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
>   }
>   
>   int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
> -		unsigned int length, enum rxe_mr_copy_dir dir)
> +		unsigned int length, enum rxe_mr_copy_op op)
>   {
>   	int err;
>   
> @@ -313,7 +313,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
>   		return -EINVAL;
>   
>   	if (mr->ibmr.type == IB_MR_TYPE_DMA) {
> -		rxe_mr_copy_dma(mr, iova, addr, length, dir);
> +		rxe_mr_copy_dma(mr, iova, addr, length, op);
>   		return 0;
>   	}
>   
> @@ -323,7 +323,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
>   		return err;
>   	}
>   
> -	return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
> +	return rxe_mr_copy_xarray(mr, iova, addr, length, op);
>   }
>   
>   /* copy data in or out of a wqe, i.e. sg list
> @@ -335,7 +335,7 @@ int copy_data(
>   	struct rxe_dma_info	*dma,
>   	void			*addr,
>   	int			length,
> -	enum rxe_mr_copy_dir	dir)
> +	enum rxe_mr_copy_op	op)
>   {
>   	int			bytes;
>   	struct rxe_sge		*sge	= &dma->sge[dma->cur_sge];
> @@ -395,7 +395,7 @@ int copy_data(
>   
>   		if (bytes > 0) {
>   			iova = sge->addr + offset;
> -			err = rxe_mr_copy(mr, iova, addr, bytes, dir);
> +			err = rxe_mr_copy(mr, iova, addr, bytes, op);
>   			if (err)
>   				goto err2;
>   
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index 51b781ac2844..f3653234cf32 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -327,7 +327,7 @@ static int rxe_init_payload(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
>   		wqe->dma.sge_offset += payload;
>   	} else {
>   		err = copy_data(qp->pd, 0, &wqe->dma, payload_addr(pkt),
> -				payload, RXE_FROM_MR_OBJ);
> +				payload, RXE_COPY_FROM_MR);
>   	}
>   
>   	return err;
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index 8a25c56dfd86..596615c515ad 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -565,7 +565,7 @@ static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
>   	int err;
>   
>   	err = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &qp->resp.wqe->dma,
> -			data_addr, data_len, RXE_TO_MR_OBJ);
> +			data_addr, data_len, RXE_COPY_TO_MR);
>   	if (unlikely(err))
>   		return (err == -ENOSPC) ? RESPST_ERR_LENGTH
>   					: RESPST_ERR_MALFORMED_WQE;
> @@ -581,7 +581,7 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
>   	int data_len = payload_size(pkt);
>   
>   	err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
> -			  payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
> +			  payload_addr(pkt), data_len, RXE_COPY_TO_MR);
>   	if (err) {
>   		rc = RESPST_ERR_RKEY_VIOLATION;
>   		goto out;
> @@ -928,7 +928,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>   	}
>   
>   	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
> -			  payload, RXE_FROM_MR_OBJ);
> +			  payload, RXE_COPY_FROM_MR);
>   	if (err) {
>   		kfree_skb(skb);
>   		state = RESPST_ERR_RKEY_VIOLATION;
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index ccb9d19ffe8a..d9c44bd30da4 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -275,9 +275,9 @@ enum rxe_mr_state {
>   	RXE_MR_STATE_VALID,
>   };
>   
> -enum rxe_mr_copy_dir {
> -	RXE_TO_MR_OBJ,
> -	RXE_FROM_MR_OBJ,
> +enum rxe_mr_copy_op {
> +	RXE_COPY_TO_MR,
> +	RXE_COPY_FROM_MR,
>   };
>   
>   enum rxe_mr_lookup_type {



* Re: [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops
  2023-07-28  1:07   ` Zhu Yanjun
@ 2023-07-28  1:49     ` Bob Pearson
  0 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-28  1:49 UTC (permalink / raw)
  To: Zhu Yanjun, jgg, zyjzyj2000, linux-rdma, jhack

On 7/27/23 20:07, Zhu Yanjun wrote:
> On 2023/7/28 4:01, Bob Pearson wrote:
>> Rename rxe_mr_copy_dir to rxe_mr_copy_op. This allows
>> adding new fragment operations later.
> 
> This commit only renames an enum from rxe_mr_copy_dir to rxe_mr_copy_op. There is no bug fix or performance improvement, so I am not sure whether it is worthwhile. Please have Jason and Leon check it as well.
> 
> Zhu Yanjun
> 
>>
>> This is in preparation for supporting fragmented skbs.

Read the next one.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_comp.c  |  4 ++--
>>   drivers/infiniband/sw/rxe/rxe_loc.h   |  4 ++--
>>   drivers/infiniband/sw/rxe/rxe_mr.c    | 22 +++++++++++-----------
>>   drivers/infiniband/sw/rxe/rxe_req.c   |  2 +-
>>   drivers/infiniband/sw/rxe/rxe_resp.c  |  6 +++---
>>   drivers/infiniband/sw/rxe/rxe_verbs.h |  6 +++---
>>   6 files changed, 22 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index 5111735aafae..e3f8dfc9b8bf 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -368,7 +368,7 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
>>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>>               &wqe->dma, payload_addr(pkt),
>> -            payload_size(pkt), RXE_TO_MR_OBJ);
>> +            payload_size(pkt), RXE_COPY_TO_MR);
>>       if (ret) {
>>           wqe->status = IB_WC_LOC_PROT_ERR;
>>           return COMPST_ERROR;
>> @@ -390,7 +390,7 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
>>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>>               &wqe->dma, &atomic_orig,
>> -            sizeof(u64), RXE_TO_MR_OBJ);
>> +            sizeof(u64), RXE_COPY_TO_MR);
>>       if (ret) {
>>           wqe->status = IB_WC_LOC_PROT_ERR;
>>           return COMPST_ERROR;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_loc.h b/drivers/infiniband/sw/rxe/rxe_loc.h
>> index cf38f4dcff78..532026cdd49e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_loc.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_loc.h
>> @@ -64,9 +64,9 @@ int rxe_mr_init_user(struct rxe_dev *rxe, u64 start, u64 length, u64 iova,
>>   int rxe_mr_init_fast(int max_pages, struct rxe_mr *mr);
>>   int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, unsigned int length);
>>   int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
>> -        unsigned int length, enum rxe_mr_copy_dir dir);
>> +        unsigned int length, enum rxe_mr_copy_op op);
>>   int copy_data(struct rxe_pd *pd, int access, struct rxe_dma_info *dma,
>> -          void *addr, int length, enum rxe_mr_copy_dir dir);
>> +          void *addr, int length, enum rxe_mr_copy_op op);
>>   int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
>>             int sg_nents, unsigned int *sg_offset);
>>   int rxe_mr_do_atomic_op(struct rxe_mr *mr, u64 iova, int opcode,
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index f54042e9aeb2..812c85cad463 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -243,7 +243,7 @@ int rxe_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sgl,
>>   }
>>     static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
>> -                  unsigned int length, enum rxe_mr_copy_dir dir)
>> +                  unsigned int length, enum rxe_mr_copy_op op)
>>   {
>>       unsigned int page_offset = rxe_mr_iova_to_page_offset(mr, iova);
>>       unsigned long index = rxe_mr_iova_to_index(mr, iova);
>> @@ -259,7 +259,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
>>           bytes = min_t(unsigned int, length,
>>                   mr_page_size(mr) - page_offset);
>>           va = kmap_local_page(page);
>> -        if (dir == RXE_FROM_MR_OBJ)
>> +        if (op == RXE_COPY_FROM_MR)
>>               memcpy(addr, va + page_offset, bytes);
>>           else
>>               memcpy(va + page_offset, addr, bytes);
>> @@ -275,7 +275,7 @@ static int rxe_mr_copy_xarray(struct rxe_mr *mr, u64 iova, void *addr,
>>   }
>>     static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
>> -                unsigned int length, enum rxe_mr_copy_dir dir)
>> +                unsigned int length, enum rxe_mr_copy_op op)
>>   {
>>       unsigned int page_offset = dma_addr & (PAGE_SIZE - 1);
>>       unsigned int bytes;
>> @@ -288,10 +288,10 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
>>                   PAGE_SIZE - page_offset);
>>           va = kmap_local_page(page);
>>   -        if (dir == RXE_TO_MR_OBJ)
>> -            memcpy(va + page_offset, addr, bytes);
>> -        else
>> +        if (op == RXE_COPY_FROM_MR)
>>               memcpy(addr, va + page_offset, bytes);
>> +        else
>> +            memcpy(va + page_offset, addr, bytes);
>>             kunmap_local(va);
>>           page_offset = 0;
>> @@ -302,7 +302,7 @@ static void rxe_mr_copy_dma(struct rxe_mr *mr, u64 dma_addr, void *addr,
>>   }
>>     int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
>> -        unsigned int length, enum rxe_mr_copy_dir dir)
>> +        unsigned int length, enum rxe_mr_copy_op op)
>>   {
>>       int err;
>>   @@ -313,7 +313,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
>>           return -EINVAL;
>>         if (mr->ibmr.type == IB_MR_TYPE_DMA) {
>> -        rxe_mr_copy_dma(mr, iova, addr, length, dir);
>> +        rxe_mr_copy_dma(mr, iova, addr, length, op);
>>           return 0;
>>       }
>>   @@ -323,7 +323,7 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr,
>>           return err;
>>       }
>>   -    return rxe_mr_copy_xarray(mr, iova, addr, length, dir);
>> +    return rxe_mr_copy_xarray(mr, iova, addr, length, op);
>>   }
>>     /* copy data in or out of a wqe, i.e. sg list
>> @@ -335,7 +335,7 @@ int copy_data(
>>       struct rxe_dma_info    *dma,
>>       void            *addr,
>>       int            length,
>> -    enum rxe_mr_copy_dir    dir)
>> +    enum rxe_mr_copy_op    op)
>>   {
>>       int            bytes;
>>       struct rxe_sge        *sge    = &dma->sge[dma->cur_sge];
>> @@ -395,7 +395,7 @@ int copy_data(
>>             if (bytes > 0) {
>>               iova = sge->addr + offset;
>> -            err = rxe_mr_copy(mr, iova, addr, bytes, dir);
>> +            err = rxe_mr_copy(mr, iova, addr, bytes, op);
>>               if (err)
>>                   goto err2;
>>   diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
>> index 51b781ac2844..f3653234cf32 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -327,7 +327,7 @@ static int rxe_init_payload(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
>>           wqe->dma.sge_offset += payload;
>>       } else {
>>           err = copy_data(qp->pd, 0, &wqe->dma, payload_addr(pkt),
>> -                payload, RXE_FROM_MR_OBJ);
>> +                payload, RXE_COPY_FROM_MR);
>>       }
>>         return err;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index 8a25c56dfd86..596615c515ad 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -565,7 +565,7 @@ static enum resp_states send_data_in(struct rxe_qp *qp, void *data_addr,
>>       int err;
>>         err = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE, &qp->resp.wqe->dma,
>> -            data_addr, data_len, RXE_TO_MR_OBJ);
>> +            data_addr, data_len, RXE_COPY_TO_MR);
>>       if (unlikely(err))
>>           return (err == -ENOSPC) ? RESPST_ERR_LENGTH
>>                       : RESPST_ERR_MALFORMED_WQE;
>> @@ -581,7 +581,7 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
>>       int data_len = payload_size(pkt);
>>         err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
>> -              payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
>> +              payload_addr(pkt), data_len, RXE_COPY_TO_MR);
>>       if (err) {
>>           rc = RESPST_ERR_RKEY_VIOLATION;
>>           goto out;
>> @@ -928,7 +928,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>       }
>>         err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>> -              payload, RXE_FROM_MR_OBJ);
>> +              payload, RXE_COPY_FROM_MR);
>>       if (err) {
>>           kfree_skb(skb);
>>           state = RESPST_ERR_RKEY_VIOLATION;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> index ccb9d19ffe8a..d9c44bd30da4 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -275,9 +275,9 @@ enum rxe_mr_state {
>>       RXE_MR_STATE_VALID,
>>   };
>>   -enum rxe_mr_copy_dir {
>> -    RXE_TO_MR_OBJ,
>> -    RXE_FROM_MR_OBJ,
>> +enum rxe_mr_copy_op {
>> +    RXE_COPY_TO_MR,
>> +    RXE_COPY_FROM_MR,
>>   };
>>     enum rxe_mr_lookup_type {
> 



* Re: [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets
  2023-07-28  0:40 ` [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Zhu Yanjun
@ 2023-07-28  1:54   ` Bob Pearson
  0 siblings, 0 replies; 20+ messages in thread
From: Bob Pearson @ 2023-07-28  1:54 UTC (permalink / raw)
  To: Zhu Yanjun, jgg, zyjzyj2000, linux-rdma, jhack

On 7/27/23 19:40, Zhu Yanjun wrote:
> On 2023/7/28 4:01, Bob Pearson wrote:
>> This patch set is a revised version of an older set which implements
>> support for nonlinear or fragmented packets. This avoids extra copies
>> in both the send and receive paths and gives significant performance
>> improvement for large messages such as are used in storage applications.
>>
>> This patch set has been heavily tested at large system scale and
>> demonstrated a 2X improvement in file system read performance on
>> a 200 Gb/sec network.
>>
>> The patch set is rebased to the current for-next branch with the
>> following previous patch sets applied:
>>     RDMA/rxe: Fix incomplete state save in rxe_requester
>>     RDMA/rxe: Misc fixes and cleanups
>>     Enable rcu locking of verbs objects
>>     RDMA/rxe: Misc cleanups
>>
>> Bob Pearson (10):
>>    RDMA/rxe: Add sg fragment ops
>>    RDMA/rxe: Extend rxe_mr_copy to support skb frags
>>    RDMA/rxe: Extend copy_data to support skb frags
>>    RDMA/rxe: Extend rxe_init_packet() to support frags
>>    RDMA/rxe: Extend rxe_icrc.c to support frags
>>    RDMA/rxe: Extend rxe_init_req_packet() for frags
>>    RDMA/rxe: Extend response packets for frags
>>    RDMA/rxe: Extend send/write_data_in() for frags
>>    RDMA/rxe: Extend do_read() in rxe_comp.c for frags
>>    RDMA/rxe: Enable sg code in rxe
>>
>>   drivers/infiniband/sw/rxe/rxe.c        |   5 +
>>   drivers/infiniband/sw/rxe/rxe.h        |   3 +
>>   drivers/infiniband/sw/rxe/rxe_comp.c   |  46 +++-
>>   drivers/infiniband/sw/rxe/rxe_icrc.c   |  65 ++++-
>>   drivers/infiniband/sw/rxe/rxe_loc.h    |  27 +-
>>   drivers/infiniband/sw/rxe/rxe_mr.c     | 348 +++++++++++++++++++------
>>   drivers/infiniband/sw/rxe/rxe_net.c    | 109 +++++++-
>>   drivers/infiniband/sw/rxe/rxe_opcode.c |   2 +
>>   drivers/infiniband/sw/rxe/rxe_recv.c   |   1 +
>>   drivers/infiniband/sw/rxe/rxe_req.c    |  88 ++++++-
>>   drivers/infiniband/sw/rxe/rxe_resp.c   | 172 +++++++-----
>>   drivers/infiniband/sw/rxe/rxe_verbs.h  |   8 +-
>>   12 files changed, 672 insertions(+), 202 deletions(-)
>>
>>
> 
> What are the following? Is this a new format in the Linux kernel community?
If you pass --base d6b811de06c8900be to git format-patch, it documents what the
patches were applied on top of. I have multiple patch sets that have to be applied
in order. Three of them have already been submitted but are not upstream yet. The
fourth one was submitted just before this set.
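
(For example, running something like
"git format-patch --base d6b811de06c8900be <revision range>" is what
produces the base-commit: and prerequisite-patch-id: trailers quoted
above.)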

Bob
> 
> Zhu Yanjun
> 
>> base-commit: 693e1cdebb50d2aa67406411ca6d5be195d62771
>> prerequisite-patch-id: c3994e7a93e37e0ce4f50e0c768f3c1a0059a02f
>> prerequisite-patch-id: 48e13f6ccb560fdeacbd20aaf6696782c23d1190
>> prerequisite-patch-id: da75fb8eaa863df840e7b392b5048fcc72b0bef3
>> prerequisite-patch-id: d0877649e2edaf00585a0a6a80391fe0d7bbc13b
>> prerequisite-patch-id: 6495b1d1f664f8ab91ed9ef9d2ca5b3b27d7df35
>> prerequisite-patch-id: a6367b8fedd0d8999139c8b857ebbd3ce5c72245
>> prerequisite-patch-id: 78c95e90a5e49b15b7af8ef57130739c143e88b5
>> prerequisite-patch-id: 7c65a01066c0418de6897bc8b5f44d078d21b0ec
>> prerequisite-patch-id: 8ab09f93c23c7875e56c597e69236c30464723b6
>> prerequisite-patch-id: ca9d84b34873b49048e42fb4c13a2a097c215c46
>> prerequisite-patch-id: 0f6a587501c8246e1185dfd0cbf5e2044c5f9b13
>> prerequisite-patch-id: 5246df93137429916d76e75b9a13a4ad5ceb0bad
>> prerequisite-patch-id: 41b0e4150794dd914d9fcb4cd106fe4cf4227611
>> prerequisite-patch-id: 02b08ec037bc35b9c7771640c89c66504cdf38a6
>> prerequisite-patch-id: dfccc06c16454d7fe8e6fcba064d4e471d314666
>> prerequisite-patch-id: 7459a6e5cdd46efd53ba27f9b3e9028af6e0863b
>> prerequisite-patch-id: 36d49f9303f5cb276a5601c1ab568eea6eca7d3a
>> prerequisite-patch-id: 6359a681e40832694f81ca003c10e5327996bf7d
>> prerequisite-patch-id: 558175db657f374dbd3e0a57ac4c5fb77a56b6c6
>> prerequisite-patch-id: d6b811de06c8900be5840dd29715161d26db66cf
> 



* Re: [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c to support frags
  2023-07-27 20:01 ` [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c " Bob Pearson
@ 2023-07-28 14:20   ` Zhu Yanjun
  2023-07-28 14:49     ` Bob Pearson
  0 siblings, 1 reply; 20+ messages in thread
From: Zhu Yanjun @ 2023-07-28 14:20 UTC (permalink / raw)
  To: Bob Pearson, jgg, zyjzyj2000, linux-rdma, jhack

On 2023/7/28 4:01, Bob Pearson wrote:
> Extend the subroutines rxe_icrc_generate() and rxe_icrc_check()
> to support skb frags.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_icrc.c | 65 ++++++++++++++++++++++++----
>   drivers/infiniband/sw/rxe/rxe_net.c  | 51 +++++++++++++++++-----
>   drivers/infiniband/sw/rxe/rxe_recv.c |  1 +
>   3 files changed, 98 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_icrc.c b/drivers/infiniband/sw/rxe/rxe_icrc.c
> index c9aa0995e900..393391863350 100644
> --- a/drivers/infiniband/sw/rxe/rxe_icrc.c
> +++ b/drivers/infiniband/sw/rxe/rxe_icrc.c
> @@ -63,7 +63,7 @@ static __be32 rxe_crc32(struct rxe_dev *rxe, __be32 crc, void *next, size_t len)
>   
>   /**
>    * rxe_icrc_hdr() - Compute the partial ICRC for the network and transport
> - *		  headers of a packet.
> + *		    headers of a packet.
>    * @skb: packet buffer
>    * @pkt: packet information
>    *
> @@ -129,6 +129,56 @@ static __be32 rxe_icrc_hdr(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>   	return crc;
>   }
>   
> +/**
> + * rxe_icrc_payload() - Compute the ICRC for a packet payload and also
> + *			compute the address of the icrc in the packet.
> + * @skb: packet buffer
> + * @pkt: packet information
> + * @icrc: current icrc i.e. including headers
> + * @icrcp: returned pointer to icrc in skb
> + *
> + * Return: the ICRC extended to cover the packet payload
> + */
> +static __be32 rxe_icrc_payload(struct sk_buff *skb, struct rxe_pkt_info *pkt,
> +			       __be32 icrc, __be32 **icrcp)
> +{
> +	struct skb_shared_info *shinfo = skb_shinfo(skb);
> +	skb_frag_t *frag;
> +	u8 *addr;
> +	int hdr_len;
> +	int len;
> +	int i;
> +
> +	/* handle any payload left in the linear buffer */
> +	hdr_len = rxe_opcode[pkt->opcode].length;
> +	addr = pkt->hdr + hdr_len;
> +	len = skb_tail_pointer(skb) - skb_transport_header(skb)
> +		- sizeof(struct udphdr) - hdr_len;
> +	if (!shinfo->nr_frags) {
> +		len -= RXE_ICRC_SIZE;
> +		*icrcp = (__be32 *)(addr + len);
> +	}
> +	if (len > 0)
> +		icrc = rxe_crc32(pkt->rxe, icrc, payload_addr(pkt), len);
> +	WARN_ON(len < 0);
> +
> +	/* handle any payload in frags */
> +	for (i = 0; i < shinfo->nr_frags; i++) {
> +		frag = &shinfo->frags[i];
> +		addr = page_to_virt(frag->bv_page) + frag->bv_offset;
> +		len = frag->bv_len;
> +		if (i == shinfo->nr_frags - 1) {
> +			len -= RXE_ICRC_SIZE;
> +			*icrcp = (__be32 *)(addr + len);
> +		}
> +		if (len > 0)
> +			icrc = rxe_crc32(pkt->rxe, icrc, addr, len);
> +		WARN_ON(len < 0);
> +	}
> +
> +	return icrc;
> +}
> +
>   /**
>    * rxe_icrc_check() - Compute ICRC for a packet and compare to the ICRC
>    *		      delivered in the packet.
> @@ -143,13 +193,11 @@ int rxe_icrc_check(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>   	__be32 pkt_icrc;
>   	__be32 icrc;
>   
> -	icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
> -	pkt_icrc = *icrcp;
> -
>   	icrc = rxe_icrc_hdr(skb, pkt);
> -	icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
> -				payload_size(pkt) + pkt->pad);
> +	icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
> +
>   	icrc = ~icrc;
> +	pkt_icrc = *icrcp;
>   
>   	if (unlikely(icrc != pkt_icrc))
>   		return -EINVAL;
> @@ -167,9 +215,8 @@ void rxe_icrc_generate(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>   	__be32 *icrcp;
>   	__be32 icrc;
>   
> -	icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
>   	icrc = rxe_icrc_hdr(skb, pkt);
> -	icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
> -				payload_size(pkt) + pkt->pad);
> +	icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
> +
>   	*icrcp = ~icrc;
>   }
> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
> index c44ef39010f1..c43f9dd3ae6e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_net.c
> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
> @@ -148,33 +148,53 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	struct udphdr *udph;
>   	struct rxe_dev *rxe;
>   	struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
> +	u8 opcode;
> +	u8 buf[1];
> +	u8 *p;

opcode and *p are duplicates. You could use just one variable:
u8 *opcode;
>   
>   	/* takes a reference on rxe->ib_dev
>   	 * drop when skb is freed
>   	 */
>   	rxe = get_rxe_from_skb(skb);
>   	if (!rxe)
> -		goto drop;
> +		goto err_drop;
>   
> -	if (skb_linearize(skb)) {
> -		ib_device_put(&rxe->ib_dev);
> -		goto drop;
> +	/* Get bth opcode out of skb, it may be in a fragment */
> +	p = skb_header_pointer(skb, sizeof(struct udphdr), 1, buf);
> +	if (!p)
> +		goto err_device_put;
> +	opcode = *p;


	opcode = skb_header_pointer(skb, sizeof(struct udphdr), 1, buf);
	if (!opcode)
		goto err_device_put;
;
> +
> +	/* If using fragmented skbs make sure roce headers
> +	 * are in linear buffer else make skb linear
> +	 */
> +	if (rxe_use_sg && skb_is_nonlinear(skb)) {
> +		int delta = rxe_opcode[opcode].length -

		int delta = rxe_opcode[(*opcode)].length -

> +			(skb_headlen(skb) - sizeof(struct udphdr));
> +
> +		if (delta > 0 && !__pskb_pull_tail(skb, delta))
> +			goto err_device_put;
> +	} else {
> +		if (skb_linearize(skb))
> +			goto err_device_put;
>   	}
>   
>   	udph = udp_hdr(skb);
>   	pkt->rxe = rxe;
>   	pkt->port_num = 1;
>   	pkt->hdr = (u8 *)(udph + 1);
> -	pkt->mask = RXE_GRH_MASK;
> +	pkt->mask = rxe_opcode[opcode].mask | RXE_GRH_MASK;

<..>

Zhu Yanjun
>   	pkt->paylen = be16_to_cpu(udph->len) - sizeof(*udph);
>   
> -	/* remove udp header */
>   	skb_pull(skb, sizeof(struct udphdr));
>   
>   	rxe_rcv(skb);
>   
>   	return 0;
> -drop:
> +
> +err_device_put:
> +	ib_device_put(&rxe->ib_dev);
> +err_drop:
>   	kfree_skb(skb);
>   
>   	return 0;
> @@ -446,24 +466,35 @@ static int rxe_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>    */
>   static int rxe_loopback(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>   {
> -	memcpy(SKB_TO_PKT(skb), pkt, sizeof(*pkt));
> +	struct rxe_pkt_info *newpkt;
> +	int err;
>   
> +	/* make loopback line up with rxe_udp_encap_recv */
>   	if (skb->protocol == htons(ETH_P_IP))
>   		skb_pull(skb, sizeof(struct iphdr));
>   	else
>   		skb_pull(skb, sizeof(struct ipv6hdr));
> +	skb_reset_transport_header(skb);
> +
> +	newpkt = SKB_TO_PKT(skb);
> +	memcpy(newpkt, pkt, sizeof(*newpkt));
> +	newpkt->hdr = skb_transport_header(skb) + sizeof(struct udphdr);
>   
>   	if (WARN_ON(!ib_device_try_get(&pkt->rxe->ib_dev))) {
>   		kfree_skb(skb);
> -		return -EIO;
> +		err = -EINVAL;
> +		goto drop;
>   	}
>   
>   	/* remove udp header */
>   	skb_pull(skb, sizeof(struct udphdr));
>   
>   	rxe_rcv(skb);
> -
>   	return 0;
> +
> +drop:
> +	kfree_skb(skb);
> +	return err;
>   }
>   
>   int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
> index f912a913f89a..940197199252 100644
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -338,6 +338,7 @@ void rxe_rcv(struct sk_buff *skb)
>   	if (unlikely(err))
>   		goto drop;
>   
> +	/* skb->data points at UDP header */
>   	err = rxe_icrc_check(skb, pkt);
>   	if (unlikely(err))
>   		goto drop;



* Re: [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c to support frags
  2023-07-28 14:20   ` Zhu Yanjun
@ 2023-07-28 14:49     ` Bob Pearson
  2023-07-28 23:39       ` Zhu Yanjun
  0 siblings, 1 reply; 20+ messages in thread
From: Bob Pearson @ 2023-07-28 14:49 UTC (permalink / raw)
  To: Zhu Yanjun, jgg, zyjzyj2000, linux-rdma, jhack

On 7/28/23 09:20, Zhu Yanjun wrote:
> On 2023/7/28 4:01, Bob Pearson wrote:
>> Extend the subroutines rxe_icrc_generate() and rxe_icrc_check()
>> to support skb frags.
>>
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>>   drivers/infiniband/sw/rxe/rxe_icrc.c | 65 ++++++++++++++++++++++++----
>>   drivers/infiniband/sw/rxe/rxe_net.c  | 51 +++++++++++++++++-----
>>   drivers/infiniband/sw/rxe/rxe_recv.c |  1 +
>>   3 files changed, 98 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_icrc.c b/drivers/infiniband/sw/rxe/rxe_icrc.c
>> index c9aa0995e900..393391863350 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_icrc.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_icrc.c
>> @@ -63,7 +63,7 @@ static __be32 rxe_crc32(struct rxe_dev *rxe, __be32 crc, void *next, size_t len)
>>     /**
>>    * rxe_icrc_hdr() - Compute the partial ICRC for the network and transport
>> - *          headers of a packet.
>> + *            headers of a packet.
>>    * @skb: packet buffer
>>    * @pkt: packet information
>>    *
>> @@ -129,6 +129,56 @@ static __be32 rxe_icrc_hdr(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>       return crc;
>>   }
>>   +/**
>> + * rxe_icrc_payload() - Compute the ICRC for a packet payload and also
>> + *            compute the address of the icrc in the packet.
>> + * @skb: packet buffer
>> + * @pkt: packet information
>> + * @icrc: current icrc i.e. including headers
>> + * @icrcp: returned pointer to icrc in skb
>> + *
>> + * Return: the ICRC extended to cover the packet payload
>> + */
>> +static __be32 rxe_icrc_payload(struct sk_buff *skb, struct rxe_pkt_info *pkt,
>> +                   __be32 icrc, __be32 **icrcp)
>> +{
>> +    struct skb_shared_info *shinfo = skb_shinfo(skb);
>> +    skb_frag_t *frag;
>> +    u8 *addr;
>> +    int hdr_len;
>> +    int len;
>> +    int i;
>> +
>> +    /* handle any payload left in the linear buffer */
>> +    hdr_len = rxe_opcode[pkt->opcode].length;
>> +    addr = pkt->hdr + hdr_len;
>> +    len = skb_tail_pointer(skb) - skb_transport_header(skb)
>> +        - sizeof(struct udphdr) - hdr_len;
>> +    if (!shinfo->nr_frags) {
>> +        len -= RXE_ICRC_SIZE;
>> +        *icrcp = (__be32 *)(addr + len);
>> +    }
>> +    if (len > 0)
>> +        icrc = rxe_crc32(pkt->rxe, icrc, payload_addr(pkt), len);
>> +    WARN_ON(len < 0);
>> +
>> +    /* handle any payload in frags */
>> +    for (i = 0; i < shinfo->nr_frags; i++) {
>> +        frag = &shinfo->frags[i];
>> +        addr = page_to_virt(frag->bv_page) + frag->bv_offset;
>> +        len = frag->bv_len;
>> +        if (i == shinfo->nr_frags - 1) {
>> +            len -= RXE_ICRC_SIZE;
>> +            *icrcp = (__be32 *)(addr + len);
>> +        }
>> +        if (len > 0)
>> +            icrc = rxe_crc32(pkt->rxe, icrc, addr, len);
>> +        WARN_ON(len < 0);
>> +    }
>> +
>> +    return icrc;
>> +}
>> +
>>   /**
>>    * rxe_icrc_check() - Compute ICRC for a packet and compare to the ICRC
>>    *              delivered in the packet.
>> @@ -143,13 +193,11 @@ int rxe_icrc_check(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>       __be32 pkt_icrc;
>>       __be32 icrc;
>>   -    icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
>> -    pkt_icrc = *icrcp;
>> -
>>       icrc = rxe_icrc_hdr(skb, pkt);
>> -    icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
>> -                payload_size(pkt) + pkt->pad);
>> +    icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
>> +
>>       icrc = ~icrc;
>> +    pkt_icrc = *icrcp;
>>         if (unlikely(icrc != pkt_icrc))
>>           return -EINVAL;
>> @@ -167,9 +215,8 @@ void rxe_icrc_generate(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>       __be32 *icrcp;
>>       __be32 icrc;
>>   -    icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
>>       icrc = rxe_icrc_hdr(skb, pkt);
>> -    icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
>> -                payload_size(pkt) + pkt->pad);
>> +    icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
>> +
>>       *icrcp = ~icrc;
>>   }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>> index c44ef39010f1..c43f9dd3ae6e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>> @@ -148,33 +148,53 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>>       struct udphdr *udph;
>>       struct rxe_dev *rxe;
>>       struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
>> +    u8 opcode;
>> +    u8 buf[1];
>> +    u8 *p;
> 
> opcode and *p are duplicates. You could use just one variable:
opcode is a u8; p is a u8 *.
> u8 *opcode;
>>         /* takes a reference on rxe->ib_dev
>>        * drop when skb is freed
>>        */
>>       rxe = get_rxe_from_skb(skb);
>>       if (!rxe)
>> -        goto drop;
>> +        goto err_drop;
>>   -    if (skb_linearize(skb)) {
>> -        ib_device_put(&rxe->ib_dev);
>> -        goto drop;
>> +    /* Get bth opcode out of skb, it may be in a fragment */
>> +    p = skb_header_pointer(skb, sizeof(struct udphdr), 1, buf);
>> +    if (!p)
>> +        goto err_device_put;
>> +    opcode = *p;
> 
> 
>     opcode = skb_header_pointer(skb, sizeof(struct udphdr), 1, buf);
>     if (!opcode)
>         goto err_device_put;
> ;
>> +
>> +    /* If using fragmented skbs make sure roce headers
>> +     * are in linear buffer else make skb linear
>> +     */
>> +    if (rxe_use_sg && skb_is_nonlinear(skb)) {
>> +        int delta = rxe_opcode[opcode].length -
> 
>         int delta = rxe_opcode[(*opcode)].length -
> 
>> +            (skb_headlen(skb) - sizeof(struct udphdr));
>> +
>> +        if (delta > 0 && !__pskb_pull_tail(skb, delta))
>> +            goto err_device_put;
>> +    } else {
>> +        if (skb_linearize(skb))
>> +            goto err_device_put;
>>       }
>>         udph = udp_hdr(skb);
>>       pkt->rxe = rxe;
>>       pkt->port_num = 1;
>>       pkt->hdr = (u8 *)(udph + 1);
>> -    pkt->mask = RXE_GRH_MASK;
>> +    pkt->mask = rxe_opcode[opcode].mask | RXE_GRH_MASK;
> 
> <..>
> 
> Zhu Yanjun
>>       pkt->paylen = be16_to_cpu(udph->len) - sizeof(*udph);
>>   -    /* remove udp header */
>>       skb_pull(skb, sizeof(struct udphdr));
>>         rxe_rcv(skb);
>>         return 0;
>> -drop:
>> +
>> +err_device_put:
>> +    ib_device_put(&rxe->ib_dev);
>> +err_drop:
>>       kfree_skb(skb);
>>         return 0;
>> @@ -446,24 +466,35 @@ static int rxe_send(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>    */
>>   static int rxe_loopback(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>   {
>> -    memcpy(SKB_TO_PKT(skb), pkt, sizeof(*pkt));
>> +    struct rxe_pkt_info *newpkt;
>> +    int err;
>>   +    /* make loopback line up with rxe_udp_encap_recv */
>>       if (skb->protocol == htons(ETH_P_IP))
>>           skb_pull(skb, sizeof(struct iphdr));
>>       else
>>           skb_pull(skb, sizeof(struct ipv6hdr));
>> +    skb_reset_transport_header(skb);
>> +
>> +    newpkt = SKB_TO_PKT(skb);
>> +    memcpy(newpkt, pkt, sizeof(*newpkt));
>> +    newpkt->hdr = skb_transport_header(skb) + sizeof(struct udphdr);
>>         if (WARN_ON(!ib_device_try_get(&pkt->rxe->ib_dev))) {
>>           kfree_skb(skb);
>> -        return -EIO;
>> +        err = -EINVAL;
>> +        goto drop;
>>       }
>>         /* remove udp header */
>>       skb_pull(skb, sizeof(struct udphdr));
>>         rxe_rcv(skb);
>> -
>>       return 0;
>> +
>> +drop:
>> +    kfree_skb(skb);
>> +    return err;
>>   }
>>     int rxe_xmit_packet(struct rxe_qp *qp, struct rxe_pkt_info *pkt,
>> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c
>> index f912a913f89a..940197199252 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
>> @@ -338,6 +338,7 @@ void rxe_rcv(struct sk_buff *skb)
>>       if (unlikely(err))
>>           goto drop;
>>   +    /* skb->data points at UDP header */
>>       err = rxe_icrc_check(skb, pkt);
>>       if (unlikely(err))
>>           goto drop;
> 



* Re: [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c to support frags
  2023-07-28 14:49     ` Bob Pearson
@ 2023-07-28 23:39       ` Zhu Yanjun
  0 siblings, 0 replies; 20+ messages in thread
From: Zhu Yanjun @ 2023-07-28 23:39 UTC (permalink / raw)
  To: Bob Pearson, jgg, zyjzyj2000, linux-rdma, jhack, leon@kernel.org


On 2023/7/28 22:49, Bob Pearson wrote:
> On 7/28/23 09:20, Zhu Yanjun wrote:
>> On 2023/7/28 4:01, Bob Pearson wrote:
>>> Extend the subroutines rxe_icrc_generate() and rxe_icrc_check()
>>> to support skb frags.
>>>
>>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>>> ---
>>>    drivers/infiniband/sw/rxe/rxe_icrc.c | 65 ++++++++++++++++++++++++----
>>>    drivers/infiniband/sw/rxe/rxe_net.c  | 51 +++++++++++++++++-----
>>>    drivers/infiniband/sw/rxe/rxe_recv.c |  1 +
>>>    3 files changed, 98 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_icrc.c b/drivers/infiniband/sw/rxe/rxe_icrc.c
>>> index c9aa0995e900..393391863350 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_icrc.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_icrc.c
>>> @@ -63,7 +63,7 @@ static __be32 rxe_crc32(struct rxe_dev *rxe, __be32 crc, void *next, size_t len)
>>>      /**
>>>     * rxe_icrc_hdr() - Compute the partial ICRC for the network and transport
>>> - *          headers of a packet.
>>> + *            headers of a packet.
>>>     * @skb: packet buffer
>>>     * @pkt: packet information
>>>     *
>>> @@ -129,6 +129,56 @@ static __be32 rxe_icrc_hdr(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>>        return crc;
>>>    }
>>>    +/**
>>> + * rxe_icrc_payload() - Compute the ICRC for a packet payload and also
>>> + *            compute the address of the icrc in the packet.
>>> + * @skb: packet buffer
>>> + * @pkt: packet information
>>> + * @icrc: current icrc i.e. including headers
>>> + * @icrcp: returned pointer to icrc in skb
>>> + *
>>> + * Return: the accumulated ICRC including the packet payload
>>> + */
>>> +static __be32 rxe_icrc_payload(struct sk_buff *skb, struct rxe_pkt_info *pkt,
>>> +                   __be32 icrc, __be32 **icrcp)
>>> +{
>>> +    struct skb_shared_info *shinfo = skb_shinfo(skb);
>>> +    skb_frag_t *frag;
>>> +    u8 *addr;
>>> +    int hdr_len;
>>> +    int len;
>>> +    int i;
>>> +
>>> +    /* handle any payload left in the linear buffer */
>>> +    hdr_len = rxe_opcode[pkt->opcode].length;
>>> +    addr = pkt->hdr + hdr_len;
>>> +    len = skb_tail_pointer(skb) - skb_transport_header(skb)
>>> +        - sizeof(struct udphdr) - hdr_len;
>>> +    if (!shinfo->nr_frags) {
>>> +        len -= RXE_ICRC_SIZE;
>>> +        *icrcp = (__be32 *)(addr + len);
>>> +    }
>>> +    if (len > 0)
>>> +        icrc = rxe_crc32(pkt->rxe, icrc, payload_addr(pkt), len);
>>> +    WARN_ON(len < 0);
>>> +
>>> +    /* handle any payload in frags */
>>> +    for (i = 0; i < shinfo->nr_frags; i++) {
>>> +        frag = &shinfo->frags[i];
>>> +        addr = page_to_virt(frag->bv_page) + frag->bv_offset;
>>> +        len = frag->bv_len;
>>> +        if (i == shinfo->nr_frags - 1) {
>>> +            len -= RXE_ICRC_SIZE;
>>> +            *icrcp = (__be32 *)(addr + len);
>>> +        }
>>> +        if (len > 0)
>>> +            icrc = rxe_crc32(pkt->rxe, icrc, addr, len);
>>> +        WARN_ON(len < 0);
>>> +    }
>>> +
>>> +    return icrc;
>>> +}
>>> +
>>>    /**
>>>     * rxe_icrc_check() - Compute ICRC for a packet and compare to the ICRC
>>>     *              delivered in the packet.
>>> @@ -143,13 +193,11 @@ int rxe_icrc_check(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>>        __be32 pkt_icrc;
>>>        __be32 icrc;
>>>    -    icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
>>> -    pkt_icrc = *icrcp;
>>> -
>>>        icrc = rxe_icrc_hdr(skb, pkt);
>>> -    icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
>>> -                payload_size(pkt) + pkt->pad);
>>> +    icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
>>> +
>>>        icrc = ~icrc;
>>> +    pkt_icrc = *icrcp;
>>>          if (unlikely(icrc != pkt_icrc))
>>>            return -EINVAL;
>>> @@ -167,9 +215,8 @@ void rxe_icrc_generate(struct sk_buff *skb, struct rxe_pkt_info *pkt)
>>>        __be32 *icrcp;
>>>        __be32 icrc;
>>>    -    icrcp = (__be32 *)(pkt->hdr + pkt->paylen - RXE_ICRC_SIZE);
>>>        icrc = rxe_icrc_hdr(skb, pkt);
>>> -    icrc = rxe_crc32(pkt->rxe, icrc, (u8 *)payload_addr(pkt),
>>> -                payload_size(pkt) + pkt->pad);
>>> +    icrc = rxe_icrc_payload(skb, pkt, icrc, &icrcp);
>>> +
>>>        *icrcp = ~icrc;
>>>    }
>>> diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
>>> index c44ef39010f1..c43f9dd3ae6e 100644
>>> --- a/drivers/infiniband/sw/rxe/rxe_net.c
>>> +++ b/drivers/infiniband/sw/rxe/rxe_net.c
>>> @@ -148,33 +148,53 @@ static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>>>        struct udphdr *udph;
>>>        struct rxe_dev *rxe;
>>>        struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
>>> +    u8 opcode;
>>> +    u8 buf[1];
>>> +    u8 *p;
>> opcode and *p duplicate.
>> You can use only one variable.
> opcode is a u8; p is a u8 *.


I mean that a single variable (for example, opcode) can provide the
same functionality as the two variables (p and opcode).
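
For example, something along these lines (just an untested sketch that
reuses the names already in your patch):

	u8 buf[1];
	u8 *opcode;

	/* BTH opcode is the first byte after the UDP header; it may sit in a frag */
	opcode = skb_header_pointer(skb, sizeof(struct udphdr), 1, buf);
	if (!opcode)
		goto err_device_put;

	if (rxe_use_sg && skb_is_nonlinear(skb)) {
		int delta = rxe_opcode[*opcode].length -
			(skb_headlen(skb) - sizeof(struct udphdr));

		if (delta > 0 && !__pskb_pull_tail(skb, delta))
			goto err_device_put;
	} else {
		if (skb_linearize(skb))
			goto err_device_put;
	}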

Zhu Yanjun




* Re: [PATCH for-next v3 10/10] RDMA/rxe: Enable sg code in rxe
  2023-07-27 20:01 ` [PATCH for-next v3 10/10] RDMA/rxe: Enable sg code in rxe Bob Pearson
@ 2023-08-15 19:07   ` Jason Gunthorpe
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2023-08-15 19:07 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma, jhack

On Thu, Jul 27, 2023 at 03:01:29PM -0500, Bob Pearson wrote:
> Make changes to enable sg code in rxe.
> 
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>  drivers/infiniband/sw/rxe/rxe.c     | 4 ++--
>  drivers/infiniband/sw/rxe/rxe_req.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe.c b/drivers/infiniband/sw/rxe/rxe.c
> index 800e8c0d437d..b52dd1704e74 100644
> --- a/drivers/infiniband/sw/rxe/rxe.c
> +++ b/drivers/infiniband/sw/rxe/rxe.c
> @@ -14,9 +14,9 @@ MODULE_DESCRIPTION("Soft RDMA transport");
>  MODULE_LICENSE("Dual BSD/GPL");
>  
>  /* if true allow using fragmented skbs */
> -bool rxe_use_sg;
> +bool rxe_use_sg = true;
>  module_param_named(use_sg, rxe_use_sg, bool, 0444);
> -MODULE_PARM_DESC(use_sg, "Support skb frags; default false");
> +MODULE_PARM_DESC(use_sg, "Support skb frags; default true");

I would like to avoid the module option - is it necessary? Shouldn't
frags always be better?
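
Ie something like this (just a sketch, not a tested patch), keeping
rxe_use_sg as a plain always-on flag instead of a parameter:

	/* fragmented (nonlinear) skbs are always enabled */
	bool rxe_use_sg = true;

with the module_param_named()/MODULE_PARM_DESC() lines removed from
rxe.c.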

Jason


* Re: [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets
  2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
                   ` (10 preceding siblings ...)
  2023-07-28  0:40 ` [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Zhu Yanjun
@ 2023-08-15 19:08 ` Jason Gunthorpe
  11 siblings, 0 replies; 20+ messages in thread
From: Jason Gunthorpe @ 2023-08-15 19:08 UTC (permalink / raw)
  To: Bob Pearson; +Cc: zyjzyj2000, linux-rdma, jhack

On Thu, Jul 27, 2023 at 03:01:19PM -0500, Bob Pearson wrote:
> This patch set is a revised version of an older set which implements 
> support for nonlinear or fragmented packets. This avoids extra copies
> in both the send and receive paths and gives significant performance
> improvement for large messages such as are used in storage applications.
> 
> This patch set has been heavily tested at large system scale and
> demonstrated a 2X improvement in file system read performance on
> a 200 Gb/sec network.
> 
> The patch set is rebased to the current for-next branch with the
> following previous patch sets applied:
> 	RDMA/rxe: Fix incomplete state save in rxe_requester
> 	RDMA/rxe: Misc fixes and cleanups
> 	Enable rcu locking of verbs objects
> 	RDMA/rxe: Misc cleanups
> 
> Bob Pearson (10):
>   RDMA/rxe: Add sg fragment ops
>   RDMA/rxe: Extend rxe_mr_copy to support skb frags
>   RDMA/rxe: Extend copy_data to support skb frags
>   RDMA/rxe: Extend rxe_init_packet() to support frags
>   RDMA/rxe: Extend rxe_icrc.c to support frags
>   RDMA/rxe: Extend rxe_init_req_packet() for frags
>   RDMA/rxe: Extend response packets for frags
>   RDMA/rxe: Extend send/write_data_in() for frags
>   RDMA/rxe: Extend do_read() in rxe_comp.c for frags
>   RDMA/rxe: Enable sg code in rxe

This does not apply to the tree, so it will have to be rebased and
resent. It looked OK other than the module option question.

Jason


Thread overview: 20+ messages

2023-07-27 20:01 [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 01/10] RDMA/rxe: Add sg fragment ops Bob Pearson
2023-07-28  1:07   ` Zhu Yanjun
2023-07-28  1:49     ` Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 02/10] RDMA/rxe: Extend rxe_mr_copy to support skb frags Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 03/10] RDMA/rxe: Extend copy_data " Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 04/10] RDMA/rxe: Extend rxe_init_packet() to support frags Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 05/10] RDMA/rxe: Extend rxe_icrc.c " Bob Pearson
2023-07-28 14:20   ` Zhu Yanjun
2023-07-28 14:49     ` Bob Pearson
2023-07-28 23:39       ` Zhu Yanjun
2023-07-27 20:01 ` [PATCH for-next v3 06/10] RDMA/rxe: Extend rxe_init_req_packet() for frags Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 07/10] RDMA/rxe: Extend response packets " Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 08/10] RDMA/rxe: Extend send/write_data_in() " Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 09/10] RDMA/rxe: Extend do_read() in rxe_comp.c " Bob Pearson
2023-07-27 20:01 ` [PATCH for-next v3 10/10] RDMA/rxe: Enable sg code in rxe Bob Pearson
2023-08-15 19:07   ` Jason Gunthorpe
2023-07-28  0:40 ` [PATCH for-next v3 00/10] RDMA/rxe: Implement support for nonlinear packets Zhu Yanjun
2023-07-28  1:54   ` Bob Pearson
2023-08-15 19:08 ` Jason Gunthorpe
