public inbox for linux-rdma@vger.kernel.org
* RFC: a first draft of a generic RDMA READ/WRITE API
@ 2016-02-27 18:10 Christoph Hellwig
  2016-02-27 18:10 ` [PATCH 04/13] IB/core: refactor ib_create_qp Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma; +Cc: swise, sagig, target-devel

This series contains patches that implement a first version of a generic
API for handling RDMA READ/WRITE operations, as commonly used on the
target (or server) side of storage protocols.

This has been developed for the upcoming NVMe over Fabrics target, and
extensively tested as part of that work, although this upstream version
has additional updates over the one we're currently using.

It hides details such as the use of MRs for iWARP devices, and will make
it easy to accommodate other HCA specifics in the future.

This series contains an RFC conversion of the iSER target, although that
conversion has hacky and probably non-working signature MR support;
otherwise I would have actually proposed it for inclusion.  The iSER
conversion also contains my conversion to the CQ API, which has since
landed in the target tree in an updated version.  It is only included
here for convenience.

One thing that I could not wrap my head around is support for the
current SRP target: SRP allows multiple rkeys in a single request,
and the SRP target driver wants to map those to a single struct
scatterlist.  Ideas on how to support that nicely are welcome.

I also have a git tree available at:

	git://git.infradead.org/users/hch/rdma.git rdma-rw-api

Gitweb:

	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-rw-api

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 01/13] IB/cma: pass the port number to ib_create_qp
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-02-27 18:10   ` Christoph Hellwig
  2016-02-27 18:10   ` [PATCH 02/13] IB/core: allow passing an offset into the SG in ib_map_mr_sg Christoph Hellwig
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

The new RW API will need this.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/core/cma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 9729639..a791522 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -800,6 +800,7 @@ int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
 	if (id->device != pd->device)
 		return -EINVAL;
 
+	qp_init_attr->port_num = id->port_num;
 	qp = ib_create_qp(pd, qp_init_attr);
 	if (IS_ERR(qp))
 		return PTR_ERR(qp);
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH 02/13] IB/core: allow passing an offset into the SG in ib_map_mr_sg
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-02-27 18:10   ` [PATCH 01/13] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
       [not found]     ` <1456596631-19418-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-02-27 18:10   ` [PATCH 03/13] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/core/verbs.c             | 22 +++++++++++-----------
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  7 +++----
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  5 ++---
 drivers/infiniband/hw/cxgb4/mem.c           |  7 +++----
 drivers/infiniband/hw/mlx4/mlx4_ib.h        |  5 ++---
 drivers/infiniband/hw/mlx4/mr.c             |  7 +++----
 drivers/infiniband/hw/mlx5/mlx5_ib.h        |  5 ++---
 drivers/infiniband/hw/mlx5/mr.c             |  7 +++----
 drivers/infiniband/hw/nes/nes_verbs.c       |  7 +++----
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  7 +++----
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  5 ++---
 drivers/infiniband/hw/qib/qib_mr.c          |  7 +++----
 drivers/infiniband/hw/qib/qib_verbs.h       |  5 ++---
 drivers/infiniband/ulp/iser/iser_memory.c   |  4 ++--
 drivers/infiniband/ulp/isert/ib_isert.c     |  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c         |  2 +-
 include/rdma/ib_verbs.h                     | 23 +++++++++--------------
 net/rds/iw_rdma.c                           |  2 +-
 net/rds/iw_send.c                           |  2 +-
 net/sunrpc/xprtrdma/frwr_ops.c              |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c     |  2 +-
 21 files changed, 59 insertions(+), 76 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 5af6d02..5aa1e0b 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1557,6 +1557,7 @@ EXPORT_SYMBOL(ib_check_mr_status);
  * @mr:            memory region
  * @sg:            dma mapped scatterlist
  * @sg_nents:      number of entries in sg
+ * @sg_offset:     offset in bytes into sg
  * @page_size:     page vector desired page size
  *
  * Constraints:
@@ -1573,17 +1574,15 @@ EXPORT_SYMBOL(ib_check_mr_status);
  * After this completes successfully, the  memory region
  * is ready for registration.
  */
-int ib_map_mr_sg(struct ib_mr *mr,
-		 struct scatterlist *sg,
-		 int sg_nents,
-		 unsigned int page_size)
+int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset, unsigned int page_size)
 {
 	if (unlikely(!mr->device->map_mr_sg))
 		return -ENOSYS;
 
 	mr->page_size = page_size;
 
-	return mr->device->map_mr_sg(mr, sg, sg_nents);
+	return mr->device->map_mr_sg(mr, sg, sg_nents, sg_offset);
 }
 EXPORT_SYMBOL(ib_map_mr_sg);
 
@@ -1593,6 +1592,7 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * @mr:            memory region
  * @sgl:           dma mapped scatterlist
  * @sg_nents:      number of entries in sg
+ * @sg_offset:     offset in bytes into sg
  * @set_page:      driver page assignment function pointer
  *
  * Core service helper for drivers to convert the largest
@@ -1603,10 +1603,8 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * Returns the number of sg elements that were assigned to
  * a page vector.
  */
-int ib_sg_to_pages(struct ib_mr *mr,
-		   struct scatterlist *sgl,
-		   int sg_nents,
-		   int (*set_page)(struct ib_mr *, u64))
+int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
+		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64))
 {
 	struct scatterlist *sg;
 	u64 last_end_dma_addr = 0;
@@ -1618,8 +1616,8 @@ int ib_sg_to_pages(struct ib_mr *mr,
 	mr->length = 0;
 
 	for_each_sg(sgl, sg, sg_nents, i) {
-		u64 dma_addr = sg_dma_address(sg);
-		unsigned int dma_len = sg_dma_len(sg);
+		u64 dma_addr = sg_dma_address(sg) + sg_offset;
+		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
 
@@ -1652,6 +1650,8 @@ next_page:
 		mr->length += dma_len;
 		last_end_dma_addr = end_dma_addr;
 		last_page_off = end_dma_addr & ~page_mask;
+
+		sg_offset = 0;
 	}
 
 	return i;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 2734820..a9b8ed5 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -782,15 +782,14 @@ static int iwch_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-static int iwch_map_mr_sg(struct ib_mr *ibmr,
-			  struct scatterlist *sg,
-			  int sg_nents)
+static int iwch_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
+		int sg_nents, unsigned int sg_offset)
 {
 	struct iwch_mr *mhp = to_iwch_mr(ibmr);
 
 	mhp->npages = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, iwch_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, iwch_set_page);
 }
 
 static int iwch_destroy_qp(struct ib_qp *ib_qp)
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index fb2de75..5b6b962 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -957,9 +957,8 @@ void c4iw_qp_rem_ref(struct ib_qp *qp);
 struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 			    enum ib_mr_type mr_type,
 			    u32 max_num_sg);
-int c4iw_map_mr_sg(struct ib_mr *ibmr,
-		   struct scatterlist *sg,
-		   int sg_nents);
+int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 int c4iw_dealloc_mw(struct ib_mw *mw);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
 struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 7849890..65c67ba 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -686,15 +686,14 @@ static int c4iw_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int c4iw_map_mr_sg(struct ib_mr *ibmr,
-		   struct scatterlist *sg,
-		   int sg_nents)
+int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
 
 	mhp->mpl_len = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, c4iw_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, c4iw_set_page);
 }
 
 int c4iw_dereg_mr(struct ib_mr *ib_mr)
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 52ce7b0..e38cc44 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -716,9 +716,8 @@ int mlx4_ib_dealloc_mw(struct ib_mw *mw);
 struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_num_sg);
-int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents);
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
 struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 242b94e..32c387d 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -526,9 +526,8 @@ static int mlx4_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents)
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct mlx4_ib_mr *mr = to_mmr(ibmr);
 	int rc;
@@ -539,7 +538,7 @@ int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
 				   sizeof(u64) * mr->max_pages,
 				   DMA_TO_DEVICE);
 
-	rc = ib_sg_to_pages(ibmr, sg, sg_nents, mlx4_set_page);
+	rc = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, mlx4_set_page);
 
 	ib_dma_sync_single_for_device(ibmr->device, mr->page_map,
 				      sizeof(u64) * mr->max_pages,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d2b9737..2d05bb5 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -654,9 +654,8 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
 struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_num_sg);
-int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents);
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			const struct ib_wc *in_wc, const struct ib_grh *in_grh,
 			const struct ib_mad_hdr *in, size_t in_mad_size,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 6000f7a..2afcb61 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1450,9 +1450,8 @@ static int mlx5_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents)
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct mlx5_ib_mr *mr = to_mmr(ibmr);
 	int n;
@@ -1463,7 +1462,7 @@ int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
 				   mr->desc_size * mr->max_descs,
 				   DMA_TO_DEVICE);
 
-	n = ib_sg_to_pages(ibmr, sg, sg_nents, mlx5_set_page);
+	n = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, mlx5_set_page);
 
 	ib_dma_sync_single_for_device(ibmr->device, mr->desc_map,
 				      mr->desc_size * mr->max_descs,
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 8c4daf7..8c85b84 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -401,15 +401,14 @@ static int nes_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-static int nes_map_mr_sg(struct ib_mr *ibmr,
-			 struct scatterlist *sg,
-			 int sg_nents)
+static int nes_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
+		int sg_nents, unsigned int sg_offset)
 {
 	struct nes_mr *nesmr = to_nesmr(ibmr);
 
 	nesmr->npages = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, nes_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, nes_set_page);
 }
 
 /**
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 37620b4..555127a 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -3077,13 +3077,12 @@ static int ocrdma_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int ocrdma_map_mr_sg(struct ib_mr *ibmr,
-		     struct scatterlist *sg,
-		     int sg_nents)
+int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
 
 	mr->npages = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, ocrdma_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, ocrdma_set_page);
 }
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 8b517fd..b290e5d 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -122,8 +122,7 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
 struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
 			      enum ib_mr_type mr_type,
 			      u32 max_num_sg);
-int ocrdma_map_mr_sg(struct ib_mr *ibmr,
-		     struct scatterlist *sg,
-		     int sg_nents);
+int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 
 #endif				/* __OCRDMA_VERBS_H__ */
diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 5f53304..c5fb4dd 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -315,15 +315,14 @@ static int qib_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int qib_map_mr_sg(struct ib_mr *ibmr,
-		  struct scatterlist *sg,
-		  int sg_nents)
+int qib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct qib_mr *mr = to_imr(ibmr);
 
 	mr->npages = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, qib_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, qib_set_page);
 }
 
 /**
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 6c5e777..5067cac 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -1042,9 +1042,8 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
 			   enum ib_mr_type mr_type,
 			   u32 max_entries);
 
-int qib_map_mr_sg(struct ib_mr *ibmr,
-		  struct scatterlist *sg,
-		  int sg_nents);
+int qib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 
 int qib_reg_mr(struct qib_qp *qp, struct ib_reg_wr *wr);
 
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 9a391cc..44cc85f 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -236,7 +236,7 @@ int iser_fast_reg_fmr(struct iscsi_iser_task *iser_task,
 	page_vec->npages = 0;
 	page_vec->fake_mr.page_size = SIZE_4K;
 	plen = ib_sg_to_pages(&page_vec->fake_mr, mem->sg,
-			      mem->size, iser_set_page);
+			      mem->size, 0, iser_set_page);
 	if (unlikely(plen < mem->size)) {
 		iser_err("page vec too short to hold this SG\n");
 		iser_data_buf_dump(mem, device->ib_device);
@@ -446,7 +446,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
 
 	ib_update_fast_reg_key(mr, ib_inc_rkey(mr->rkey));
 
-	n = ib_map_mr_sg(mr, mem->sg, mem->size, SIZE_4K);
+	n = ib_map_mr_sg(mr, mem->sg, mem->size, 0, SIZE_4K);
 	if (unlikely(n != mem->size)) {
 		iser_err("failed to map sg (%d/%d)\n",
 			 n, mem->size);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index f121e61..7c7ad3a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2561,7 +2561,7 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
 		wr = &inv_wr;
 	}
 
-	n = ib_map_mr_sg(mr, mem->sg, mem->nents, PAGE_SIZE);
+	n = ib_map_mr_sg(mr, mem->sg, mem->nents, 0, PAGE_SIZE);
 	if (unlikely(n != mem->nents)) {
 		isert_err("failed to map mr sg (%d/%d)\n",
 			 n, mem->nents);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 03022f6..60b169a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1365,7 +1365,7 @@ static int srp_map_finish_fr(struct srp_map_state *state,
 	rkey = ib_inc_rkey(desc->mr->rkey);
 	ib_update_fast_reg_key(desc->mr, rkey);
 
-	n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, dev->mr_page_size);
+	n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, 0, dev->mr_page_size);
 	if (unlikely(n < 0))
 		return n;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 284b00c..2114929 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1806,7 +1806,8 @@ struct ib_device {
 					       u32 max_num_sg);
 	int                        (*map_mr_sg)(struct ib_mr *mr,
 						struct scatterlist *sg,
-						int sg_nents);
+						int sg_nents,
+						unsigned int sg_offset);
 	struct ib_mw *             (*alloc_mw)(struct ib_pd *pd,
 					       enum ib_mw_type type);
 	int                        (*dealloc_mw)(struct ib_mw *mw);
@@ -3070,28 +3071,22 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
 					    u16 pkey, const union ib_gid *gid,
 					    const struct sockaddr *addr);
 
-int ib_map_mr_sg(struct ib_mr *mr,
-		 struct scatterlist *sg,
-		 int sg_nents,
-		 unsigned int page_size);
+int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset, unsigned int page_size);
 
 static inline int
-ib_map_mr_sg_zbva(struct ib_mr *mr,
-		  struct scatterlist *sg,
-		  int sg_nents,
-		  unsigned int page_size)
+ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset, unsigned int page_size)
 {
 	int n;
 
-	n = ib_map_mr_sg(mr, sg, sg_nents, page_size);
+	n = ib_map_mr_sg(mr, sg, sg_nents, sg_offset, page_size);
 	mr->iova = 0;
 
 	return n;
 }
 
-int ib_sg_to_pages(struct ib_mr *mr,
-		   struct scatterlist *sgl,
-		   int sg_nents,
-		   int (*set_page)(struct ib_mr *, u64));
+int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
+		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64));
 
 #endif /* IB_VERBS_H */
diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index b09a40c..7ce9a92 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -666,7 +666,7 @@ static int rds_iw_rdma_reg_mr(struct rds_iw_mapping *mapping)
 	struct ib_send_wr *failed_wr;
 	int ret, n;
 
-	n = ib_map_mr_sg_zbva(ibmr->mr, m_sg->list, m_sg->len, PAGE_SIZE);
+	n = ib_map_mr_sg_zbva(ibmr->mr, m_sg->list, m_sg->len, 0, PAGE_SIZE);
 	if (unlikely(n != m_sg->len))
 		return n < 0 ? n : -EINVAL;
 
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index e20bd50..1ba3c57 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -767,7 +767,7 @@ static int rds_iw_build_send_reg(struct rds_iw_send_work *send,
 {
 	int n;
 
-	n = ib_map_mr_sg(send->s_mr, sg, sg_nents, PAGE_SIZE);
+	n = ib_map_mr_sg(send->s_mr, sg, sg_nents, 0, PAGE_SIZE);
 	if (unlikely(n != sg_nents))
 		return n < 0 ? n : -EINVAL;
 
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index e165673..4ffdcb2 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -384,7 +384,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
 		return -ENOMEM;
 	}
 
-	n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+	n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
 	if (unlikely(n != frmr->sg_nents)) {
 		pr_err("RPC:       %s: failed to map mr %p (%u/%u)\n",
 		       __func__, frmr->fr_mr, n, frmr->sg_nents);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index c8b8a8b..1244a62 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -281,7 +281,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
 	}
 	atomic_inc(&xprt->sc_dma_used);
 
-	n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+	n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
 	if (unlikely(n != frmr->sg_nents)) {
 		pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n",
 		       frmr->mr, n, frmr->sg_nents);
-- 
2.1.4



* [PATCH 03/13] IB/core: add a helper to check for READ WITH INVALIDATE support
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-02-27 18:10   ` [PATCH 01/13] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
  2016-02-27 18:10   ` [PATCH 02/13] IB/core: allow passing an offset into the SG in ib_map_mr_sg Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
  2016-02-27 18:10   ` [PATCH 05/13] IB/core: add a simple MR pool Christoph Hellwig
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 include/rdma/ib_verbs.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2114929..267f11e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2286,6 +2286,15 @@ static inline bool rdma_cap_roce_gid_table(const struct ib_device *device,
 		device->add_gid && device->del_gid;
 }
 
+/*
+ * Check if the device supports READ W/ INVALIDATE.
+ */
+static inline bool rdma_has_read_invalidate(struct ib_device *dev, u32 port_num)
+{
+	/* iWarp requires READ W/ INVALIDATE.  No other device supports it yet */
+	return rdma_protocol_iwarp(dev, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid,
 		 struct ib_gid_attr *attr);
-- 
2.1.4



* [PATCH 04/13] IB/core: refactor ib_create_qp
  2016-02-27 18:10 RFC: a first draft of a generic RDMA READ/WRITE API Christoph Hellwig
@ 2016-02-27 18:10 ` Christoph Hellwig
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma; +Cc: swise, sagig, target-devel

Split the XRC magic into a separate function, and return early on failure
to make the initialization code readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/verbs.c | 103 +++++++++++++++++++++-------------------
 1 file changed, 54 insertions(+), 49 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 5aa1e0b..61131f8 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -723,62 +723,67 @@ struct ib_qp *ib_open_qp(struct ib_xrcd *xrcd,
 }
 EXPORT_SYMBOL(ib_open_qp);
 
+static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
+		struct ib_qp_init_attr *qp_init_attr)
+{
+	struct ib_qp *real_qp = qp;
+
+	qp->event_handler = __ib_shared_qp_event_handler;
+	qp->qp_context = qp;
+	qp->pd = NULL;
+	qp->send_cq = qp->recv_cq = NULL;
+	qp->srq = NULL;
+	qp->xrcd = qp_init_attr->xrcd;
+	atomic_inc(&qp_init_attr->xrcd->usecnt);
+	INIT_LIST_HEAD(&qp->open_list);
+
+	qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
+			  qp_init_attr->qp_context);
+	if (IS_ERR(qp))
+		real_qp->device->destroy_qp(real_qp);
+	else
+		__ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
+	return qp;
+}
+
 struct ib_qp *ib_create_qp(struct ib_pd *pd,
 			   struct ib_qp_init_attr *qp_init_attr)
 {
-	struct ib_qp *qp, *real_qp;
-	struct ib_device *device;
+	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
+	struct ib_qp *qp;
 
-	device = pd ? pd->device : qp_init_attr->xrcd->device;
 	qp = device->create_qp(pd, qp_init_attr, NULL);
-
-	if (!IS_ERR(qp)) {
-		qp->device     = device;
-		qp->real_qp    = qp;
-		qp->uobject    = NULL;
-		qp->qp_type    = qp_init_attr->qp_type;
-
-		atomic_set(&qp->usecnt, 0);
-		if (qp_init_attr->qp_type == IB_QPT_XRC_TGT) {
-			qp->event_handler = __ib_shared_qp_event_handler;
-			qp->qp_context = qp;
-			qp->pd = NULL;
-			qp->send_cq = qp->recv_cq = NULL;
-			qp->srq = NULL;
-			qp->xrcd = qp_init_attr->xrcd;
-			atomic_inc(&qp_init_attr->xrcd->usecnt);
-			INIT_LIST_HEAD(&qp->open_list);
-
-			real_qp = qp;
-			qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
-					  qp_init_attr->qp_context);
-			if (!IS_ERR(qp))
-				__ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
-			else
-				real_qp->device->destroy_qp(real_qp);
-		} else {
-			qp->event_handler = qp_init_attr->event_handler;
-			qp->qp_context = qp_init_attr->qp_context;
-			if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
-				qp->recv_cq = NULL;
-				qp->srq = NULL;
-			} else {
-				qp->recv_cq = qp_init_attr->recv_cq;
-				atomic_inc(&qp_init_attr->recv_cq->usecnt);
-				qp->srq = qp_init_attr->srq;
-				if (qp->srq)
-					atomic_inc(&qp_init_attr->srq->usecnt);
-			}
-
-			qp->pd	    = pd;
-			qp->send_cq = qp_init_attr->send_cq;
-			qp->xrcd    = NULL;
-
-			atomic_inc(&pd->usecnt);
-			atomic_inc(&qp_init_attr->send_cq->usecnt);
-		}
+	if (IS_ERR(qp))
+		return qp;
+
+	qp->device     = device;
+	qp->real_qp    = qp;
+	qp->uobject    = NULL;
+	qp->qp_type    = qp_init_attr->qp_type;
+
+	atomic_set(&qp->usecnt, 0);
+	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
+		return ib_create_xrc_qp(qp, qp_init_attr);
+
+	qp->event_handler = qp_init_attr->event_handler;
+	qp->qp_context = qp_init_attr->qp_context;
+	if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
+		qp->recv_cq = NULL;
+		qp->srq = NULL;
+	} else {
+		qp->recv_cq = qp_init_attr->recv_cq;
+		atomic_inc(&qp_init_attr->recv_cq->usecnt);
+		qp->srq = qp_init_attr->srq;
+		if (qp->srq)
+			atomic_inc(&qp_init_attr->srq->usecnt);
 	}
 
+	qp->pd	    = pd;
+	qp->send_cq = qp_init_attr->send_cq;
+	qp->xrcd    = NULL;
+
+	atomic_inc(&pd->usecnt);
+	atomic_inc(&qp_init_attr->send_cq->usecnt);
 	return qp;
 }
 EXPORT_SYMBOL(ib_create_qp);
-- 
2.1.4


* [PATCH 05/13] IB/core: add a simple MR pool
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (2 preceding siblings ...)
  2016-02-27 18:10   ` [PATCH 03/13] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
  2016-03-02  2:48     ` Parav Pandit
  2016-02-27 18:10   ` [PATCH 07/13] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/core/Makefile  |  2 +-
 drivers/infiniband/core/mr_pool.c | 84 +++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/verbs.c   |  5 +++
 include/rdma/ib_verbs.h           |  9 ++++-
 include/rdma/mr_pool.h            | 25 ++++++++++++
 5 files changed, 123 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/mr_pool.c
 create mode 100644 include/rdma/mr_pool.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index f818538..48bd9d8 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
 
 ib_core-y :=			packer.o ud_header.o verbs.o cq.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
-				roce_gid_mgmt.o
+				roce_gid_mgmt.o mr_pool.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/mr_pool.c b/drivers/infiniband/core/mr_pool.c
new file mode 100644
index 0000000..b1eb27a
--- /dev/null
+++ b/drivers/infiniband/core/mr_pool.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <rdma/ib_verbs.h>
+#include <rdma/mr_pool.h>
+
+struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list)
+{
+	struct ib_mr *mr = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->mr_lock, flags);
+	mr = list_first_entry_or_null(list, struct ib_mr, qp_entry);
+	if (mr)
+		qp->mrs_used++;
+	spin_unlock_irqrestore(&qp->mr_lock, flags);
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_mr_pool_get);
+
+void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->mr_lock, flags);
+	list_add(&mr->qp_entry, list);
+	qp->mrs_used--;
+	spin_unlock_irqrestore(&qp->mr_lock, flags);
+}
+EXPORT_SYMBOL(ib_mr_pool_put);
+
+int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
+		enum ib_mr_type type, u32 max_num_sg)
+{
+	struct ib_mr *mr;
+	unsigned long flags;
+	int ret, i;
+
+	for (i = 0; i < nr; i++) {
+		mr = ib_alloc_mr(qp->pd, type, max_num_sg);
+		if (IS_ERR(mr)) {
+			ret = PTR_ERR(mr);
+			goto out;
+		}
+
+		spin_lock_irqsave(&qp->mr_lock, flags);
+		list_add_tail(&mr->qp_entry, list);
+		spin_unlock_irqrestore(&qp->mr_lock, flags);
+	}
+
+	return 0;
+out:
+	ib_mr_pool_destroy(qp, list);
+	return ret;
+}
+EXPORT_SYMBOL(ib_mr_pool_init);
+
+void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list)
+{
+	struct ib_mr *mr;
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->mr_lock, flags);
+	while (!list_empty(list)) {
+		mr = list_first_entry(list, struct ib_mr, qp_entry);
+		list_del(&mr->qp_entry);
+
+		spin_unlock_irqrestore(&qp->mr_lock, flags);
+		ib_dereg_mr(mr);
+		spin_lock_irqsave(&qp->mr_lock, flags);
+	}
+	spin_unlock_irqrestore(&qp->mr_lock, flags);
+}
+EXPORT_SYMBOL(ib_mr_pool_destroy);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 61131f8..9a77bb8 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -762,6 +762,9 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	qp->qp_type    = qp_init_attr->qp_type;
 
 	atomic_set(&qp->usecnt, 0);
+	qp->mrs_used = 0;
+	spin_lock_init(&qp->mr_lock);
+
 	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
 		return ib_create_xrc_qp(qp, qp_init_attr);
 
@@ -1255,6 +1258,8 @@ int ib_destroy_qp(struct ib_qp *qp)
 	struct ib_srq *srq;
 	int ret;
 
+	WARN_ON_ONCE(qp->mrs_used > 0);
+
 	if (atomic_read(&qp->usecnt))
 		return -EBUSY;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 267f11e..1e68dae 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1408,6 +1408,10 @@ struct ib_qp {
 	struct ib_srq	       *srq;
 	struct ib_xrcd	       *xrcd; /* XRC TGT QPs only */
 	struct list_head	xrcd_list;
+
+	spinlock_t		mr_lock;
+	int			mrs_used;
+
 	/* count times opened, mcast attaches, flow attaches */
 	atomic_t		usecnt;
 	struct list_head	open_list;
@@ -1422,12 +1426,15 @@ struct ib_qp {
 struct ib_mr {
 	struct ib_device  *device;
 	struct ib_pd	  *pd;
-	struct ib_uobject *uobject;
 	u32		   lkey;
 	u32		   rkey;
 	u64		   iova;
 	u32		   length;
 	unsigned int	   page_size;
+	union {
+		struct ib_uobject	*uobject;	/* user */
+		struct list_head	qp_entry;	/* FR */
+	};
 };
 
 struct ib_mw {
diff --git a/include/rdma/mr_pool.h b/include/rdma/mr_pool.h
new file mode 100644
index 0000000..986010b
--- /dev/null
+++ b/include/rdma/mr_pool.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef _RDMA_MR_POOL_H
+#define _RDMA_MR_POOL_H 1
+
+#include <rdma/ib_verbs.h>
+
+struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list);
+void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr);
+
+int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
+		enum ib_mr_type type, u32 max_num_sg);
+void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list);
+
+#endif /* _RDMA_MR_POOL_H */
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 06/13] IB/core: add a need_inval flag to struct ib_mr
  2016-02-27 18:10 RFC: a first draft of a generic RDMA READ/WRITE API Christoph Hellwig
  2016-02-27 18:10 ` [PATCH 04/13] IB/core: refactor ib_create_qp Christoph Hellwig
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-02-27 18:10 ` Christoph Hellwig
  2016-02-28 15:10   ` Sagi Grimberg
  2016-02-27 18:10 ` [PATCH 11/13] IB/isert: the kill ->isert_cmd back pointer in the struct iser_tx_desc Christoph Hellwig
  2016-02-27 18:10 ` [PATCH 13/13] IB/isert: RW API WIP Christoph Hellwig
  4 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma; +Cc: swise, sagig, target-devel, Steve Wise

From: Steve Wise <swise@chelsio.com>

This is the first step toward moving MR invalidation decisions into the
core.  It will be needed by the upcoming RW API.
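
The intent can be illustrated with a small userspace mock of the decision
that the RW API patch later makes from this flag (hypothetical types and
function names, not the kernel code; error handling elided):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace sketch: choose the READ opcode and track whether the MR
 * still needs an explicit LOCAL_INV before it can be reused. */
enum wr_opcode { WR_RDMA_READ, WR_RDMA_READ_WITH_INV };

struct mock_mr { bool need_inval; };

static enum wr_opcode pick_read_opcode(struct mock_mr *mr, bool has_read_inv)
{
	if (has_read_inv) {
		/* READ_WITH_INV invalidates the MR as part of the READ,
		 * so no separate invalidation WR is needed later. */
		mr->need_inval = false;
		return WR_RDMA_READ_WITH_INV;
	}
	/* Otherwise the MR must be invalidated explicitly before reuse. */
	mr->need_inval = true;
	return WR_RDMA_READ;
}
```

With the flag in struct ib_mr the core (rather than each ULP) can make this
choice per work request.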

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 include/rdma/ib_verbs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1e68dae..2b94cea 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1431,6 +1431,7 @@ struct ib_mr {
 	u64		   iova;
 	u32		   length;
 	unsigned int	   page_size;
+	bool		   need_inval;
 	union {
 		struct ib_uobject	*uobject;	/* user */
 		struct list_head	qp_entry;	/* FR */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 07/13] IB/core: generic RDMA READ/WRITE API
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (3 preceding siblings ...)
  2016-02-27 18:10   ` [PATCH 05/13] IB/core: add a simple MR pool Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
  2016-02-28 15:05     ` Sagi Grimberg
  2016-02-27 18:10   ` [PATCH 08/13] IB/isert: properly type the login buffer Christoph Hellwig
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

This supports both manual mapping of many SGEs and using MRs from the
QP's MR pool, for iWarp or other cases where that is more optimal.
For now, MRs are only used for iWarp transports.  The user of the RDMA-RW
API must allocate the QP MR pool as well as size the SQ accordingly.

Thanks to Steve Wise for testing, fixing and rewriting the iWarp support,
and to Sagi Grimberg for ideas, reviews and fixes.
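
The SQ sizing rule that rdma_rw_init_qp() applies below can be mirrored in
a small userspace mock (hypothetical helper, not the kernel function): each
rdma_rw_ctx needs at least one RDMA READ/WRITE WR, plus a REG_MR and a
LOCAL_INV WR when the device needs memory registrations (currently iWarp):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace sketch of the max_send_wr accounting in rdma_rw_init_qp():
 * returns how many extra send WRs the RW API reserves per QP. */
static uint32_t rw_extra_send_wrs(uint32_t max_rdma_ctxs, bool needs_mr)
{
	uint32_t extra = max_rdma_ctxs;		/* one READ/WRITE WR per ctx */

	if (needs_mr)
		extra += 2 * max_rdma_ctxs;	/* + REG_MR and LOCAL_INV WRs */
	return extra;
}
```

So a caller that sets cap.max_rdma_ctxs gets the SQ grown transparently;
on iWarp each context costs three WR slots instead of one.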

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/core/Makefile |   2 +-
 drivers/infiniband/core/rw.c     | 393 +++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/verbs.c  |  25 +++
 include/rdma/ib_verbs.h          |  14 +-
 include/rdma/rw.h                |  80 ++++++++
 5 files changed, 512 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/rw.c
 create mode 100644 include/rdma/rw.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 48bd9d8..26987d9 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_INFINIBAND_USER_MAD) +=	ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
 					$(user_access-y)
 
-ib_core-y :=			packer.o ud_header.o verbs.o cq.o sysfs.o \
+ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
new file mode 100644
index 0000000..69c3ca5
--- /dev/null
+++ b/drivers/infiniband/core/rw.c
@@ -0,0 +1,393 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <linux/slab.h>
+#include <rdma/mr_pool.h>
+#include <rdma/rw.h>
+
+/*
+ * Check if the device needs a memory registration.  We currently always use
+ * memory registrations for iWarp, and never for IB and RoCE.  In the future
+ * we can hopefully fine tune this based on HCA driver input.
+ */
+static inline bool rdma_rw_use_mr(struct ib_device *dev, u8 port_num)
+{
+	return rdma_protocol_iwarp(dev, port_num);
+}
+
+static inline u32 rdma_rw_max_sge(struct rdma_rw_ctx *ctx,
+		struct ib_device *dev)
+{
+	return ctx->dma_dir == DMA_TO_DEVICE ?
+		dev->attrs.max_sge : dev->attrs.max_sge_rd;
+}
+
+static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u64 remote_addr, u32 rkey)
+{
+	struct ib_device *dev = qp->pd->device;
+	struct ib_rdma_wr *rdma_wr = &ctx->single.wr;
+
+	ctx->nr_ops = 1;
+
+	ctx->single.sge.lkey = qp->pd->local_dma_lkey;
+	ctx->single.sge.addr = ib_sg_dma_address(dev, ctx->sg);
+	ctx->single.sge.length = ib_sg_dma_len(dev, ctx->sg);
+
+	memset(rdma_wr, 0, sizeof(*rdma_wr));
+	rdma_wr->wr.opcode = ctx->dma_dir == DMA_TO_DEVICE ?
+			IB_WR_RDMA_WRITE : IB_WR_RDMA_READ;
+	rdma_wr->wr.sg_list = &ctx->single.sge;
+	rdma_wr->wr.num_sge = 1;
+	rdma_wr->remote_addr = remote_addr;
+	rdma_wr->rkey = rkey;
+
+	return 1;
+}
+
+static int rdma_rw_build_sg_list(struct rdma_rw_ctx *ctx, struct ib_pd *pd,
+		struct ib_sge *sge, u32 data_left, u32 offset)
+{
+	u32 first_sg_index = offset / PAGE_SIZE;
+	u32 sg_nents = min(ctx->dma_nents - first_sg_index,
+			   rdma_rw_max_sge(ctx, pd->device));
+	u32 page_off = offset % PAGE_SIZE;
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(ctx->sg + first_sg_index, sg, sg_nents, i) {
+		sge->addr = ib_sg_dma_address(pd->device, sg) + page_off;
+		sge->length = min_t(u32, data_left,
+				ib_sg_dma_len(pd->device, sg) - page_off);
+		sge->lkey = pd->local_dma_lkey;
+
+		data_left -= sge->length;
+		if (!data_left)
+			break;
+
+		sge++;
+		page_off = 0;
+	}
+
+	return i + 1;
+}
+
+static int rdma_rw_init_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u64 remote_addr, u32 rkey, u32 length, u32 page_off)
+{
+	u32 max_sge = rdma_rw_max_sge(ctx, qp->pd->device);
+	u32 rdma_write_max = max_sge * PAGE_SIZE;
+	struct ib_sge *sge;
+	u32 va_offset = 0, i;
+
+	ctx->map.sges = sge =
+		kcalloc(ctx->dma_nents, sizeof(*ctx->map.sges), GFP_KERNEL);
+	if (!ctx->map.sges)
+		goto out;
+
+	ctx->nr_ops = DIV_ROUND_UP(ctx->dma_nents, max_sge);
+	ctx->map.wrs = kcalloc(ctx->nr_ops, sizeof(*ctx->map.wrs), GFP_KERNEL);
+	if (!ctx->map.wrs)
+		goto out_free_sges;
+
+	for (i = 0; i < ctx->nr_ops; i++) {
+		struct ib_rdma_wr *rdma_wr = &ctx->map.wrs[i];
+		u32 data_len = min(length - va_offset, rdma_write_max);
+
+		if (ctx->dma_dir == DMA_TO_DEVICE)
+			rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
+		else
+			rdma_wr->wr.opcode = IB_WR_RDMA_READ;
+		rdma_wr->wr.sg_list = sge;
+		rdma_wr->wr.num_sge = rdma_rw_build_sg_list(ctx, qp->pd, sge,
+				data_len, page_off + va_offset);
+		rdma_wr->remote_addr = remote_addr + va_offset;
+		rdma_wr->rkey = rkey;
+
+		if (i + 1 != ctx->nr_ops)
+			rdma_wr->wr.next = &ctx->map.wrs[i + 1].wr;
+
+		sge += rdma_wr->wr.num_sge;
+		va_offset += data_len;
+	}
+
+	return ctx->nr_ops;
+
+out_free_sges:
+	kfree(ctx->map.sges);
+out:
+	return -ENOMEM;
+}
+
+static int rdma_rw_init_mr_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, u64 remote_addr, u32 rkey, u32 page_off)
+{
+	int pages_per_mr = qp->pd->device->attrs.max_fast_reg_page_list_len;
+	int pages_left = ctx->dma_nents;
+	struct scatterlist *sg = ctx->sg;
+	u32 va_offset = 0;
+	int i, ret = 0, count = 0;
+
+	ctx->nr_ops = (ctx->dma_nents + pages_per_mr - 1) / pages_per_mr;
+	ctx->reg = kcalloc(ctx->nr_ops, sizeof(*ctx->reg), GFP_KERNEL);
+	if (!ctx->reg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < ctx->nr_ops; i++) {
+		struct rdma_rw_reg_ctx *prev = i ? &ctx->reg[i - 1] : NULL;
+		struct rdma_rw_reg_ctx *reg = &ctx->reg[i];
+		int nents = min(pages_left, pages_per_mr);
+
+		reg->mr = ib_mr_pool_get(qp, &qp->rdma_mrs);
+		if (!reg->mr) {
+			pr_info("failed to allocate MR from pool\n");
+			ret = -EAGAIN;
+			goto out_free;
+		}
+
+		if (reg->mr->need_inval) {
+			reg->inv_wr.opcode = IB_WR_LOCAL_INV;
+			reg->inv_wr.ex.invalidate_rkey = reg->mr->lkey;
+			reg->inv_wr.next = &reg->reg_wr.wr;
+			if (prev)
+				prev->wr.wr.next = &reg->inv_wr;
+
+			count++;
+		} else if (prev) {
+			prev->wr.wr.next = &reg->reg_wr.wr;
+		}
+
+		ib_update_fast_reg_key(reg->mr, ib_inc_rkey(reg->mr->lkey));
+
+		ret = ib_map_mr_sg(reg->mr, sg, nents, page_off,
+				PAGE_SIZE);
+		if (ret < nents) {
+			pr_info("failed to map MR\n");
+			ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		reg->reg_wr.wr.opcode = IB_WR_REG_MR;
+		reg->reg_wr.mr = reg->mr;
+		reg->reg_wr.key = reg->mr->lkey;
+		reg->reg_wr.wr.next = &reg->wr.wr;
+		count++;
+
+		reg->reg_wr.access = IB_ACCESS_LOCAL_WRITE;
+		if (rdma_protocol_iwarp(qp->device, port_num))
+			reg->reg_wr.access |= IB_ACCESS_REMOTE_WRITE;
+
+		reg->sge.lkey = reg->mr->lkey;
+		reg->sge.addr = reg->mr->iova;
+		reg->sge.length = reg->mr->length;
+
+		reg->wr.wr.sg_list = &reg->sge;
+		reg->wr.wr.num_sge = 1;
+		reg->wr.remote_addr = remote_addr + va_offset;
+		reg->wr.rkey = rkey;
+		count++;
+
+		if (ctx->dma_dir == DMA_FROM_DEVICE) {
+			if (rdma_has_read_invalidate(qp->device, port_num)) {
+				reg->wr.wr.opcode = IB_WR_RDMA_READ_WITH_INV;
+				reg->wr.wr.ex.invalidate_rkey = reg->mr->lkey;
+				reg->mr->need_inval = false;
+			}  else {
+				reg->wr.wr.opcode = IB_WR_RDMA_READ;
+				reg->mr->need_inval = true;
+			}
+		} else {
+			reg->wr.wr.opcode = IB_WR_RDMA_WRITE;
+			reg->mr->need_inval = true;
+		}
+
+		va_offset += reg->sge.length;
+		pages_left -= nents;
+		sg += nents;
+	}
+
+	return count;
+
+out_free:
+	while (--i >= 0)
+		ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
+	kfree(ctx->reg);
+out:
+	return ret;
+}
+
+/**
+ * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
+ * @ctx:	context to initialize
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @sg:		scatterlist to READ/WRITE from/to
+ * @nents:	number of entries in @sg
+ * @total_len:	total length of @sg in bytes
+ * @remote_addr:remote address to read/write (relative to @rkey)
+ * @rkey:	remote key to operate on
+ * @dir:	%DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ * @offset:	current byte offset into @sg
+ *
+ * If we're going to use an FR to map this context, @max_nents should be
+ * smaller than or equal to the MR size.
+ *
+ * Returns the number of WQEs that will be needed on the workqueue if
+ * successful, or a negative error code.
+ */
+int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct scatterlist *sg, u32 nents, u32 total_len,
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir,
+		u32 offset)
+{
+	struct ib_device *dev = qp->pd->device;
+	u32 first_sg_index = offset / PAGE_SIZE;
+	u32 page_off = offset % PAGE_SIZE;
+	int ret = -ENOMEM;
+
+	ctx->sg = sg + first_sg_index;
+	ctx->dma_dir = dir;
+
+	ctx->orig_nents = nents - first_sg_index;
+	ctx->dma_nents =
+		ib_dma_map_sg(dev, ctx->sg, ctx->orig_nents, ctx->dma_dir);
+	if (!ctx->dma_nents)
+		goto out;
+
+	if (rdma_rw_use_mr(qp->device, port_num))
+		ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, remote_addr, rkey,
+				page_off);
+	else if (ctx->dma_nents == 1)
+		ret = rdma_rw_init_single_wr(ctx, qp, remote_addr, rkey);
+	else
+		ret = rdma_rw_init_wrs(ctx, qp, remote_addr, rkey,
+				total_len - offset, page_off);
+
+	if (ret < 0)
+		goto out_unmap_sg;
+
+	return ret;
+
+out_unmap_sg:
+	ib_dma_unmap_sg(dev, ctx->sg, ctx->orig_nents, ctx->dma_dir);
+out:
+	return ret;
+}
+EXPORT_SYMBOL(rdma_rw_ctx_init);
+
+/**
+ * rdma_rw_ctx_destroy - release all resources allocated by rdma_rw_ctx_init
+ * @ctx:	context to release
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ */
+void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num)
+{
+	if (rdma_rw_use_mr(qp->device, port_num)) {
+		int i;
+
+		for (i = 0; i < ctx->nr_ops; i++)
+			ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
+		kfree(ctx->reg);
+	} else if (ctx->dma_nents > 1) {
+		kfree(ctx->map.wrs);
+		kfree(ctx->map.sges);
+	}
+
+	ib_dma_unmap_sg(qp->pd->device, ctx->sg, ctx->orig_nents, ctx->dma_dir);
+}
+EXPORT_SYMBOL(rdma_rw_ctx_destroy);
+
+/**
+ * rdma_rw_ctx_post - post a RDMA READ or RDMA WRITE operation
+ * @ctx:	context to operate on
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @cqe:	completion queue entry for the last WR
+ * @chain_wr:	WR to append to the posted chain
+ *
+ * Post the set of RDMA READ/WRITE operations described by @ctx, as well as
+ * any memory registration operations needed.  If @chain_wr is non-NULL the
+ * WR it points to will be appended to the chain of WRs posted.  If @chain_wr
+ * is not set @cqe must be set so that the caller gets a completion
+ * notification.
+ */
+int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
+{
+	struct ib_send_wr *first_wr, *last_wr, *bad_wr;
+
+	if (rdma_rw_use_mr(qp->device, port_num)) {
+		if (ctx->reg[0].inv_wr.next)
+			first_wr = &ctx->reg[0].inv_wr;
+		else
+			first_wr = &ctx->reg[0].reg_wr.wr;
+		last_wr = &ctx->reg[ctx->nr_ops - 1].wr.wr;
+	} else if (ctx->dma_nents == 1) {
+		first_wr = &ctx->single.wr.wr;
+		last_wr = &ctx->single.wr.wr;
+	} else {
+		first_wr = &ctx->map.wrs[0].wr;
+		last_wr = &ctx->map.wrs[ctx->nr_ops - 1].wr;
+	}
+
+	if (chain_wr) {
+		last_wr->next = chain_wr;
+	} else {
+		last_wr->wr_cqe = cqe;
+		last_wr->send_flags |= IB_SEND_SIGNALED;
+	}
+
+	return ib_post_send(qp, first_wr, &bad_wr);
+}
+EXPORT_SYMBOL(rdma_rw_ctx_post);
+
+void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
+{
+	/*
+	 * Each context needs at least one RDMA READ or WRITE WR.
+	 *
+	 * For some hardware we might need more, eventually we should ask the
+	 * HCA driver for a multiplier here.
+	 */
+	attr->cap.max_send_wr += attr->cap.max_rdma_ctxs;
+
+	/*
+	 * If the device needs MRs to perform RDMA READ or WRITE operations,
+	 * we'll need two additional MRs for the registrations and the
+	 * invalidation.
+	 */
+	if (rdma_rw_use_mr(dev, attr->port_num))
+		attr->cap.max_send_wr += 2 * attr->cap.max_rdma_ctxs;
+}
+
+int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
+{
+	struct ib_device *dev = qp->pd->device;
+	int ret = 0;
+
+	if (rdma_rw_use_mr(dev, attr->port_num)) {
+		ret = ib_mr_pool_init(qp, &qp->rdma_mrs,
+				attr->cap.max_rdma_ctxs, IB_MR_TYPE_MEM_REG,
+				dev->attrs.max_fast_reg_page_list_len);
+	}
+
+	return ret;
+}
+
+void rdma_rw_cleanup_mrs(struct ib_qp *qp)
+{
+	ib_mr_pool_destroy(qp, &qp->rdma_mrs);
+}
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 9a77bb8..1ef3a1a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -48,6 +48,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
+#include <rdma/rw.h>
 
 #include "core_priv.h"
 
@@ -751,6 +752,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 {
 	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
 	struct ib_qp *qp;
+	int ret;
+
+	/*
+	 * If the caller is using the RDMA API, calculate the resources
+	 * needed for the RDMA READ/WRITE operations.
+	 *
+	 * Note that these callers need to pass in a port number.
+	 */
+	if (qp_init_attr->cap.max_rdma_ctxs)
+		rdma_rw_init_qp(device, qp_init_attr);
 
 	qp = device->create_qp(pd, qp_init_attr, NULL);
 	if (IS_ERR(qp))
@@ -764,6 +775,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	atomic_set(&qp->usecnt, 0);
 	qp->mrs_used = 0;
 	spin_lock_init(&qp->mr_lock);
+	INIT_LIST_HEAD(&qp->rdma_mrs);
 
 	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
 		return ib_create_xrc_qp(qp, qp_init_attr);
@@ -787,6 +799,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 
 	atomic_inc(&pd->usecnt);
 	atomic_inc(&qp_init_attr->send_cq->usecnt);
+
+	if (qp_init_attr->cap.max_rdma_ctxs) {
+		ret = rdma_rw_init_mrs(qp, qp_init_attr);
+		if (ret) {
+			pr_err("failed to init MR pool ret= %d\n", ret);
+			ib_destroy_qp(qp);
+			qp = ERR_PTR(ret);
+		}
+	}
+
 	return qp;
 }
 EXPORT_SYMBOL(ib_create_qp);
@@ -1271,6 +1293,9 @@ int ib_destroy_qp(struct ib_qp *qp)
 	rcq  = qp->recv_cq;
 	srq  = qp->srq;
 
+	if (!qp->uobject)
+		rdma_rw_cleanup_mrs(qp);
+
 	ret = qp->device->destroy_qp(qp);
 	if (!ret) {
 		if (pd)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2b94cea..035585a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -915,6 +915,13 @@ struct ib_qp_cap {
 	u32	max_send_sge;
 	u32	max_recv_sge;
 	u32	max_inline_data;
+
+	/*
+	 * Maximum number of rdma_rw_ctx structures in flight at a time.
+	 * ib_create_qp() will calculate the right number of WRs
+	 * and MRs based on this.
+	 */
+	u32	max_rdma_ctxs;
 };
 
 enum ib_sig_type {
@@ -986,7 +993,11 @@ struct ib_qp_init_attr {
 	enum ib_sig_type	sq_sig_type;
 	enum ib_qp_type		qp_type;
 	enum ib_qp_create_flags	create_flags;
-	u8			port_num; /* special QP types only */
+
+	/*
+	 * Only needed for special QP types, or when using the RW API.
+	 */
+	u8			port_num;
 };
 
 struct ib_qp_open_attr {
@@ -1410,6 +1421,7 @@ struct ib_qp {
 	struct list_head	xrcd_list;
 
 	spinlock_t		mr_lock;
+	struct list_head	rdma_mrs;
 	int			mrs_used;
 
 	/* count times opened, mcast attaches, flow attaches */
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
new file mode 100644
index 0000000..cd0521f
--- /dev/null
+++ b/include/rdma/rw.h
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef _RDMA_RW_H
+#define _RDMA_RW_H
+
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/mr_pool.h>
+
+struct rdma_rw_ctx {
+	/*
+	 * The scatterlist passed in, and the number of entries and total
+	 * length operated on.  Note that these might be smaller than the
+	 * values originally passed in if an offset or max_nents value was
+	 * passed to rdma_rw_ctx_init.
+	 *
+	 * dma_nents is the value returned from dma_map_sg, which might be
+	 * smaller than the original value passed in.
+	 */
+	struct scatterlist     *sg;
+	u32			orig_nents;
+	u32			dma_nents;
+
+	/* data direction of the transfer */
+	enum dma_data_direction dma_dir;
+
+	/* number of RDMA READ/WRITE WRs (not counting MR WRs) */
+	int			nr_ops;
+
+	union {
+		/* for mapping a single SGE: */
+		struct {
+			struct ib_sge		sge;
+			struct ib_rdma_wr	wr;
+		} single;
+
+		/* for mapping of multiple SGEs: */
+		struct {
+			struct ib_sge		*sges;
+			struct ib_rdma_wr	*wrs;
+		} map;
+
+		/* for registering multiple WRs: */
+		struct rdma_rw_reg_ctx {
+			struct ib_sge		sge;
+			struct ib_rdma_wr	wr;
+			struct ib_reg_wr	reg_wr;
+			struct ib_send_wr	inv_wr;
+			struct ib_mr		*mr;
+		} *reg;
+	};
+};
+
+int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct scatterlist *sg, u32 nents, u32 length,
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir,
+		u32 offset);
+void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num);
+
+int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
+
+void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr);
+int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr);
+void rdma_rw_cleanup_mrs(struct ib_qp *qp);
+
+#endif /* _RDMA_RW_H */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 08/13] IB/isert: properly type the login buffer
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (4 preceding siblings ...)
  2016-02-27 18:10   ` [PATCH 07/13] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
  2016-02-27 18:10   ` [PATCH 09/13] IB/isert: convert to new CQ API Christoph Hellwig
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

and allocate the request and response buffers separately, as this is not
in a performance-critical path anyway.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 52 ++++++++++++++++-----------------
 drivers/infiniband/ulp/isert/ib_isert.h |  3 +-
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 7c7ad3a..b3f953b 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -597,10 +597,12 @@ isert_free_login_buf(struct isert_conn *isert_conn)
 
 	ib_dma_unmap_single(ib_dev, isert_conn->login_rsp_dma,
 			    ISER_RX_LOGIN_SIZE, DMA_TO_DEVICE);
+	kfree(isert_conn->login_rsp_buf);
+
 	ib_dma_unmap_single(ib_dev, isert_conn->login_req_dma,
 			    ISCSI_DEF_MAX_RECV_SEG_LEN,
 			    DMA_FROM_DEVICE);
-	kfree(isert_conn->login_buf);
+	kfree(isert_conn->login_req_buf);
 }
 
 static int
@@ -609,50 +611,48 @@ isert_alloc_login_buf(struct isert_conn *isert_conn,
 {
 	int ret;
 
-	isert_conn->login_buf = kzalloc(ISCSI_DEF_MAX_RECV_SEG_LEN +
-					ISER_RX_LOGIN_SIZE, GFP_KERNEL);
-	if (!isert_conn->login_buf) {
-		isert_err("Unable to allocate isert_conn->login_buf\n");
+	isert_conn->login_req_buf =
+		kzalloc(ISCSI_DEF_MAX_RECV_SEG_LEN, GFP_KERNEL);
+	if (!isert_conn->login_req_buf) {
+		isert_err("Unable to allocate isert_conn->login_req_buf\n");
 		return -ENOMEM;
 	}
 
-	isert_conn->login_req_buf = isert_conn->login_buf;
-	isert_conn->login_rsp_buf = isert_conn->login_buf +
-				    ISCSI_DEF_MAX_RECV_SEG_LEN;
-
-	isert_dbg("Set login_buf: %p login_req_buf: %p login_rsp_buf: %p\n",
-		 isert_conn->login_buf, isert_conn->login_req_buf,
-		 isert_conn->login_rsp_buf);
-
 	isert_conn->login_req_dma = ib_dma_map_single(ib_dev,
-				(void *)isert_conn->login_req_buf,
+				isert_conn->login_req_buf,
 				ISCSI_DEF_MAX_RECV_SEG_LEN, DMA_FROM_DEVICE);
-
 	ret = ib_dma_mapping_error(ib_dev, isert_conn->login_req_dma);
 	if (ret) {
 		isert_err("login_req_dma mapping error: %d\n", ret);
 		isert_conn->login_req_dma = 0;
-		goto out_login_buf;
+		goto out_free_login_req_buf;
+	}
+
+	isert_conn->login_rsp_buf = kzalloc(ISER_RX_LOGIN_SIZE, GFP_KERNEL);
+	if (!isert_conn->login_rsp_buf) {
+		isert_err("Unable to allocate isert_conn->login_rspbuf\n");
+		goto out_unmap_login_req_buf;
 	}
 
 	isert_conn->login_rsp_dma = ib_dma_map_single(ib_dev,
-					(void *)isert_conn->login_rsp_buf,
+					isert_conn->login_rsp_buf,
 					ISER_RX_LOGIN_SIZE, DMA_TO_DEVICE);
-
 	ret = ib_dma_mapping_error(ib_dev, isert_conn->login_rsp_dma);
 	if (ret) {
 		isert_err("login_rsp_dma mapping error: %d\n", ret);
 		isert_conn->login_rsp_dma = 0;
-		goto out_req_dma_map;
+		goto out_free_login_rsp_buf;
 	}
 
 	return 0;
 
-out_req_dma_map:
+out_free_login_rsp_buf:
+	kfree(isert_conn->login_rsp_buf);
+out_unmap_login_req_buf:
 	ib_dma_unmap_single(ib_dev, isert_conn->login_req_dma,
 			    ISCSI_DEF_MAX_RECV_SEG_LEN, DMA_FROM_DEVICE);
-out_login_buf:
-	kfree(isert_conn->login_buf);
+out_free_login_req_buf:
+	kfree(isert_conn->login_req_buf);
 	return ret;
 }
 
@@ -773,7 +773,7 @@ isert_connect_release(struct isert_conn *isert_conn)
 		ib_destroy_qp(isert_conn->qp);
 	}
 
-	if (isert_conn->login_buf)
+	if (isert_conn->login_req_buf)
 		isert_free_login_buf(isert_conn);
 
 	isert_device_put(device);
@@ -1218,7 +1218,7 @@ post_send:
 static void
 isert_rx_login_req(struct isert_conn *isert_conn)
 {
-	struct iser_rx_desc *rx_desc = (void *)isert_conn->login_req_buf;
+	struct iser_rx_desc *rx_desc = isert_conn->login_req_buf;
 	int rx_buflen = isert_conn->login_req_len;
 	struct iscsi_conn *conn = isert_conn->conn;
 	struct iscsi_login *login = conn->conn_login;
@@ -1596,7 +1596,7 @@ isert_rcv_completion(struct iser_rx_desc *desc,
 	u64 rx_dma;
 	int rx_buflen;
 
-	if ((char *)desc == isert_conn->login_req_buf) {
+	if (desc == isert_conn->login_req_buf) {
 		rx_dma = isert_conn->login_req_dma;
 		rx_buflen = ISER_RX_LOGIN_SIZE;
 		isert_dbg("login_buf: Using rx_dma: 0x%llx, rx_buflen: %d\n",
@@ -1615,7 +1615,7 @@ isert_rcv_completion(struct iser_rx_desc *desc,
 		 hdr->opcode, hdr->itt, hdr->flags,
 		 (int)(xfer_len - ISER_HEADERS_LEN));
 
-	if ((char *)desc == isert_conn->login_req_buf) {
+	if (desc == isert_conn->login_req_buf) {
 		isert_conn->login_req_len = xfer_len - ISER_HEADERS_LEN;
 		if (isert_conn->conn) {
 			struct iscsi_login *login = isert_conn->conn->conn_login;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 8d50453..1f15ff9 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -184,8 +184,7 @@ struct isert_conn {
 	u32			initiator_depth;
 	bool			pi_support;
 	u32			max_sge;
-	char			*login_buf;
-	char			*login_req_buf;
+	struct iser_rx_desc	*login_req_buf;
 	char			*login_rsp_buf;
 	u64			login_req_dma;
 	int			login_req_len;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 09/13] IB/isert: convert to new CQ API
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (5 preceding siblings ...)
  2016-02-27 18:10   ` [PATCH 08/13] IB/isert: properly type the login buffer Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
  2016-02-27 18:10   ` [PATCH 10/13] IB/isert: kill struct isert_rdma_wr Christoph Hellwig
  2016-02-27 18:10   ` [PATCH 12/13] IB/core: add a MR pool for signature MRs Christoph Hellwig
  8 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

Use the workqueue based CQ type similar to what isert was using previously,
and properly split up the completion handlers.

Note that this takes special care to handle the magic login WRs
separately, and renames the submission functions to make it clear
that they are only to be used for the login buffers.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 415 ++++++++++++++------------------
 drivers/infiniband/ulp/isert/ib_isert.h |  11 +-
 2 files changed, 179 insertions(+), 247 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index b3f953b..8ef25c6 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -59,12 +59,16 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 static int
 isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
 static int
-isert_rdma_post_recvl(struct isert_conn *isert_conn);
+isert_login_post_recv(struct isert_conn *isert_conn);
 static int
 isert_rdma_accept(struct isert_conn *isert_conn);
 struct rdma_cm_id *isert_setup_id(struct isert_np *isert_np);
 
 static void isert_release_work(struct work_struct *work);
+static void isert_recv_done(struct ib_cq *cq, struct ib_wc *wc);
+static void isert_send_done(struct ib_cq *cq, struct ib_wc *wc);
+static void isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc);
+static void isert_login_send_done(struct ib_cq *cq, struct ib_wc *wc);
 
 static inline bool
 isert_prot_cmd(struct isert_conn *conn, struct se_cmd *cmd)
@@ -177,12 +181,6 @@ err:
 	return ret;
 }
 
-static void
-isert_cq_event_callback(struct ib_event *e, void *context)
-{
-	isert_dbg("event: %d\n", e->event);
-}
-
 static int
 isert_alloc_rx_descriptors(struct isert_conn *isert_conn)
 {
@@ -250,9 +248,6 @@ isert_free_rx_descriptors(struct isert_conn *isert_conn)
 	isert_conn->rx_descs = NULL;
 }
 
-static void isert_cq_work(struct work_struct *);
-static void isert_cq_callback(struct ib_cq *, void *);
-
 static void
 isert_free_comps(struct isert_device *device)
 {
@@ -261,10 +256,8 @@ isert_free_comps(struct isert_device *device)
 	for (i = 0; i < device->comps_used; i++) {
 		struct isert_comp *comp = &device->comps[i];
 
-		if (comp->cq) {
-			cancel_work_sync(&comp->work);
-			ib_destroy_cq(comp->cq);
-		}
+		if (comp->cq)
+			ib_free_cq(comp->cq);
 	}
 	kfree(device->comps);
 }
@@ -293,28 +286,17 @@ isert_alloc_comps(struct isert_device *device)
 	max_cqe = min(ISER_MAX_CQ_LEN, device->ib_device->attrs.max_cqe);
 
 	for (i = 0; i < device->comps_used; i++) {
-		struct ib_cq_init_attr cq_attr = {};
 		struct isert_comp *comp = &device->comps[i];
 
 		comp->device = device;
-		INIT_WORK(&comp->work, isert_cq_work);
-		cq_attr.cqe = max_cqe;
-		cq_attr.comp_vector = i;
-		comp->cq = ib_create_cq(device->ib_device,
-					isert_cq_callback,
-					isert_cq_event_callback,
-					(void *)comp,
-					&cq_attr);
+		comp->cq = ib_alloc_cq(device->ib_device, comp, max_cqe, i,
+				IB_POLL_WORKQUEUE);
 		if (IS_ERR(comp->cq)) {
 			isert_err("Unable to allocate cq\n");
 			ret = PTR_ERR(comp->cq);
 			comp->cq = NULL;
 			goto out_cq;
 		}
-
-		ret = ib_req_notify_cq(comp->cq, IB_CQ_NEXT_COMP);
-		if (ret)
-			goto out_cq;
 	}
 
 	return 0;
@@ -726,7 +708,7 @@ isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
 	if (ret)
 		goto out_conn_dev;
 
-	ret = isert_rdma_post_recvl(isert_conn);
+	ret = isert_login_post_recv(isert_conn);
 	if (ret)
 		goto out_conn_dev;
 
@@ -977,7 +959,10 @@ isert_post_recvm(struct isert_conn *isert_conn, u32 count)
 
 	for (rx_wr = isert_conn->rx_wr, i = 0; i < count; i++, rx_wr++) {
 		rx_desc = &isert_conn->rx_descs[i];
-		rx_wr->wr_id = (uintptr_t)rx_desc;
+
+		rx_desc->rx_cqe.done = isert_recv_done;
+
+		rx_wr->wr_cqe = &rx_desc->rx_cqe;
 		rx_wr->sg_list = &rx_desc->rx_sg;
 		rx_wr->num_sge = 1;
 		rx_wr->next = rx_wr + 1;
@@ -1002,7 +987,9 @@ isert_post_recv(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc)
 	struct ib_recv_wr *rx_wr_failed, rx_wr;
 	int ret;
 
-	rx_wr.wr_id = (uintptr_t)rx_desc;
+	rx_desc->rx_cqe.done = isert_recv_done;
+
+	rx_wr.wr_cqe = &rx_desc->rx_cqe;
 	rx_wr.sg_list = &rx_desc->rx_sg;
 	rx_wr.num_sge = 1;
 	rx_wr.next = NULL;
@@ -1018,7 +1005,7 @@ isert_post_recv(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc)
 }
 
 static int
-isert_post_send(struct isert_conn *isert_conn, struct iser_tx_desc *tx_desc)
+isert_login_post_send(struct isert_conn *isert_conn, struct iser_tx_desc *tx_desc)
 {
 	struct ib_device *ib_dev = isert_conn->cm_id->device;
 	struct ib_send_wr send_wr, *send_wr_failed;
@@ -1027,8 +1014,10 @@ isert_post_send(struct isert_conn *isert_conn, struct iser_tx_desc *tx_desc)
 	ib_dma_sync_single_for_device(ib_dev, tx_desc->dma_addr,
 				      ISER_HEADERS_LEN, DMA_TO_DEVICE);
 
+	tx_desc->tx_cqe.done = isert_login_send_done;
+
 	send_wr.next	= NULL;
-	send_wr.wr_id	= (uintptr_t)tx_desc;
+	send_wr.wr_cqe	= &tx_desc->tx_cqe;
 	send_wr.sg_list	= tx_desc->tx_sg;
 	send_wr.num_sge	= tx_desc->num_sge;
 	send_wr.opcode	= IB_WR_SEND;
@@ -1098,7 +1087,8 @@ isert_init_send_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 	struct iser_tx_desc *tx_desc = &isert_cmd->tx_desc;
 
 	isert_cmd->rdma_wr.iser_ib_op = ISER_IB_SEND;
-	send_wr->wr_id = (uintptr_t)&isert_cmd->tx_desc;
+	tx_desc->tx_cqe.done = isert_send_done;
+	send_wr->wr_cqe = &tx_desc->tx_cqe;
 
 	if (isert_conn->snd_w_inv && isert_cmd->inv_rkey) {
 		send_wr->opcode  = IB_WR_SEND_WITH_INV;
@@ -1113,7 +1103,7 @@ isert_init_send_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 }
 
 static int
-isert_rdma_post_recvl(struct isert_conn *isert_conn)
+isert_login_post_recv(struct isert_conn *isert_conn)
 {
 	struct ib_recv_wr rx_wr, *rx_wr_fail;
 	struct ib_sge sge;
@@ -1127,8 +1117,10 @@ isert_rdma_post_recvl(struct isert_conn *isert_conn)
 	isert_dbg("Setup sge: addr: %llx length: %d 0x%08x\n",
 		sge.addr, sge.length, sge.lkey);
 
+	isert_conn->login_req_buf->rx_cqe.done = isert_login_recv_done;
+
 	memset(&rx_wr, 0, sizeof(struct ib_recv_wr));
-	rx_wr.wr_id = (uintptr_t)isert_conn->login_req_buf;
+	rx_wr.wr_cqe = &isert_conn->login_req_buf->rx_cqe;
 	rx_wr.sg_list = &sge;
 	rx_wr.num_sge = 1;
 
@@ -1203,12 +1195,12 @@ isert_put_login_tx(struct iscsi_conn *conn, struct iscsi_login *login,
 			goto post_send;
 		}
 
-		ret = isert_rdma_post_recvl(isert_conn);
+		ret = isert_login_post_recv(isert_conn);
 		if (ret)
 			return ret;
 	}
 post_send:
-	ret = isert_post_send(isert_conn, tx_desc);
+	ret = isert_login_post_send(isert_conn, tx_desc);
 	if (ret)
 		return ret;
 
@@ -1551,12 +1543,45 @@ isert_rx_opcode(struct isert_conn *isert_conn, struct iser_rx_desc *rx_desc,
 }
 
 static void
-isert_rx_do_work(struct iser_rx_desc *rx_desc, struct isert_conn *isert_conn)
+isert_print_wc(struct ib_wc *wc)
+{
+	if (wc->status != IB_WC_WR_FLUSH_ERR)
+		isert_err("%s (%d): cqe 0x%p vend_err %x\n",
+			  ib_wc_status_msg(wc->status), wc->status,
+			  wc->wr_cqe, wc->vendor_err);
+	else
+		isert_dbg("%s (%d): cqe 0x%p\n",
+			  ib_wc_status_msg(wc->status), wc->status,
+			  wc->wr_cqe);
+
+}
+
+static void
+isert_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+	struct isert_conn *isert_conn = wc->qp->qp_context;
+	struct ib_device *ib_dev = isert_conn->cm_id->device;
+	struct iser_rx_desc *rx_desc =
+		container_of(wc->wr_cqe, struct iser_rx_desc, rx_cqe);
+	struct iscsi_hdr *hdr = &rx_desc->iscsi_header;
 	struct iser_ctrl *iser_ctrl = &rx_desc->iser_header;
 	uint64_t read_va = 0, write_va = 0;
 	uint32_t read_stag = 0, write_stag = 0;
 
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		isert_print_wc(wc);
+		if (!--isert_conn->post_recv_buf_count)
+			iscsit_cause_connection_reinstatement(isert_conn->conn, 0);
+		return;
+	}
+
+	ib_dma_sync_single_for_cpu(ib_dev, rx_desc->dma_addr,
+			ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);
+
+	isert_dbg("DMA: 0x%llx, iSCSI opcode: 0x%02x, ITT: 0x%08x, flags: 0x%02x dlen: %d\n",
+		 rx_desc->dma_addr, hdr->opcode, hdr->itt, hdr->flags,
+		 (int)(wc->byte_len - ISER_HEADERS_LEN));
+
 	switch (iser_ctrl->flags & 0xF0) {
 	case ISCSI_CTRL:
 		if (iser_ctrl->flags & ISER_RSV) {
@@ -1584,55 +1609,44 @@ isert_rx_do_work(struct iser_rx_desc *rx_desc, struct isert_conn *isert_conn)
 
 	isert_rx_opcode(isert_conn, rx_desc,
 			read_stag, read_va, write_stag, write_va);
+
+	ib_dma_sync_single_for_device(ib_dev, rx_desc->dma_addr,
+			ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);
+
+	isert_conn->post_recv_buf_count--;
 }
 
 static void
-isert_rcv_completion(struct iser_rx_desc *desc,
-		     struct isert_conn *isert_conn,
-		     u32 xfer_len)
+isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+	struct isert_conn *isert_conn = wc->qp->qp_context;
 	struct ib_device *ib_dev = isert_conn->cm_id->device;
-	struct iscsi_hdr *hdr;
-	u64 rx_dma;
-	int rx_buflen;
-
-	if (desc == isert_conn->login_req_buf) {
-		rx_dma = isert_conn->login_req_dma;
-		rx_buflen = ISER_RX_LOGIN_SIZE;
-		isert_dbg("login_buf: Using rx_dma: 0x%llx, rx_buflen: %d\n",
-			 rx_dma, rx_buflen);
-	} else {
-		rx_dma = desc->dma_addr;
-		rx_buflen = ISER_RX_PAYLOAD_SIZE;
-		isert_dbg("req_buf: Using rx_dma: 0x%llx, rx_buflen: %d\n",
-			 rx_dma, rx_buflen);
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		isert_print_wc(wc);
+		if (!--isert_conn->post_recv_buf_count)
+			iscsit_cause_connection_reinstatement(isert_conn->conn, 0);
+		return;
 	}
 
-	ib_dma_sync_single_for_cpu(ib_dev, rx_dma, rx_buflen, DMA_FROM_DEVICE);
+	ib_dma_sync_single_for_cpu(ib_dev, isert_conn->login_req_dma,
+			ISER_RX_LOGIN_SIZE, DMA_FROM_DEVICE);
 
-	hdr = &desc->iscsi_header;
-	isert_dbg("iSCSI opcode: 0x%02x, ITT: 0x%08x, flags: 0x%02x dlen: %d\n",
-		 hdr->opcode, hdr->itt, hdr->flags,
-		 (int)(xfer_len - ISER_HEADERS_LEN));
+	isert_conn->login_req_len = wc->byte_len - ISER_HEADERS_LEN;
 
-	if (desc == isert_conn->login_req_buf) {
-		isert_conn->login_req_len = xfer_len - ISER_HEADERS_LEN;
-		if (isert_conn->conn) {
-			struct iscsi_login *login = isert_conn->conn->conn_login;
+	if (isert_conn->conn) {
+		struct iscsi_login *login = isert_conn->conn->conn_login;
 
-			if (login && !login->first_request)
-				isert_rx_login_req(isert_conn);
-		}
-		mutex_lock(&isert_conn->mutex);
-		complete(&isert_conn->login_req_comp);
-		mutex_unlock(&isert_conn->mutex);
-	} else {
-		isert_rx_do_work(desc, isert_conn);
+		if (login && !login->first_request)
+			isert_rx_login_req(isert_conn);
 	}
 
-	ib_dma_sync_single_for_device(ib_dev, rx_dma, rx_buflen,
-				      DMA_FROM_DEVICE);
+	mutex_lock(&isert_conn->mutex);
+	complete(&isert_conn->login_req_comp);
+	mutex_unlock(&isert_conn->mutex);
 
+	ib_dma_sync_single_for_device(ib_dev, isert_conn->login_req_dma,
+				ISER_RX_LOGIN_SIZE, DMA_FROM_DEVICE);
 	isert_conn->post_recv_buf_count--;
 }
 
@@ -1882,42 +1896,59 @@ fail_mr_status:
 }
 
 static void
-isert_completion_rdma_write(struct iser_tx_desc *tx_desc,
-			    struct isert_cmd *isert_cmd)
+isert_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 {
-	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_conn *isert_conn = isert_cmd->conn;
+	struct isert_conn *isert_conn = wc->qp->qp_context;
 	struct isert_device *device = isert_conn->device;
+	struct iser_tx_desc *desc =
+		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
+	struct isert_cmd *isert_cmd = desc->isert_cmd;
+	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
+	struct se_cmd *cmd = &isert_cmd->iscsi_cmd->se_cmd;
 	int ret = 0;
 
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		isert_print_wc(wc);
+		isert_completion_put(desc, isert_cmd, device->ib_device, true);
+		return;
+	}
+
+	isert_dbg("Cmd %p\n", isert_cmd);
+
 	if (wr->fr_desc && wr->fr_desc->ind & ISERT_PROTECTED) {
-		ret = isert_check_pi_status(se_cmd,
-					    wr->fr_desc->pi_ctx->sig_mr);
+		ret = isert_check_pi_status(cmd, wr->fr_desc->pi_ctx->sig_mr);
 		wr->fr_desc->ind &= ~ISERT_PROTECTED;
 	}
 
 	device->unreg_rdma_mem(isert_cmd, isert_conn);
 	wr->rdma_wr_num = 0;
 	if (ret)
-		transport_send_check_condition_and_sense(se_cmd,
-							 se_cmd->pi_err, 0);
+		transport_send_check_condition_and_sense(cmd, cmd->pi_err, 0);
 	else
-		isert_put_response(isert_conn->conn, cmd);
+		isert_put_response(isert_conn->conn, isert_cmd->iscsi_cmd);
 }
 
 static void
-isert_completion_rdma_read(struct iser_tx_desc *tx_desc,
-			   struct isert_cmd *isert_cmd)
+isert_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+	struct isert_conn *isert_conn = wc->qp->qp_context;
+	struct isert_device *device = isert_conn->device;
+	struct iser_tx_desc *desc =
+		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
+	struct isert_cmd *isert_cmd = desc->isert_cmd;
 	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_conn *isert_conn = isert_cmd->conn;
-	struct isert_device *device = isert_conn->device;
 	int ret = 0;
 
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		isert_print_wc(wc);
+		isert_completion_put(desc, isert_cmd, device->ib_device, true);
+		return;
+	}
+
+	isert_dbg("Cmd %p\n", isert_cmd);
+
 	if (wr->fr_desc && wr->fr_desc->ind & ISERT_PROTECTED) {
 		ret = isert_check_pi_status(se_cmd,
 					    wr->fr_desc->pi_ctx->sig_mr);
@@ -1975,170 +2006,53 @@ isert_do_control_comp(struct work_struct *work)
 }
 
 static void
-isert_response_completion(struct iser_tx_desc *tx_desc,
-			  struct isert_cmd *isert_cmd,
-			  struct isert_conn *isert_conn,
-			  struct ib_device *ib_dev)
+isert_login_send_done(struct ib_cq *cq, struct ib_wc *wc)
 {
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-
-	if (cmd->i_state == ISTATE_SEND_TASKMGTRSP ||
-	    cmd->i_state == ISTATE_SEND_LOGOUTRSP ||
-	    cmd->i_state == ISTATE_SEND_REJECT ||
-	    cmd->i_state == ISTATE_SEND_TEXTRSP) {
-		isert_unmap_tx_desc(tx_desc, ib_dev);
+	struct isert_conn *isert_conn = wc->qp->qp_context;
+	struct ib_device *ib_dev = isert_conn->cm_id->device;
+	struct iser_tx_desc *tx_desc =
+		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
 
-		INIT_WORK(&isert_cmd->comp_work, isert_do_control_comp);
-		queue_work(isert_comp_wq, &isert_cmd->comp_work);
-		return;
-	}
+	if (unlikely(wc->status != IB_WC_SUCCESS))
+		isert_print_wc(wc);
 
-	cmd->i_state = ISTATE_SENT_STATUS;
-	isert_completion_put(tx_desc, isert_cmd, ib_dev, false);
+	isert_unmap_tx_desc(tx_desc, ib_dev);
 }
 
 static void
-isert_snd_completion(struct iser_tx_desc *tx_desc,
-		      struct isert_conn *isert_conn)
+isert_send_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+	struct isert_conn *isert_conn = wc->qp->qp_context;
 	struct ib_device *ib_dev = isert_conn->cm_id->device;
+	struct iser_tx_desc *tx_desc =
+		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
 	struct isert_cmd *isert_cmd = tx_desc->isert_cmd;
-	struct isert_rdma_wr *wr;
 
-	if (!isert_cmd) {
-		isert_unmap_tx_desc(tx_desc, ib_dev);
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		isert_print_wc(wc);
+		isert_completion_put(tx_desc, isert_cmd, ib_dev, true);
 		return;
 	}
-	wr = &isert_cmd->rdma_wr;
 
-	isert_dbg("Cmd %p iser_ib_op %d\n", isert_cmd, wr->iser_ib_op);
+	isert_dbg("Cmd %p\n", isert_cmd);
 
-	switch (wr->iser_ib_op) {
-	case ISER_IB_SEND:
-		isert_response_completion(tx_desc, isert_cmd,
-					  isert_conn, ib_dev);
-		break;
-	case ISER_IB_RDMA_WRITE:
-		isert_completion_rdma_write(tx_desc, isert_cmd);
-		break;
-	case ISER_IB_RDMA_READ:
-		isert_completion_rdma_read(tx_desc, isert_cmd);
-		break;
+	switch (isert_cmd->iscsi_cmd->i_state) {
+	case ISTATE_SEND_TASKMGTRSP:
+	case ISTATE_SEND_LOGOUTRSP:
+	case ISTATE_SEND_REJECT:
+	case ISTATE_SEND_TEXTRSP:
+		isert_unmap_tx_desc(tx_desc, ib_dev);
+
+		INIT_WORK(&isert_cmd->comp_work, isert_do_control_comp);
+		queue_work(isert_comp_wq, &isert_cmd->comp_work);
+		return;
 	default:
-		isert_err("Unknown wr->iser_ib_op: 0x%x\n", wr->iser_ib_op);
-		dump_stack();
+		isert_cmd->iscsi_cmd->i_state = ISTATE_SENT_STATUS;
+		isert_completion_put(tx_desc, isert_cmd, ib_dev, false);
 		break;
 	}
 }
 
-/**
- * is_isert_tx_desc() - Indicate if the completion wr_id
- *     is a TX descriptor or not.
- * @isert_conn: iser connection
- * @wr_id: completion WR identifier
- *
- * Since we cannot rely on wc opcode in FLUSH errors
- * we must work around it by checking if the wr_id address
- * falls in the iser connection rx_descs buffer. If so
- * it is an RX descriptor, otherwize it is a TX.
- */
-static inline bool
-is_isert_tx_desc(struct isert_conn *isert_conn, void *wr_id)
-{
-	void *start = isert_conn->rx_descs;
-	int len = ISERT_QP_MAX_RECV_DTOS * sizeof(*isert_conn->rx_descs);
-
-	if (wr_id >= start && wr_id < start + len)
-		return false;
-
-	return true;
-}
-
-static void
-isert_cq_comp_err(struct isert_conn *isert_conn, struct ib_wc *wc)
-{
-	if (wc->wr_id == ISER_BEACON_WRID) {
-		isert_info("conn %p completing wait_comp_err\n",
-			   isert_conn);
-		complete(&isert_conn->wait_comp_err);
-	} else if (is_isert_tx_desc(isert_conn, (void *)(uintptr_t)wc->wr_id)) {
-		struct ib_device *ib_dev = isert_conn->cm_id->device;
-		struct isert_cmd *isert_cmd;
-		struct iser_tx_desc *desc;
-
-		desc = (struct iser_tx_desc *)(uintptr_t)wc->wr_id;
-		isert_cmd = desc->isert_cmd;
-		if (!isert_cmd)
-			isert_unmap_tx_desc(desc, ib_dev);
-		else
-			isert_completion_put(desc, isert_cmd, ib_dev, true);
-	} else {
-		isert_conn->post_recv_buf_count--;
-		if (!isert_conn->post_recv_buf_count)
-			iscsit_cause_connection_reinstatement(isert_conn->conn, 0);
-	}
-}
-
-static void
-isert_handle_wc(struct ib_wc *wc)
-{
-	struct isert_conn *isert_conn;
-	struct iser_tx_desc *tx_desc;
-	struct iser_rx_desc *rx_desc;
-
-	isert_conn = wc->qp->qp_context;
-	if (likely(wc->status == IB_WC_SUCCESS)) {
-		if (wc->opcode == IB_WC_RECV) {
-			rx_desc = (struct iser_rx_desc *)(uintptr_t)wc->wr_id;
-			isert_rcv_completion(rx_desc, isert_conn, wc->byte_len);
-		} else {
-			tx_desc = (struct iser_tx_desc *)(uintptr_t)wc->wr_id;
-			isert_snd_completion(tx_desc, isert_conn);
-		}
-	} else {
-		if (wc->status != IB_WC_WR_FLUSH_ERR)
-			isert_err("%s (%d): wr id %llx vend_err %x\n",
-				  ib_wc_status_msg(wc->status), wc->status,
-				  wc->wr_id, wc->vendor_err);
-		else
-			isert_dbg("%s (%d): wr id %llx\n",
-				  ib_wc_status_msg(wc->status), wc->status,
-				  wc->wr_id);
-
-		if (wc->wr_id != ISER_FASTREG_LI_WRID)
-			isert_cq_comp_err(isert_conn, wc);
-	}
-}
-
-static void
-isert_cq_work(struct work_struct *work)
-{
-	enum { isert_poll_budget = 65536 };
-	struct isert_comp *comp = container_of(work, struct isert_comp,
-					       work);
-	struct ib_wc *const wcs = comp->wcs;
-	int i, n, completed = 0;
-
-	while ((n = ib_poll_cq(comp->cq, ARRAY_SIZE(comp->wcs), wcs)) > 0) {
-		for (i = 0; i < n; i++)
-			isert_handle_wc(&wcs[i]);
-
-		completed += n;
-		if (completed >= isert_poll_budget)
-			break;
-	}
-
-	ib_req_notify_cq(comp->cq, IB_CQ_NEXT_COMP);
-}
-
-static void
-isert_cq_callback(struct ib_cq *cq, void *context)
-{
-	struct isert_comp *comp = context;
-
-	queue_work(isert_comp_wq, &comp->work);
-}
-
 static int
 isert_post_response(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd)
 {
@@ -2395,7 +2309,8 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 	page_off = offset % PAGE_SIZE;
 
 	rdma_wr->wr.sg_list = ib_sge;
-	rdma_wr->wr.wr_id = (uintptr_t)&isert_cmd->tx_desc;
+	rdma_wr->wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
+
 	/*
 	 * Perform mapping of TCM scatterlist memory ib_sge dma_addr.
 	 */
@@ -2478,6 +2393,8 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
 		rdma_wr->wr.send_flags = 0;
 		if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
+			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
+
 			rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
 			rdma_wr->remote_addr = isert_cmd->read_va + offset;
 			rdma_wr->rkey = isert_cmd->read_stag;
@@ -2486,6 +2403,8 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 			else
 				rdma_wr->wr.next = &wr->rdma_wr[i + 1].wr;
 		} else {
+			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
+
 			rdma_wr->wr.opcode = IB_WR_RDMA_READ;
 			rdma_wr->remote_addr = isert_cmd->write_va + va_offset;
 			rdma_wr->rkey = isert_cmd->write_stag;
@@ -2517,7 +2436,7 @@ isert_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
 	u32 rkey;
 
 	memset(inv_wr, 0, sizeof(*inv_wr));
-	inv_wr->wr_id = ISER_FASTREG_LI_WRID;
+	inv_wr->wr_cqe = NULL;
 	inv_wr->opcode = IB_WR_LOCAL_INV;
 	inv_wr->ex.invalidate_rkey = mr->rkey;
 
@@ -2573,7 +2492,7 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
 
 	reg_wr.wr.next = NULL;
 	reg_wr.wr.opcode = IB_WR_REG_MR;
-	reg_wr.wr.wr_id = ISER_FASTREG_LI_WRID;
+	reg_wr.wr.wr_cqe = NULL;
 	reg_wr.wr.send_flags = 0;
 	reg_wr.wr.num_sge = 0;
 	reg_wr.mr = mr;
@@ -2684,7 +2603,7 @@ isert_reg_sig_mr(struct isert_conn *isert_conn,
 
 	memset(&sig_wr, 0, sizeof(sig_wr));
 	sig_wr.wr.opcode = IB_WR_REG_SIG_MR;
-	sig_wr.wr.wr_id = ISER_FASTREG_LI_WRID;
+	sig_wr.wr.wr_cqe = NULL;
 	sig_wr.wr.sg_list = &rdma_wr->ib_sg[DATA];
 	sig_wr.wr.num_sge = 1;
 	sig_wr.access_flags = IB_ACCESS_LOCAL_WRITE;
@@ -2839,14 +2758,18 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 	rdma_wr = &isert_cmd->rdma_wr.s_rdma_wr;
 	rdma_wr->wr.sg_list = &wr->s_ib_sge;
 	rdma_wr->wr.num_sge = 1;
-	rdma_wr->wr.wr_id = (uintptr_t)&isert_cmd->tx_desc;
+	rdma_wr->wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
 	if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
+		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
+
 		rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
 		rdma_wr->remote_addr = isert_cmd->read_va;
 		rdma_wr->rkey = isert_cmd->read_stag;
 		rdma_wr->wr.send_flags = !isert_prot_cmd(isert_conn, se_cmd) ?
 				      0 : IB_SEND_SIGNALED;
 	} else {
+		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
+
 		rdma_wr->wr.opcode = IB_WR_RDMA_READ;
 		rdma_wr->remote_addr = isert_cmd->write_va;
 		rdma_wr->rkey = isert_cmd->write_stag;
@@ -3310,14 +3233,26 @@ isert_wait4cmds(struct iscsi_conn *conn)
 }
 
 static void
+isert_beacon_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	struct isert_conn *isert_conn = wc->qp->qp_context;
+
+	isert_print_wc(wc);
+
+	isert_info("conn %p completing wait_comp_err\n", isert_conn);
+	complete(&isert_conn->wait_comp_err);
+}
+
+static void
 isert_wait4flush(struct isert_conn *isert_conn)
 {
 	struct ib_recv_wr *bad_wr;
+	static struct ib_cqe cqe = { .done = isert_beacon_done };
 
 	isert_info("conn %p\n", isert_conn);
 
 	init_completion(&isert_conn->wait_comp_err);
-	isert_conn->beacon.wr_id = ISER_BEACON_WRID;
+	isert_conn->beacon.wr_cqe = &cqe;
 	/* post an indication that all flush errors were consumed */
 	if (ib_post_recv(isert_conn->qp, &isert_conn->beacon, &bad_wr)) {
 		isert_err("conn %p failed to post beacon", isert_conn);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 1f15ff9..aae4a91 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -63,11 +63,10 @@
 				ISERT_MAX_RX_MISC_PDUS)
 
 #define ISER_RX_PAD_SIZE	(ISER_RECV_DATA_SEG_LEN + 4096 - \
-		(ISER_RX_PAYLOAD_SIZE + sizeof(u64) + sizeof(struct ib_sge)))
+		(ISER_RX_PAYLOAD_SIZE + sizeof(u64) + sizeof(struct ib_sge) + \
+		 sizeof(struct ib_cqe)))
 
 #define ISCSI_ISER_SG_TABLESIZE		256
-#define ISER_FASTREG_LI_WRID		0xffffffffffffffffULL
-#define ISER_BEACON_WRID               0xfffffffffffffffeULL
 
 enum isert_desc_type {
 	ISCSI_TX_CONTROL,
@@ -95,6 +94,7 @@ struct iser_rx_desc {
 	char		data[ISER_RECV_DATA_SEG_LEN];
 	u64		dma_addr;
 	struct ib_sge	rx_sg;
+	struct ib_cqe	rx_cqe;
 	char		pad[ISER_RX_PAD_SIZE];
 } __packed;
 
@@ -104,6 +104,7 @@ struct iser_tx_desc {
 	enum isert_desc_type type;
 	u64		dma_addr;
 	struct ib_sge	tx_sg[2];
+	struct ib_cqe	tx_cqe;
 	int		num_sge;
 	struct isert_cmd *isert_cmd;
 	struct ib_send_wr send_wr;
@@ -220,17 +221,13 @@ struct isert_conn {
  *
  * @device:     pointer to device handle
  * @cq:         completion queue
- * @wcs:        work completion array
  * @active_qps: Number of active QPs attached
  *              to completion context
- * @work:       completion work handle
  */
 struct isert_comp {
 	struct isert_device     *device;
 	struct ib_cq		*cq;
-	struct ib_wc		 wcs[16];
 	int                      active_qps;
-	struct work_struct	 work;
 };
 
 struct isert_device {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/13] IB/isert: kill struct isert_rdma_wr
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (6 preceding siblings ...)
  2016-02-27 18:10   ` [PATCH 09/13] IB/isert: convert to new CQ API Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
  2016-02-27 18:10   ` [PATCH 12/13] IB/core: add a MR pool for signature MRs Christoph Hellwig
  8 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

There is exactly one instance per struct isert_cmd, so merge the two to
simplify everyone's life.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 219 ++++++++++++++++----------------
 drivers/infiniband/ulp/isert/ib_isert.h |  30 ++---
 2 files changed, 119 insertions(+), 130 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 8ef25c6..c93be93 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -49,13 +49,11 @@ static struct workqueue_struct *isert_release_wq;
 static void
 isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
 static int
-isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
-	       struct isert_rdma_wr *wr);
+isert_map_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn);
 static void
 isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
 static int
-isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
-	       struct isert_rdma_wr *wr);
+isert_reg_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn);
 static int
 isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
 static int
@@ -1086,7 +1084,7 @@ isert_init_send_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 {
 	struct iser_tx_desc *tx_desc = &isert_cmd->tx_desc;
 
-	isert_cmd->rdma_wr.iser_ib_op = ISER_IB_SEND;
+	isert_cmd->iser_ib_op = ISER_IB_SEND;
 	tx_desc->tx_cqe.done = isert_send_done;
 	send_wr->wr_cqe = &tx_desc->tx_cqe;
 
@@ -1697,54 +1695,50 @@ isert_unmap_data_buf(struct isert_conn *isert_conn, struct isert_data_buf *data)
 static void
 isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
 {
-	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
-
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (wr->data.sg) {
+	if (isert_cmd->data.sg) {
 		isert_dbg("Cmd %p unmap_sg op\n", isert_cmd);
-		isert_unmap_data_buf(isert_conn, &wr->data);
+		isert_unmap_data_buf(isert_conn, &isert_cmd->data);
 	}
 
-	if (wr->rdma_wr) {
+	if (isert_cmd->rdma_wr) {
 		isert_dbg("Cmd %p free send_wr\n", isert_cmd);
-		kfree(wr->rdma_wr);
-		wr->rdma_wr = NULL;
+		kfree(isert_cmd->rdma_wr);
+		isert_cmd->rdma_wr = NULL;
 	}
 
-	if (wr->ib_sge) {
+	if (isert_cmd->ib_sge) {
 		isert_dbg("Cmd %p free ib_sge\n", isert_cmd);
-		kfree(wr->ib_sge);
-		wr->ib_sge = NULL;
+		kfree(isert_cmd->ib_sge);
+		isert_cmd->ib_sge = NULL;
 	}
 }
 
 static void
 isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
 {
-	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
-
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (wr->fr_desc) {
-		isert_dbg("Cmd %p free fr_desc %p\n", isert_cmd, wr->fr_desc);
-		if (wr->fr_desc->ind & ISERT_PROTECTED) {
-			isert_unmap_data_buf(isert_conn, &wr->prot);
-			wr->fr_desc->ind &= ~ISERT_PROTECTED;
+	if (isert_cmd->fr_desc) {
+		isert_dbg("Cmd %p free fr_desc %p\n", isert_cmd, isert_cmd->fr_desc);
+		if (isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
+			isert_unmap_data_buf(isert_conn, &isert_cmd->prot);
+			isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
 		}
 		spin_lock_bh(&isert_conn->pool_lock);
-		list_add_tail(&wr->fr_desc->list, &isert_conn->fr_pool);
+		list_add_tail(&isert_cmd->fr_desc->list, &isert_conn->fr_pool);
 		spin_unlock_bh(&isert_conn->pool_lock);
-		wr->fr_desc = NULL;
+		isert_cmd->fr_desc = NULL;
 	}
 
-	if (wr->data.sg) {
+	if (isert_cmd->data.sg) {
 		isert_dbg("Cmd %p unmap_sg op\n", isert_cmd);
-		isert_unmap_data_buf(isert_conn, &wr->data);
+		isert_unmap_data_buf(isert_conn, &isert_cmd->data);
 	}
 
-	wr->ib_sge = NULL;
-	wr->rdma_wr = NULL;
+	isert_cmd->ib_sge = NULL;
+	isert_cmd->rdma_wr = NULL;
 }
 
 static void
@@ -1903,7 +1897,6 @@ isert_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct iser_tx_desc *desc =
 		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
 	struct isert_cmd *isert_cmd = desc->isert_cmd;
-	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 	struct se_cmd *cmd = &isert_cmd->iscsi_cmd->se_cmd;
 	int ret = 0;
 
@@ -1915,13 +1908,14 @@ isert_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (wr->fr_desc && wr->fr_desc->ind & ISERT_PROTECTED) {
-		ret = isert_check_pi_status(cmd, wr->fr_desc->pi_ctx->sig_mr);
-		wr->fr_desc->ind &= ~ISERT_PROTECTED;
+	if (isert_cmd->fr_desc && isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
+		ret = isert_check_pi_status(cmd,
+				isert_cmd->fr_desc->pi_ctx->sig_mr);
+		isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
 	}
 
 	device->unreg_rdma_mem(isert_cmd, isert_conn);
-	wr->rdma_wr_num = 0;
+	isert_cmd->rdma_wr_num = 0;
 	if (ret)
 		transport_send_check_condition_and_sense(cmd, cmd->pi_err, 0);
 	else
@@ -1936,7 +1930,6 @@ isert_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct iser_tx_desc *desc =
 		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
 	struct isert_cmd *isert_cmd = desc->isert_cmd;
-	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	int ret = 0;
@@ -1949,16 +1942,16 @@ isert_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (wr->fr_desc && wr->fr_desc->ind & ISERT_PROTECTED) {
+	if (isert_cmd->fr_desc && isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
 		ret = isert_check_pi_status(se_cmd,
-					    wr->fr_desc->pi_ctx->sig_mr);
-		wr->fr_desc->ind &= ~ISERT_PROTECTED;
+					    isert_cmd->fr_desc->pi_ctx->sig_mr);
+		isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
 	}
 
 	iscsit_stop_dataout_timer(cmd);
 	device->unreg_rdma_mem(isert_cmd, isert_conn);
-	cmd->write_data_done = wr->data.len;
-	wr->rdma_wr_num = 0;
+	cmd->write_data_done = isert_cmd->data.len;
+	isert_cmd->rdma_wr_num = 0;
 
 	isert_dbg("Cmd: %p RDMA_READ comp calling execute_cmd\n", isert_cmd);
 	spin_lock_bh(&cmd->istate_lock);
@@ -2343,13 +2336,12 @@ isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 }
 
 static int
-isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
-	       struct isert_rdma_wr *wr)
+isert_map_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
 {
+	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
 	struct isert_conn *isert_conn = conn->context;
-	struct isert_data_buf *data = &wr->data;
+	struct isert_data_buf *data = &isert_cmd->data;
 	struct ib_rdma_wr *rdma_wr;
 	struct ib_sge *ib_sge;
 	u32 offset, data_len, data_left, rdma_write_max, va_offset = 0;
@@ -2357,10 +2349,12 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
 	isert_cmd->tx_desc.isert_cmd = isert_cmd;
 
-	offset = wr->iser_ib_op == ISER_IB_RDMA_READ ? cmd->write_data_done : 0;
+	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
+			cmd->write_data_done : 0;
 	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
 				 se_cmd->t_data_nents, se_cmd->data_length,
-				 offset, wr->iser_ib_op, &wr->data);
+				 offset, isert_cmd->iser_ib_op,
+				 &isert_cmd->data);
 	if (ret)
 		return ret;
 
@@ -2373,45 +2367,44 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 		ret = -ENOMEM;
 		goto unmap_cmd;
 	}
-	wr->ib_sge = ib_sge;
+	isert_cmd->ib_sge = ib_sge;
 
-	wr->rdma_wr_num = DIV_ROUND_UP(data->nents, isert_conn->max_sge);
-	wr->rdma_wr = kzalloc(sizeof(struct ib_rdma_wr) * wr->rdma_wr_num,
-				GFP_KERNEL);
-	if (!wr->rdma_wr) {
-		isert_dbg("Unable to allocate wr->rdma_wr\n");
+	isert_cmd->rdma_wr_num = DIV_ROUND_UP(data->nents, isert_conn->max_sge);
+	isert_cmd->rdma_wr = kzalloc(sizeof(struct ib_rdma_wr) *
+			isert_cmd->rdma_wr_num, GFP_KERNEL);
+	if (!isert_cmd->rdma_wr) {
+		isert_dbg("Unable to allocate isert_cmd->rdma_wr\n");
 		ret = -ENOMEM;
 		goto unmap_cmd;
 	}
 
-	wr->isert_cmd = isert_cmd;
 	rdma_write_max = isert_conn->max_sge * PAGE_SIZE;
 
-	for (i = 0; i < wr->rdma_wr_num; i++) {
-		rdma_wr = &isert_cmd->rdma_wr.rdma_wr[i];
+	for (i = 0; i < isert_cmd->rdma_wr_num; i++) {
+		rdma_wr = &isert_cmd->rdma_wr[i];
 		data_len = min(data_left, rdma_write_max);
 
 		rdma_wr->wr.send_flags = 0;
-		if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
+		if (isert_cmd->iser_ib_op == ISER_IB_RDMA_WRITE) {
 			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
 
 			rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
 			rdma_wr->remote_addr = isert_cmd->read_va + offset;
 			rdma_wr->rkey = isert_cmd->read_stag;
-			if (i + 1 == wr->rdma_wr_num)
+			if (i + 1 == isert_cmd->rdma_wr_num)
 				rdma_wr->wr.next = &isert_cmd->tx_desc.send_wr;
 			else
-				rdma_wr->wr.next = &wr->rdma_wr[i + 1].wr;
+				rdma_wr->wr.next = &isert_cmd->rdma_wr[i + 1].wr;
 		} else {
 			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
 
 			rdma_wr->wr.opcode = IB_WR_RDMA_READ;
 			rdma_wr->remote_addr = isert_cmd->write_va + va_offset;
 			rdma_wr->rkey = isert_cmd->write_stag;
-			if (i + 1 == wr->rdma_wr_num)
+			if (i + 1 == isert_cmd->rdma_wr_num)
 				rdma_wr->wr.send_flags = IB_SEND_SIGNALED;
 			else
-				rdma_wr->wr.next = &wr->rdma_wr[i + 1].wr;
+				rdma_wr->wr.next = &isert_cmd->rdma_wr[i + 1].wr;
 		}
 
 		ib_sge_cnt = isert_build_rdma_wr(isert_conn, isert_cmd, ib_sge,
@@ -2579,10 +2572,10 @@ isert_set_prot_checks(u8 prot_checks)
 
 static int
 isert_reg_sig_mr(struct isert_conn *isert_conn,
-		 struct se_cmd *se_cmd,
-		 struct isert_rdma_wr *rdma_wr,
+		 struct isert_cmd *isert_cmd,
 		 struct fast_reg_descriptor *fr_desc)
 {
+	struct se_cmd *se_cmd = &isert_cmd->iscsi_cmd->se_cmd;
 	struct ib_sig_handover_wr sig_wr;
 	struct ib_send_wr inv_wr, *bad_wr, *wr = NULL;
 	struct pi_context *pi_ctx = fr_desc->pi_ctx;
@@ -2604,13 +2597,13 @@ isert_reg_sig_mr(struct isert_conn *isert_conn,
 	memset(&sig_wr, 0, sizeof(sig_wr));
 	sig_wr.wr.opcode = IB_WR_REG_SIG_MR;
 	sig_wr.wr.wr_cqe = NULL;
-	sig_wr.wr.sg_list = &rdma_wr->ib_sg[DATA];
+	sig_wr.wr.sg_list = &isert_cmd->ib_sg[DATA];
 	sig_wr.wr.num_sge = 1;
 	sig_wr.access_flags = IB_ACCESS_LOCAL_WRITE;
 	sig_wr.sig_attrs = &sig_attrs;
 	sig_wr.sig_mr = pi_ctx->sig_mr;
 	if (se_cmd->t_prot_sg)
-		sig_wr.prot = &rdma_wr->ib_sg[PROT];
+		sig_wr.prot = &isert_cmd->ib_sg[PROT];
 
 	if (!wr)
 		wr = &sig_wr.wr;
@@ -2624,35 +2617,34 @@ isert_reg_sig_mr(struct isert_conn *isert_conn,
 	}
 	fr_desc->ind &= ~ISERT_SIG_KEY_VALID;
 
-	rdma_wr->ib_sg[SIG].lkey = pi_ctx->sig_mr->lkey;
-	rdma_wr->ib_sg[SIG].addr = 0;
-	rdma_wr->ib_sg[SIG].length = se_cmd->data_length;
+	isert_cmd->ib_sg[SIG].lkey = pi_ctx->sig_mr->lkey;
+	isert_cmd->ib_sg[SIG].addr = 0;
+	isert_cmd->ib_sg[SIG].length = se_cmd->data_length;
 	if (se_cmd->prot_op != TARGET_PROT_DIN_STRIP &&
 	    se_cmd->prot_op != TARGET_PROT_DOUT_INSERT)
 		/*
 		 * We have protection guards on the wire
 		 * so we need to set a larget transfer
 		 */
-		rdma_wr->ib_sg[SIG].length += se_cmd->prot_length;
+		isert_cmd->ib_sg[SIG].length += se_cmd->prot_length;
 
 	isert_dbg("sig_sge: addr: 0x%llx  length: %u lkey: %x\n",
-		  rdma_wr->ib_sg[SIG].addr, rdma_wr->ib_sg[SIG].length,
-		  rdma_wr->ib_sg[SIG].lkey);
+		  isert_cmd->ib_sg[SIG].addr, isert_cmd->ib_sg[SIG].length,
+		  isert_cmd->ib_sg[SIG].lkey);
 err:
 	return ret;
 }
 
 static int
 isert_handle_prot_cmd(struct isert_conn *isert_conn,
-		      struct isert_cmd *isert_cmd,
-		      struct isert_rdma_wr *wr)
+		      struct isert_cmd *isert_cmd)
 {
 	struct isert_device *device = isert_conn->device;
 	struct se_cmd *se_cmd = &isert_cmd->iscsi_cmd->se_cmd;
 	int ret;
 
-	if (!wr->fr_desc->pi_ctx) {
-		ret = isert_create_pi_ctx(wr->fr_desc,
+	if (!isert_cmd->fr_desc->pi_ctx) {
+		ret = isert_create_pi_ctx(isert_cmd->fr_desc,
 					  device->ib_device,
 					  device->pd);
 		if (ret) {
@@ -2667,16 +2659,20 @@ isert_handle_prot_cmd(struct isert_conn *isert_conn,
 					 se_cmd->t_prot_sg,
 					 se_cmd->t_prot_nents,
 					 se_cmd->prot_length,
-					 0, wr->iser_ib_op, &wr->prot);
+					 0,
+					 isert_cmd->iser_ib_op,
+					 &isert_cmd->prot);
 		if (ret) {
 			isert_err("conn %p failed to map protection buffer\n",
 				  isert_conn);
 			return ret;
 		}
 
-		memset(&wr->ib_sg[PROT], 0, sizeof(wr->ib_sg[PROT]));
-		ret = isert_fast_reg_mr(isert_conn, wr->fr_desc, &wr->prot,
-					ISERT_PROT_KEY_VALID, &wr->ib_sg[PROT]);
+		memset(&isert_cmd->ib_sg[PROT], 0, sizeof(isert_cmd->ib_sg[PROT]));
+		ret = isert_fast_reg_mr(isert_conn, isert_cmd->fr_desc,
+					&isert_cmd->prot,
+					ISERT_PROT_KEY_VALID,
+					&isert_cmd->ib_sg[PROT]);
 		if (ret) {
 			isert_err("conn %p failed to fast reg mr\n",
 				  isert_conn);
@@ -2684,29 +2680,28 @@ isert_handle_prot_cmd(struct isert_conn *isert_conn,
 		}
 	}
 
-	ret = isert_reg_sig_mr(isert_conn, se_cmd, wr, wr->fr_desc);
+	ret = isert_reg_sig_mr(isert_conn, isert_cmd, isert_cmd->fr_desc);
 	if (ret) {
 		isert_err("conn %p failed to fast reg mr\n",
 			  isert_conn);
 		goto unmap_prot_cmd;
 	}
-	wr->fr_desc->ind |= ISERT_PROTECTED;
+	isert_cmd->fr_desc->ind |= ISERT_PROTECTED;
 
 	return 0;
 
 unmap_prot_cmd:
 	if (se_cmd->t_prot_sg)
-		isert_unmap_data_buf(isert_conn, &wr->prot);
+		isert_unmap_data_buf(isert_conn, &isert_cmd->prot);
 
 	return ret;
 }
 
 static int
-isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
-	       struct isert_rdma_wr *wr)
+isert_reg_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
 {
+	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
 	struct isert_conn *isert_conn = conn->context;
 	struct fast_reg_descriptor *fr_desc = NULL;
 	struct ib_rdma_wr *rdma_wr;
@@ -2717,49 +2712,51 @@ isert_reg_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
 	isert_cmd->tx_desc.isert_cmd = isert_cmd;
 
-	offset = wr->iser_ib_op == ISER_IB_RDMA_READ ? cmd->write_data_done : 0;
+	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
+			cmd->write_data_done : 0;
 	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
 				 se_cmd->t_data_nents, se_cmd->data_length,
-				 offset, wr->iser_ib_op, &wr->data);
+				 offset, isert_cmd->iser_ib_op,
+				 &isert_cmd->data);
 	if (ret)
 		return ret;
 
-	if (wr->data.dma_nents != 1 || isert_prot_cmd(isert_conn, se_cmd)) {
+	if (isert_cmd->data.dma_nents != 1 ||
+	    isert_prot_cmd(isert_conn, se_cmd)) {
 		spin_lock_irqsave(&isert_conn->pool_lock, flags);
 		fr_desc = list_first_entry(&isert_conn->fr_pool,
 					   struct fast_reg_descriptor, list);
 		list_del(&fr_desc->list);
 		spin_unlock_irqrestore(&isert_conn->pool_lock, flags);
-		wr->fr_desc = fr_desc;
+		isert_cmd->fr_desc = fr_desc;
 	}
 
-	ret = isert_fast_reg_mr(isert_conn, fr_desc, &wr->data,
-				ISERT_DATA_KEY_VALID, &wr->ib_sg[DATA]);
+	ret = isert_fast_reg_mr(isert_conn, fr_desc, &isert_cmd->data,
+				ISERT_DATA_KEY_VALID, &isert_cmd->ib_sg[DATA]);
 	if (ret)
 		goto unmap_cmd;
 
 	if (isert_prot_cmd(isert_conn, se_cmd)) {
-		ret = isert_handle_prot_cmd(isert_conn, isert_cmd, wr);
+		ret = isert_handle_prot_cmd(isert_conn, isert_cmd);
 		if (ret)
 			goto unmap_cmd;
 
-		ib_sg = &wr->ib_sg[SIG];
+		ib_sg = &isert_cmd->ib_sg[SIG];
 	} else {
-		ib_sg = &wr->ib_sg[DATA];
+		ib_sg = &isert_cmd->ib_sg[DATA];
 	}
 
-	memcpy(&wr->s_ib_sge, ib_sg, sizeof(*ib_sg));
-	wr->ib_sge = &wr->s_ib_sge;
-	wr->rdma_wr_num = 1;
-	memset(&wr->s_rdma_wr, 0, sizeof(wr->s_rdma_wr));
-	wr->rdma_wr = &wr->s_rdma_wr;
-	wr->isert_cmd = isert_cmd;
+	memcpy(&isert_cmd->s_ib_sge, ib_sg, sizeof(*ib_sg));
+	isert_cmd->ib_sge = &isert_cmd->s_ib_sge;
+	isert_cmd->rdma_wr_num = 1;
+	memset(&isert_cmd->s_rdma_wr, 0, sizeof(isert_cmd->s_rdma_wr));
+	isert_cmd->rdma_wr = &isert_cmd->s_rdma_wr;
 
-	rdma_wr = &isert_cmd->rdma_wr.s_rdma_wr;
-	rdma_wr->wr.sg_list = &wr->s_ib_sge;
+	rdma_wr = &isert_cmd->s_rdma_wr;
+	rdma_wr->wr.sg_list = &isert_cmd->s_ib_sge;
 	rdma_wr->wr.num_sge = 1;
 	rdma_wr->wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
-	if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
+	if (isert_cmd->iser_ib_op == ISER_IB_RDMA_WRITE) {
 		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
 
 		rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
@@ -2784,7 +2781,7 @@ unmap_cmd:
 		list_add_tail(&fr_desc->list, &isert_conn->fr_pool);
 		spin_unlock_irqrestore(&isert_conn->pool_lock, flags);
 	}
-	isert_unmap_data_buf(isert_conn, &wr->data);
+	isert_unmap_data_buf(isert_conn, &isert_cmd->data);
 
 	return ret;
 }
@@ -2794,7 +2791,6 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 {
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
-	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 	struct isert_conn *isert_conn = conn->context;
 	struct isert_device *device = isert_conn->device;
 	struct ib_send_wr *wr_failed;
@@ -2803,8 +2799,8 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 	isert_dbg("Cmd: %p RDMA_WRITE data_length: %u\n",
 		 isert_cmd, se_cmd->data_length);
 
-	wr->iser_ib_op = ISER_IB_RDMA_WRITE;
-	rc = device->reg_rdma_mem(conn, cmd, wr);
+	isert_cmd->iser_ib_op = ISER_IB_RDMA_WRITE;
+	rc = device->reg_rdma_mem(isert_cmd, conn);
 	if (rc) {
 		isert_err("Cmd: %p failed to prepare RDMA res\n", isert_cmd);
 		return rc;
@@ -2821,8 +2817,8 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 		isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
 		isert_init_send_wr(isert_conn, isert_cmd,
 				   &isert_cmd->tx_desc.send_wr);
-		isert_cmd->rdma_wr.s_rdma_wr.wr.next = &isert_cmd->tx_desc.send_wr;
-		wr->rdma_wr_num += 1;
+		isert_cmd->s_rdma_wr.wr.next = &isert_cmd->tx_desc.send_wr;
+		isert_cmd->rdma_wr_num += 1;
 
 		rc = isert_post_recv(isert_conn, isert_cmd->rx_desc);
 		if (rc) {
@@ -2831,7 +2827,7 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 		}
 	}
 
-	rc = ib_post_send(isert_conn->qp, &wr->rdma_wr->wr, &wr_failed);
+	rc = ib_post_send(isert_conn->qp, &isert_cmd->rdma_wr->wr, &wr_failed);
 	if (rc)
 		isert_warn("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
 
@@ -2850,7 +2846,6 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
 {
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
-	struct isert_rdma_wr *wr = &isert_cmd->rdma_wr;
 	struct isert_conn *isert_conn = conn->context;
 	struct isert_device *device = isert_conn->device;
 	struct ib_send_wr *wr_failed;
@@ -2858,14 +2853,14 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
 
 	isert_dbg("Cmd: %p RDMA_READ data_length: %u write_data_done: %u\n",
 		 isert_cmd, se_cmd->data_length, cmd->write_data_done);
-	wr->iser_ib_op = ISER_IB_RDMA_READ;
-	rc = device->reg_rdma_mem(conn, cmd, wr);
+	isert_cmd->iser_ib_op = ISER_IB_RDMA_READ;
+	rc = device->reg_rdma_mem(isert_cmd, conn);
 	if (rc) {
 		isert_err("Cmd: %p failed to prepare RDMA res\n", isert_cmd);
 		return rc;
 	}
 
-	rc = ib_post_send(isert_conn->qp, &wr->rdma_wr->wr, &wr_failed);
+	rc = ib_post_send(isert_conn->qp, &isert_cmd->rdma_wr->wr, &wr_failed);
 	if (rc)
 		isert_warn("ib_post_send() failed for IB_WR_RDMA_READ\n");
 
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index aae4a91..b751b6a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -145,20 +145,6 @@ enum {
 	SIG = 2,
 };
 
-struct isert_rdma_wr {
-	struct isert_cmd	*isert_cmd;
-	enum iser_ib_op_code	iser_ib_op;
-	struct ib_sge		*ib_sge;
-	struct ib_sge		s_ib_sge;
-	int			rdma_wr_num;
-	struct ib_rdma_wr	*rdma_wr;
-	struct ib_rdma_wr	s_rdma_wr;
-	struct ib_sge		ib_sg[3];
-	struct isert_data_buf	data;
-	struct isert_data_buf	prot;
-	struct fast_reg_descriptor *fr_desc;
-};
-
 struct isert_cmd {
 	uint32_t		read_stag;
 	uint32_t		write_stag;
@@ -171,7 +157,16 @@ struct isert_cmd {
 	struct iscsi_cmd	*iscsi_cmd;
 	struct iser_tx_desc	tx_desc;
 	struct iser_rx_desc	*rx_desc;
-	struct isert_rdma_wr	rdma_wr;
+	enum iser_ib_op_code	iser_ib_op;
+	struct ib_sge		*ib_sge;
+	struct ib_sge		s_ib_sge;
+	int			rdma_wr_num;
+	struct ib_rdma_wr	*rdma_wr;
+	struct ib_rdma_wr	s_rdma_wr;
+	struct ib_sge		ib_sg[3];
+	struct isert_data_buf	data;
+	struct isert_data_buf	prot;
+	struct fast_reg_descriptor *fr_desc;
 	struct work_struct	comp_work;
 	struct scatterlist	sg;
 };
@@ -239,9 +234,8 @@ struct isert_device {
 	struct isert_comp	*comps;
 	int                     comps_used;
 	struct list_head	dev_node;
-	int			(*reg_rdma_mem)(struct iscsi_conn *conn,
-						    struct iscsi_cmd *cmd,
-						    struct isert_rdma_wr *wr);
+	int			(*reg_rdma_mem)(struct isert_cmd *isert_cmd,
+						struct iscsi_conn *conn);
 	void			(*unreg_rdma_mem)(struct isert_cmd *isert_cmd,
 						  struct isert_conn *isert_conn);
 };
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/13] IB/isert: kill the ->isert_cmd back pointer in struct iser_tx_desc
  2016-02-27 18:10 RFC: a first draft of a generic RDMA READ/WRITE API Christoph Hellwig
                   ` (2 preceding siblings ...)
  2016-02-27 18:10 ` [PATCH 06/13] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
@ 2016-02-27 18:10 ` Christoph Hellwig
  2016-02-27 18:10 ` [PATCH 13/13] IB/isert: RW API WIP Christoph Hellwig
  4 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma; +Cc: swise, sagig, target-devel

We only use the pointer when processing regular iSER commands, and it then
always points to the struct isert_cmd that contains the TX descriptor.

Remove it and rely on container_of to save a little space and avoid a
pointer that is updated multiple times per processed command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 14 ++++++--------
 drivers/infiniband/ulp/isert/ib_isert.h |  1 -
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index c93be93..e1a553f 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1043,7 +1043,6 @@ isert_create_send_desc(struct isert_conn *isert_conn,
 	tx_desc->iser_header.flags = ISCSI_CTRL;
 
 	tx_desc->num_sge = 1;
-	tx_desc->isert_cmd = isert_cmd;
 
 	if (tx_desc->tx_sg[0].lkey != device->pd->local_dma_lkey) {
 		tx_desc->tx_sg[0].lkey = device->pd->local_dma_lkey;
@@ -1896,7 +1895,8 @@ isert_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct isert_device *device = isert_conn->device;
 	struct iser_tx_desc *desc =
 		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
-	struct isert_cmd *isert_cmd = desc->isert_cmd;
+	struct isert_cmd *isert_cmd =
+		container_of(desc, struct isert_cmd, tx_desc);
 	struct se_cmd *cmd = &isert_cmd->iscsi_cmd->se_cmd;
 	int ret = 0;
 
@@ -1929,7 +1929,8 @@ isert_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct isert_device *device = isert_conn->device;
 	struct iser_tx_desc *desc =
 		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
-	struct isert_cmd *isert_cmd = desc->isert_cmd;
+	struct isert_cmd *isert_cmd =
+		container_of(desc, struct isert_cmd, tx_desc);
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	int ret = 0;
@@ -2019,7 +2020,8 @@ isert_send_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct ib_device *ib_dev = isert_conn->cm_id->device;
 	struct iser_tx_desc *tx_desc =
 		container_of(wc->wr_cqe, struct iser_tx_desc, tx_cqe);
-	struct isert_cmd *isert_cmd = tx_desc->isert_cmd;
+	struct isert_cmd *isert_cmd =
+		container_of(tx_desc, struct isert_cmd, tx_desc);
 
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
 		isert_print_wc(wc);
@@ -2347,8 +2349,6 @@ isert_map_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
 	u32 offset, data_len, data_left, rdma_write_max, va_offset = 0;
 	int ret = 0, i, ib_sge_cnt;
 
-	isert_cmd->tx_desc.isert_cmd = isert_cmd;
-
 	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
 			cmd->write_data_done : 0;
 	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
@@ -2710,8 +2710,6 @@ isert_reg_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
 	int ret = 0;
 	unsigned long flags;
 
-	isert_cmd->tx_desc.isert_cmd = isert_cmd;
-
 	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
 			cmd->write_data_done : 0;
 	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index b751b6a..2614403 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -106,7 +106,6 @@ struct iser_tx_desc {
 	struct ib_sge	tx_sg[2];
 	struct ib_cqe	tx_cqe;
 	int		num_sge;
-	struct isert_cmd *isert_cmd;
 	struct ib_send_wr send_wr;
 } __packed;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 12/13] IB/core: add a MR pool for signature MRs
       [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                     ` (7 preceding siblings ...)
  2016-02-27 18:10   ` [PATCH 10/13] IB/isert: kill struct isert_rdma_wr Christoph Hellwig
@ 2016-02-27 18:10   ` Christoph Hellwig
  8 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/core/mr_pool.c |  4 +++-
 drivers/infiniband/core/rw.c      | 13 +++++++++++++
 drivers/infiniband/core/verbs.c   |  1 +
 include/rdma/ib_verbs.h           |  1 +
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/mr_pool.c b/drivers/infiniband/core/mr_pool.c
index b1eb27a..9751bb1 100644
--- a/drivers/infiniband/core/mr_pool.c
+++ b/drivers/infiniband/core/mr_pool.c
@@ -20,8 +20,10 @@ struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list)
 
 	spin_lock_irqsave(&qp->mr_lock, flags);
 	mr = list_first_entry_or_null(list, struct ib_mr, qp_entry);
-	if (mr)
+	if (mr) {
+		list_del(&mr->qp_entry);
 		qp->mrs_used++;
+	}
 	spin_unlock_irqrestore(&qp->mr_lock, flags);
 
 	return mr;
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 69c3ca5..00af4ce 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -384,10 +384,23 @@ int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
 				dev->attrs.max_fast_reg_page_list_len);
 	}
 
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN) {
+		ret = ib_mr_pool_init(qp, &qp->sig_mrs,
+				attr->cap.max_rdma_ctxs, IB_MR_TYPE_SIGNATURE,
+				2);
+		if (ret)
+			goto out_free_rdma_mrs;
+	}
+
+	return 0;
+
+out_free_rdma_mrs:
+	ib_mr_pool_destroy(qp, &qp->rdma_mrs);
 	return ret;
 }
 
 void rdma_rw_cleanup_mrs(struct ib_qp *qp)
 {
+	ib_mr_pool_destroy(qp, &qp->sig_mrs);
 	ib_mr_pool_destroy(qp, &qp->rdma_mrs);
 }
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1ef3a1a..c5034af 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -776,6 +776,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	qp->mrs_used = 0;
 	spin_lock_init(&qp->mr_lock);
 	INIT_LIST_HEAD(&qp->rdma_mrs);
+	INIT_LIST_HEAD(&qp->sig_mrs);
 
 	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
 		return ib_create_xrc_qp(qp, qp_init_attr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 035585a..00af0a7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1422,6 +1422,7 @@ struct ib_qp {
 
 	spinlock_t		mr_lock;
 	struct list_head	rdma_mrs;
+	struct list_head	sig_mrs;
 	int			mrs_used;
 
 	/* count times opened, mcast attaches, flow attaches */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 13/13] IB/isert: RW API WIP
  2016-02-27 18:10 RFC: a first draft of a generic RDMA READ/WRITE API Christoph Hellwig
                   ` (3 preceding siblings ...)
  2016-02-27 18:10 ` [PATCH 11/13] IB/isert: kill the ->isert_cmd back pointer in struct iser_tx_desc Christoph Hellwig
@ 2016-02-27 18:10 ` Christoph Hellwig
  2016-02-28 13:57   ` Sagi Grimberg
  4 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-27 18:10 UTC (permalink / raw)
  To: linux-rdma; +Cc: swise, sagig, target-devel

The SIG code still lives in ib_isert for now, although it operates
on the RW API data structures and could probably be moved to the
core without too much work once fully debugged.

This ignores the issue that more than a single MR might be required for
a command (presumably that does not happen with mlx5, as the old code
did not handle it either).
---
 drivers/infiniband/ulp/isert/ib_isert.c | 855 ++++++--------------------------
 drivers/infiniband/ulp/isert/ib_isert.h |  70 +--
 2 files changed, 161 insertions(+), 764 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index e1a553f..900533c 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -33,7 +33,8 @@
 
 #define	ISERT_MAX_CONN		8
 #define ISER_MAX_RX_CQ_LEN	(ISERT_QP_MAX_RECV_DTOS * ISERT_MAX_CONN)
-#define ISER_MAX_TX_CQ_LEN	(ISERT_QP_MAX_REQ_DTOS  * ISERT_MAX_CONN)
+#define ISER_MAX_TX_CQ_LEN \
+	((ISERT_QP_MAX_REQ_DTOS + ISERT_QP_MAX_RDMA_OPS) * ISERT_MAX_CONN)
 #define ISER_MAX_CQ_LEN		(ISER_MAX_RX_CQ_LEN + ISER_MAX_TX_CQ_LEN + \
 				 ISERT_MAX_CONN)
 
@@ -46,14 +47,6 @@ static LIST_HEAD(device_list);
 static struct workqueue_struct *isert_comp_wq;
 static struct workqueue_struct *isert_release_wq;
 
-static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
-static int
-isert_map_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn);
-static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
-static int
-isert_reg_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn);
 static int
 isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
 static int
@@ -142,6 +135,7 @@ isert_create_qp(struct isert_conn *isert_conn,
 	attr.recv_cq = comp->cq;
 	attr.cap.max_send_wr = ISERT_QP_MAX_REQ_DTOS;
 	attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS + 1;
+	attr.cap.max_rdma_ctxs = ISERT_QP_MAX_RDMA_OPS;
 	attr.cap.max_send_sge = device->ib_device->attrs.max_sge;
 	isert_conn->max_sge = min(device->ib_device->attrs.max_sge,
 				  device->ib_device->attrs.max_sge_rd);
@@ -269,9 +263,9 @@ isert_alloc_comps(struct isert_device *device)
 				 device->ib_device->num_comp_vectors));
 
 	isert_info("Using %d CQs, %s supports %d vectors support "
-		   "Fast registration %d pi_capable %d\n",
+		   "pi_capable %d\n",
 		   device->comps_used, device->ib_device->name,
-		   device->ib_device->num_comp_vectors, device->use_fastreg,
+		   device->ib_device->num_comp_vectors,
 		   device->pi_capable);
 
 	device->comps = kcalloc(device->comps_used, sizeof(struct isert_comp),
@@ -312,18 +306,6 @@ isert_create_device_ib_res(struct isert_device *device)
 	isert_dbg("devattr->max_sge: %d\n", ib_dev->attrs.max_sge);
 	isert_dbg("devattr->max_sge_rd: %d\n", ib_dev->attrs.max_sge_rd);
 
-	/* asign function handlers */
-	if (ib_dev->attrs.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
-	    ib_dev->attrs.device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
-		device->use_fastreg = 1;
-		device->reg_rdma_mem = isert_reg_rdma;
-		device->unreg_rdma_mem = isert_unreg_rdma;
-	} else {
-		device->use_fastreg = 0;
-		device->reg_rdma_mem = isert_map_rdma;
-		device->unreg_rdma_mem = isert_unmap_cmd;
-	}
-
 	ret = isert_alloc_comps(device);
 	if (ret)
 		goto out;
@@ -416,146 +398,6 @@ isert_device_get(struct rdma_cm_id *cma_id)
 }
 
 static void
-isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
-{
-	struct fast_reg_descriptor *fr_desc, *tmp;
-	int i = 0;
-
-	if (list_empty(&isert_conn->fr_pool))
-		return;
-
-	isert_info("Freeing conn %p fastreg pool", isert_conn);
-
-	list_for_each_entry_safe(fr_desc, tmp,
-				 &isert_conn->fr_pool, list) {
-		list_del(&fr_desc->list);
-		ib_dereg_mr(fr_desc->data_mr);
-		if (fr_desc->pi_ctx) {
-			ib_dereg_mr(fr_desc->pi_ctx->prot_mr);
-			ib_dereg_mr(fr_desc->pi_ctx->sig_mr);
-			kfree(fr_desc->pi_ctx);
-		}
-		kfree(fr_desc);
-		++i;
-	}
-
-	if (i < isert_conn->fr_pool_size)
-		isert_warn("Pool still has %d regions registered\n",
-			isert_conn->fr_pool_size - i);
-}
-
-static int
-isert_create_pi_ctx(struct fast_reg_descriptor *desc,
-		    struct ib_device *device,
-		    struct ib_pd *pd)
-{
-	struct pi_context *pi_ctx;
-	int ret;
-
-	pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
-	if (!pi_ctx) {
-		isert_err("Failed to allocate pi context\n");
-		return -ENOMEM;
-	}
-
-	pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
-				      ISCSI_ISER_SG_TABLESIZE);
-	if (IS_ERR(pi_ctx->prot_mr)) {
-		isert_err("Failed to allocate prot frmr err=%ld\n",
-			  PTR_ERR(pi_ctx->prot_mr));
-		ret = PTR_ERR(pi_ctx->prot_mr);
-		goto err_pi_ctx;
-	}
-	desc->ind |= ISERT_PROT_KEY_VALID;
-
-	pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2);
-	if (IS_ERR(pi_ctx->sig_mr)) {
-		isert_err("Failed to allocate signature enabled mr err=%ld\n",
-			  PTR_ERR(pi_ctx->sig_mr));
-		ret = PTR_ERR(pi_ctx->sig_mr);
-		goto err_prot_mr;
-	}
-
-	desc->pi_ctx = pi_ctx;
-	desc->ind |= ISERT_SIG_KEY_VALID;
-	desc->ind &= ~ISERT_PROTECTED;
-
-	return 0;
-
-err_prot_mr:
-	ib_dereg_mr(pi_ctx->prot_mr);
-err_pi_ctx:
-	kfree(pi_ctx);
-
-	return ret;
-}
-
-static int
-isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
-		     struct fast_reg_descriptor *fr_desc)
-{
-	fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
-				       ISCSI_ISER_SG_TABLESIZE);
-	if (IS_ERR(fr_desc->data_mr)) {
-		isert_err("Failed to allocate data frmr err=%ld\n",
-			  PTR_ERR(fr_desc->data_mr));
-		return PTR_ERR(fr_desc->data_mr);
-	}
-	fr_desc->ind |= ISERT_DATA_KEY_VALID;
-
-	isert_dbg("Created fr_desc %p\n", fr_desc);
-
-	return 0;
-}
-
-static int
-isert_conn_create_fastreg_pool(struct isert_conn *isert_conn)
-{
-	struct fast_reg_descriptor *fr_desc;
-	struct isert_device *device = isert_conn->device;
-	struct se_session *se_sess = isert_conn->conn->sess->se_sess;
-	struct se_node_acl *se_nacl = se_sess->se_node_acl;
-	int i, ret, tag_num;
-	/*
-	 * Setup the number of FRMRs based upon the number of tags
-	 * available to session in iscsi_target_locate_portal().
-	 */
-	tag_num = max_t(u32, ISCSIT_MIN_TAGS, se_nacl->queue_depth);
-	tag_num = (tag_num * 2) + ISCSIT_EXTRA_TAGS;
-
-	isert_conn->fr_pool_size = 0;
-	for (i = 0; i < tag_num; i++) {
-		fr_desc = kzalloc(sizeof(*fr_desc), GFP_KERNEL);
-		if (!fr_desc) {
-			isert_err("Failed to allocate fast_reg descriptor\n");
-			ret = -ENOMEM;
-			goto err;
-		}
-
-		ret = isert_create_fr_desc(device->ib_device,
-					   device->pd, fr_desc);
-		if (ret) {
-			isert_err("Failed to create fastreg descriptor err=%d\n",
-			       ret);
-			kfree(fr_desc);
-			goto err;
-		}
-
-		list_add_tail(&fr_desc->list, &isert_conn->fr_pool);
-		isert_conn->fr_pool_size++;
-	}
-
-	isert_dbg("Creating conn %p fastreg pool size=%d",
-		 isert_conn, isert_conn->fr_pool_size);
-
-	return 0;
-
-err:
-	isert_conn_free_fastreg_pool(isert_conn);
-	return ret;
-}
-
-static void
 isert_init_conn(struct isert_conn *isert_conn)
 {
 	isert_conn->state = ISER_CONN_INIT;
@@ -565,8 +407,6 @@ isert_init_conn(struct isert_conn *isert_conn)
 	init_completion(&isert_conn->wait);
 	kref_init(&isert_conn->kref);
 	mutex_init(&isert_conn->mutex);
-	spin_lock_init(&isert_conn->pool_lock);
-	INIT_LIST_HEAD(&isert_conn->fr_pool);
 	INIT_WORK(&isert_conn->release_work, isert_release_work);
 }
 
@@ -739,9 +579,6 @@ isert_connect_release(struct isert_conn *isert_conn)
 
 	BUG_ON(!device);
 
-	if (device->use_fastreg)
-		isert_conn_free_fastreg_pool(isert_conn);
-
 	isert_free_rx_descriptors(isert_conn);
 	if (isert_conn->cm_id)
 		rdma_destroy_id(isert_conn->cm_id);
@@ -1083,7 +920,6 @@ isert_init_send_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 {
 	struct iser_tx_desc *tx_desc = &isert_cmd->tx_desc;
 
-	isert_cmd->iser_ib_op = ISER_IB_SEND;
 	tx_desc->tx_cqe.done = isert_send_done;
 	send_wr->wr_cqe = &tx_desc->tx_cqe;
 
@@ -1166,16 +1002,6 @@ isert_put_login_tx(struct iscsi_conn *conn, struct iscsi_login *login,
 	}
 	if (!login->login_failed) {
 		if (login->login_complete) {
-			if (!conn->sess->sess_ops->SessionType &&
-			    isert_conn->device->use_fastreg) {
-				ret = isert_conn_create_fastreg_pool(isert_conn);
-				if (ret) {
-					isert_err("Conn: %p failed to create"
-					       " fastreg pool\n", isert_conn);
-					return ret;
-				}
-			}
-
 			ret = isert_alloc_rx_descriptors(isert_conn);
 			if (ret)
 				return ret;
@@ -1647,97 +1473,29 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 	isert_conn->post_recv_buf_count--;
 }
 
-static int
-isert_map_data_buf(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
-		   struct scatterlist *sg, u32 nents, u32 length, u32 offset,
-		   enum iser_ib_op_code op, struct isert_data_buf *data)
-{
-	struct ib_device *ib_dev = isert_conn->cm_id->device;
-
-	data->dma_dir = op == ISER_IB_RDMA_WRITE ?
-			      DMA_TO_DEVICE : DMA_FROM_DEVICE;
-
-	data->len = length - offset;
-	data->offset = offset;
-	data->sg_off = data->offset / PAGE_SIZE;
-
-	data->sg = &sg[data->sg_off];
-	data->nents = min_t(unsigned int, nents - data->sg_off,
-					  ISCSI_ISER_SG_TABLESIZE);
-	data->len = min_t(unsigned int, data->len, ISCSI_ISER_SG_TABLESIZE *
-					PAGE_SIZE);
-
-	data->dma_nents = ib_dma_map_sg(ib_dev, data->sg, data->nents,
-					data->dma_dir);
-	if (unlikely(!data->dma_nents)) {
-		isert_err("Cmd: unable to dma map SGs %p\n", sg);
-		return -EINVAL;
-	}
-
-	isert_dbg("Mapped cmd: %p count: %u sg: %p sg_nents: %u rdma_len %d\n",
-		  isert_cmd, data->dma_nents, data->sg, data->nents, data->len);
-
-	return 0;
-}
-
-static void
-isert_unmap_data_buf(struct isert_conn *isert_conn, struct isert_data_buf *data)
-{
-	struct ib_device *ib_dev = isert_conn->cm_id->device;
-
-	ib_dma_unmap_sg(ib_dev, data->sg, data->nents, data->dma_dir);
-	memset(data, 0, sizeof(*data));
-}
-
-
-
 static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
+isert_unreg_rdma_mem(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
 {
 	isert_dbg("Cmd %p\n", isert_cmd);
-
-	if (isert_cmd->data.sg) {
-		isert_dbg("Cmd %p unmap_sg op\n", isert_cmd);
-		isert_unmap_data_buf(isert_conn, &isert_cmd->data);
-	}
-
-	if (isert_cmd->rdma_wr) {
-		isert_dbg("Cmd %p free send_wr\n", isert_cmd);
-		kfree(isert_cmd->rdma_wr);
-		isert_cmd->rdma_wr = NULL;
-	}
-
-	if (isert_cmd->ib_sge) {
-		isert_dbg("Cmd %p free ib_sge\n", isert_cmd);
-		kfree(isert_cmd->ib_sge);
-		isert_cmd->ib_sge = NULL;
-	}
-}
-
-static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
-{
-	isert_dbg("Cmd %p\n", isert_cmd);
-
-	if (isert_cmd->fr_desc) {
-		isert_dbg("Cmd %p free fr_desc %p\n", isert_cmd, isert_cmd->fr_desc);
-		if (isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
-			isert_unmap_data_buf(isert_conn, &isert_cmd->prot);
-			isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
+
+	if (isert_cmd->sig_mr) {
+		if (isert_cmd->prot.sg) {
+			rdma_rw_ctx_destroy(&isert_cmd->prot, isert_conn->qp,
+					isert_conn->cm_id->port_num);
+			isert_cmd->prot.sg = NULL;
 		}
-		spin_lock_bh(&isert_conn->pool_lock);
-		list_add_tail(&isert_cmd->fr_desc->list, &isert_conn->fr_pool);
-		spin_unlock_bh(&isert_conn->pool_lock);
-		isert_cmd->fr_desc = NULL;
+
+		ib_mr_pool_put(isert_conn->qp, &isert_conn->qp->sig_mrs,
+			isert_cmd->sig_mr);
+		isert_cmd->sig_mr = NULL;
 	}
 
 	if (isert_cmd->data.sg) {
 		isert_dbg("Cmd %p unmap_sg op\n", isert_cmd);
-		isert_unmap_data_buf(isert_conn, &isert_cmd->data);
+		rdma_rw_ctx_destroy(&isert_cmd->data, isert_conn->qp,
+				isert_conn->cm_id->port_num);
+		isert_cmd->data.sg = NULL;
 	}
-
-	isert_cmd->ib_sge = NULL;
-	isert_cmd->rdma_wr = NULL;
 }
 
 static void
@@ -1746,7 +1504,6 @@ isert_put_cmd(struct isert_cmd *isert_cmd, bool comp_err)
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct isert_conn *isert_conn = isert_cmd->conn;
 	struct iscsi_conn *conn = isert_conn->conn;
-	struct isert_device *device = isert_conn->device;
 	struct iscsi_text_rsp *hdr;
 
 	isert_dbg("Cmd %p\n", isert_cmd);
@@ -1774,7 +1531,7 @@ isert_put_cmd(struct isert_cmd *isert_cmd, bool comp_err)
 			}
 		}
 
-		device->unreg_rdma_mem(isert_cmd, isert_conn);
+		isert_unreg_rdma_mem(isert_cmd, isert_conn);
 		transport_generic_free_cmd(&cmd->se_cmd, 0);
 		break;
 	case ISCSI_OP_SCSI_TMFUNC:
@@ -1908,14 +1665,10 @@ isert_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (isert_cmd->fr_desc && isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
-		ret = isert_check_pi_status(cmd,
-				isert_cmd->fr_desc->pi_ctx->sig_mr);
-		isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
-	}
+	if (isert_cmd->sig_mr)
+		ret = isert_check_pi_status(cmd, isert_cmd->sig_mr);
 
-	device->unreg_rdma_mem(isert_cmd, isert_conn);
-	isert_cmd->rdma_wr_num = 0;
+	isert_unreg_rdma_mem(isert_cmd, isert_conn);
 	if (ret)
 		transport_send_check_condition_and_sense(cmd, cmd->pi_err, 0);
 	else
@@ -1943,16 +1696,13 @@ isert_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (isert_cmd->fr_desc && isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
-		ret = isert_check_pi_status(se_cmd,
-					    isert_cmd->fr_desc->pi_ctx->sig_mr);
-		isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
-	}
+	if (isert_cmd->sig_mr)
+		ret = isert_check_pi_status(se_cmd, isert_cmd->sig_mr);
 
 	iscsit_stop_dataout_timer(cmd);
-	device->unreg_rdma_mem(isert_cmd, isert_conn);
-	cmd->write_data_done = isert_cmd->data.len;
-	isert_cmd->rdma_wr_num = 0;
+	isert_unreg_rdma_mem(isert_cmd, isert_conn);
+
+	cmd->write_data_done = 0;
 
 	isert_dbg("Cmd: %p RDMA_READ comp calling execute_cmd\n", isert_cmd);
 	spin_lock_bh(&cmd->istate_lock);
@@ -2123,7 +1873,6 @@ isert_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 {
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
 	struct isert_conn *isert_conn = conn->context;
-	struct isert_device *device = isert_conn->device;
 
 	spin_lock_bh(&conn->cmd_lock);
 	if (!list_empty(&cmd->i_conn_node))
@@ -2133,7 +1882,7 @@ isert_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 	if (cmd->data_direction == DMA_TO_DEVICE)
 		iscsit_stop_dataout_timer(cmd);
 
-	device->unreg_rdma_mem(isert_cmd, isert_conn);
+	isert_unreg_rdma_mem(isert_cmd, isert_conn);
 }
 
 static enum target_prot_op
@@ -2286,234 +2035,6 @@ isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
 	return isert_post_response(isert_conn, isert_cmd);
 }
 
-static int
-isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
-		    struct ib_sge *ib_sge, struct ib_rdma_wr *rdma_wr,
-		    u32 data_left, u32 offset)
-{
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-	struct scatterlist *sg_start, *tmp_sg;
-	struct isert_device *device = isert_conn->device;
-	struct ib_device *ib_dev = device->ib_device;
-	u32 sg_off, page_off;
-	int i = 0, sg_nents;
-
-	sg_off = offset / PAGE_SIZE;
-	sg_start = &cmd->se_cmd.t_data_sg[sg_off];
-	sg_nents = min(cmd->se_cmd.t_data_nents - sg_off, isert_conn->max_sge);
-	page_off = offset % PAGE_SIZE;
-
-	rdma_wr->wr.sg_list = ib_sge;
-	rdma_wr->wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
-
-	/*
-	 * Perform mapping of TCM scatterlist memory ib_sge dma_addr.
-	 */
-	for_each_sg(sg_start, tmp_sg, sg_nents, i) {
-		isert_dbg("RDMA from SGL dma_addr: 0x%llx dma_len: %u, "
-			  "page_off: %u\n",
-			  (unsigned long long)tmp_sg->dma_address,
-			  tmp_sg->length, page_off);
-
-		ib_sge->addr = ib_sg_dma_address(ib_dev, tmp_sg) + page_off;
-		ib_sge->length = min_t(u32, data_left,
-				ib_sg_dma_len(ib_dev, tmp_sg) - page_off);
-		ib_sge->lkey = device->pd->local_dma_lkey;
-
-		isert_dbg("RDMA ib_sge: addr: 0x%llx  length: %u lkey: %x\n",
-			  ib_sge->addr, ib_sge->length, ib_sge->lkey);
-		page_off = 0;
-		data_left -= ib_sge->length;
-		if (!data_left)
-			break;
-		ib_sge++;
-		isert_dbg("Incrementing ib_sge pointer to %p\n", ib_sge);
-	}
-
-	rdma_wr->wr.num_sge = ++i;
-	isert_dbg("Set outgoing sg_list: %p num_sg: %u from TCM SGLs\n",
-		  rdma_wr->wr.sg_list, rdma_wr->wr.num_sge);
-
-	return rdma_wr->wr.num_sge;
-}
-
-static int
-isert_map_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
-{
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_conn *isert_conn = conn->context;
-	struct isert_data_buf *data = &isert_cmd->data;
-	struct ib_rdma_wr *rdma_wr;
-	struct ib_sge *ib_sge;
-	u32 offset, data_len, data_left, rdma_write_max, va_offset = 0;
-	int ret = 0, i, ib_sge_cnt;
-
-	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
-			cmd->write_data_done : 0;
-	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
-				 se_cmd->t_data_nents, se_cmd->data_length,
-				 offset, isert_cmd->iser_ib_op,
-				 &isert_cmd->data);
-	if (ret)
-		return ret;
-
-	data_left = data->len;
-	offset = data->offset;
-
-	ib_sge = kzalloc(sizeof(struct ib_sge) * data->nents, GFP_KERNEL);
-	if (!ib_sge) {
-		isert_warn("Unable to allocate ib_sge\n");
-		ret = -ENOMEM;
-		goto unmap_cmd;
-	}
-	isert_cmd->ib_sge = ib_sge;
-
-	isert_cmd->rdma_wr_num = DIV_ROUND_UP(data->nents, isert_conn->max_sge);
-	isert_cmd->rdma_wr = kzalloc(sizeof(struct ib_rdma_wr) *
-			isert_cmd->rdma_wr_num, GFP_KERNEL);
-	if (!isert_cmd->rdma_wr) {
-		isert_dbg("Unable to allocate isert_cmd->rdma_wr\n");
-		ret = -ENOMEM;
-		goto unmap_cmd;
-	}
-
-	rdma_write_max = isert_conn->max_sge * PAGE_SIZE;
-
-	for (i = 0; i < isert_cmd->rdma_wr_num; i++) {
-		rdma_wr = &isert_cmd->rdma_wr[i];
-		data_len = min(data_left, rdma_write_max);
-
-		rdma_wr->wr.send_flags = 0;
-		if (isert_cmd->iser_ib_op == ISER_IB_RDMA_WRITE) {
-			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
-
-			rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
-			rdma_wr->remote_addr = isert_cmd->read_va + offset;
-			rdma_wr->rkey = isert_cmd->read_stag;
-			if (i + 1 == isert_cmd->rdma_wr_num)
-				rdma_wr->wr.next = &isert_cmd->tx_desc.send_wr;
-			else
-				rdma_wr->wr.next = &isert_cmd->rdma_wr[i + 1].wr;
-		} else {
-			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
-
-			rdma_wr->wr.opcode = IB_WR_RDMA_READ;
-			rdma_wr->remote_addr = isert_cmd->write_va + va_offset;
-			rdma_wr->rkey = isert_cmd->write_stag;
-			if (i + 1 == isert_cmd->rdma_wr_num)
-				rdma_wr->wr.send_flags = IB_SEND_SIGNALED;
-			else
-				rdma_wr->wr.next = &isert_cmd->rdma_wr[i + 1].wr;
-		}
-
-		ib_sge_cnt = isert_build_rdma_wr(isert_conn, isert_cmd, ib_sge,
-					rdma_wr, data_len, offset);
-		ib_sge += ib_sge_cnt;
-
-		offset += data_len;
-		va_offset += data_len;
-		data_left -= data_len;
-	}
-
-	return 0;
-unmap_cmd:
-	isert_unmap_data_buf(isert_conn, data);
-
-	return ret;
-}
-
-static inline void
-isert_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
-{
-	u32 rkey;
-
-	memset(inv_wr, 0, sizeof(*inv_wr));
-	inv_wr->wr_cqe = NULL;
-	inv_wr->opcode = IB_WR_LOCAL_INV;
-	inv_wr->ex.invalidate_rkey = mr->rkey;
-
-	/* Bump the key */
-	rkey = ib_inc_rkey(mr->rkey);
-	ib_update_fast_reg_key(mr, rkey);
-}
-
-static int
-isert_fast_reg_mr(struct isert_conn *isert_conn,
-		  struct fast_reg_descriptor *fr_desc,
-		  struct isert_data_buf *mem,
-		  enum isert_indicator ind,
-		  struct ib_sge *sge)
-{
-	struct isert_device *device = isert_conn->device;
-	struct ib_device *ib_dev = device->ib_device;
-	struct ib_mr *mr;
-	struct ib_reg_wr reg_wr;
-	struct ib_send_wr inv_wr, *bad_wr, *wr = NULL;
-	int ret, n;
-
-	if (mem->dma_nents == 1) {
-		sge->lkey = device->pd->local_dma_lkey;
-		sge->addr = ib_sg_dma_address(ib_dev, &mem->sg[0]);
-		sge->length = ib_sg_dma_len(ib_dev, &mem->sg[0]);
-		isert_dbg("sge: addr: 0x%llx  length: %u lkey: %x\n",
-			 sge->addr, sge->length, sge->lkey);
-		return 0;
-	}
-
-	if (ind == ISERT_DATA_KEY_VALID)
-		/* Registering data buffer */
-		mr = fr_desc->data_mr;
-	else
-		/* Registering protection buffer */
-		mr = fr_desc->pi_ctx->prot_mr;
-
-	if (!(fr_desc->ind & ind)) {
-		isert_inv_rkey(&inv_wr, mr);
-		wr = &inv_wr;
-	}
-
-	n = ib_map_mr_sg(mr, mem->sg, mem->nents, 0, PAGE_SIZE);
-	if (unlikely(n != mem->nents)) {
-		isert_err("failed to map mr sg (%d/%d)\n",
-			 n, mem->nents);
-		return n < 0 ? n : -EINVAL;
-	}
-
-	isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
-		  fr_desc, mem->nents, mem->offset);
-
-	reg_wr.wr.next = NULL;
-	reg_wr.wr.opcode = IB_WR_REG_MR;
-	reg_wr.wr.wr_cqe = NULL;
-	reg_wr.wr.send_flags = 0;
-	reg_wr.wr.num_sge = 0;
-	reg_wr.mr = mr;
-	reg_wr.key = mr->lkey;
-	reg_wr.access = IB_ACCESS_LOCAL_WRITE;
-
-	if (!wr)
-		wr = &reg_wr.wr;
-	else
-		wr->next = &reg_wr.wr;
-
-	ret = ib_post_send(isert_conn->qp, wr, &bad_wr);
-	if (ret) {
-		isert_err("fast registration failed, ret:%d\n", ret);
-		return ret;
-	}
-	fr_desc->ind &= ~ind;
-
-	sge->lkey = mr->lkey;
-	sge->addr = mr->iova;
-	sge->length = mr->length;
-
-	isert_dbg("sge: addr: 0x%llx  length: %u lkey: %x\n",
-		  sge->addr, sge->length, sge->lkey);
-
-	return ret;
-}
-
 static inline void
 isert_set_dif_domain(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs,
 		     struct ib_sig_domain *domain)
@@ -2538,6 +2059,8 @@ isert_set_dif_domain(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs,
 static int
 isert_set_sig_attrs(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs)
 {
+	memset(sig_attrs, 0, sizeof(*sig_attrs));
+
 	switch (se_cmd->prot_op) {
 	case TARGET_PROT_DIN_INSERT:
 	case TARGET_PROT_DOUT_STRIP:
@@ -2559,229 +2082,152 @@ isert_set_sig_attrs(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs)
 		return -EINVAL;
 	}
 
+	sig_attrs->check_mask =
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x30 : 0) |
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
 	return 0;
 }
 
-static inline u8
-isert_set_prot_checks(u8 prot_checks)
-{
-	return (prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
-	       (prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x30 : 0) |
-	       (prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
-}
-
 static int
-isert_reg_sig_mr(struct isert_conn *isert_conn,
+isert_reg_sig_mr(struct ib_qp *qp,
 		 struct isert_cmd *isert_cmd,
-		 struct fast_reg_descriptor *fr_desc)
+		 struct ib_sig_attrs *sig_attrs)
 {
 	struct se_cmd *se_cmd = &isert_cmd->iscsi_cmd->se_cmd;
 	struct ib_sig_handover_wr sig_wr;
 	struct ib_send_wr inv_wr, *bad_wr, *wr = NULL;
-	struct pi_context *pi_ctx = fr_desc->pi_ctx;
-	struct ib_sig_attrs sig_attrs;
+	struct ib_rdma_wr rdma_wr;
+	struct ib_sge sge;
 	int ret;
 
-	memset(&sig_attrs, 0, sizeof(sig_attrs));
-	ret = isert_set_sig_attrs(se_cmd, &sig_attrs);
-	if (ret)
-		goto err;
-
-	sig_attrs.check_mask = isert_set_prot_checks(se_cmd->prot_checks);
+	isert_cmd->sig_mr = ib_mr_pool_get(qp, &qp->sig_mrs);
+	if (!isert_cmd->sig_mr)
+		return -EAGAIN;
 
-	if (!(fr_desc->ind & ISERT_SIG_KEY_VALID)) {
-		isert_inv_rkey(&inv_wr, pi_ctx->sig_mr);
+	if (1 /* add need_inval flag to the MR */) {
+		memset(&inv_wr, 0, sizeof(inv_wr));
+		inv_wr.opcode = IB_WR_LOCAL_INV;
+		inv_wr.ex.invalidate_rkey = isert_cmd->sig_mr->rkey;
+		ib_update_fast_reg_key(isert_cmd->sig_mr,
+			ib_inc_rkey(isert_cmd->sig_mr->rkey));
 		wr = &inv_wr;
 	}
 
 	memset(&sig_wr, 0, sizeof(sig_wr));
 	sig_wr.wr.opcode = IB_WR_REG_SIG_MR;
 	sig_wr.wr.wr_cqe = NULL;
-	sig_wr.wr.sg_list = &isert_cmd->ib_sg[DATA];
+	sig_wr.wr.sg_list = &isert_cmd->data.reg->sge;
 	sig_wr.wr.num_sge = 1;
 	sig_wr.access_flags = IB_ACCESS_LOCAL_WRITE;
-	sig_wr.sig_attrs = &sig_attrs;
-	sig_wr.sig_mr = pi_ctx->sig_mr;
+	sig_wr.sig_attrs = sig_attrs;
+	sig_wr.sig_mr = isert_cmd->sig_mr;
 	if (se_cmd->t_prot_sg)
-		sig_wr.prot = &isert_cmd->ib_sg[PROT];
+		sig_wr.prot = &isert_cmd->prot.reg->sge;
 
 	if (!wr)
 		wr = &sig_wr.wr;
 	else
 		wr->next = &sig_wr.wr;
 
-	ret = ib_post_send(isert_conn->qp, wr, &bad_wr);
-	if (ret) {
-		isert_err("fast registration failed, ret:%d\n", ret);
-		goto err;
-	}
-	fr_desc->ind &= ~ISERT_SIG_KEY_VALID;
-
-	isert_cmd->ib_sg[SIG].lkey = pi_ctx->sig_mr->lkey;
-	isert_cmd->ib_sg[SIG].addr = 0;
-	isert_cmd->ib_sg[SIG].length = se_cmd->data_length;
+	sge.lkey = isert_cmd->sig_mr->lkey;
+	sge.addr = 0;
+	sge.length = se_cmd->data_length;
 	if (se_cmd->prot_op != TARGET_PROT_DIN_STRIP &&
 	    se_cmd->prot_op != TARGET_PROT_DOUT_INSERT)
 		/*
 		 * We have protection guards on the wire
 		 * so we need to set a larger transfer
 		 */
-		isert_cmd->ib_sg[SIG].length += se_cmd->prot_length;
+		sge.length += se_cmd->prot_length;
 
 	isert_dbg("sig_sge: addr: 0x%llx  length: %u lkey: %x\n",
-		  isert_cmd->ib_sg[SIG].addr, isert_cmd->ib_sg[SIG].length,
-		  isert_cmd->ib_sg[SIG].lkey);
-err:
-	return ret;
-}
+		  sge.addr, sge.length, sge.lkey);
 
-static int
-isert_handle_prot_cmd(struct isert_conn *isert_conn,
-		      struct isert_cmd *isert_cmd)
-{
-	struct isert_device *device = isert_conn->device;
-	struct se_cmd *se_cmd = &isert_cmd->iscsi_cmd->se_cmd;
-	int ret;
+	rdma_wr.wr.sg_list = &sge;
+	rdma_wr.wr.num_sge = 1;
+	rdma_wr.wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
+	rdma_wr.wr.send_flags = IB_SEND_SIGNALED;
 
-	if (!isert_cmd->fr_desc->pi_ctx) {
-		ret = isert_create_pi_ctx(isert_cmd->fr_desc,
-					  device->ib_device,
-					  device->pd);
-		if (ret) {
-			isert_err("conn %p failed to allocate pi_ctx\n",
-				  isert_conn);
-			return ret;
-		}
+	if (isert_cmd->data.dma_dir == DMA_TO_DEVICE) {
+		rdma_wr.wr.opcode = IB_WR_RDMA_WRITE;
+		rdma_wr.remote_addr = isert_cmd->read_va;
+		rdma_wr.rkey = isert_cmd->read_stag;
+	} else {
+		rdma_wr.wr.opcode = IB_WR_RDMA_READ;
+		rdma_wr.remote_addr = isert_cmd->write_va;
+		rdma_wr.rkey = isert_cmd->write_stag;
 	}
 
-	if (se_cmd->t_prot_sg) {
-		ret = isert_map_data_buf(isert_conn, isert_cmd,
-					 se_cmd->t_prot_sg,
-					 se_cmd->t_prot_nents,
-					 se_cmd->prot_length,
-					 0,
-					 isert_cmd->iser_ib_op,
-					 &isert_cmd->prot);
-		if (ret) {
-			isert_err("conn %p failed to map protection buffer\n",
-				  isert_conn);
-			return ret;
-		}
+	sig_wr.wr.next = &rdma_wr.wr;
 
-		memset(&isert_cmd->ib_sg[PROT], 0, sizeof(isert_cmd->ib_sg[PROT]));
-		ret = isert_fast_reg_mr(isert_conn, isert_cmd->fr_desc,
-					&isert_cmd->prot,
-					ISERT_PROT_KEY_VALID,
-					&isert_cmd->ib_sg[PROT]);
-		if (ret) {
-			isert_err("conn %p failed to fast reg mr\n",
-				  isert_conn);
-			goto unmap_prot_cmd;
-		}
-	}
-
-	ret = isert_reg_sig_mr(isert_conn, isert_cmd, isert_cmd->fr_desc);
+	ret = ib_post_send(qp, wr, &bad_wr);
 	if (ret) {
-		isert_err("conn %p failed to fast reg mr\n",
-			  isert_conn);
-		goto unmap_prot_cmd;
+		isert_err("sig registration failed, ret:%d\n", ret);
+		goto err;
 	}
-	isert_cmd->fr_desc->ind |= ISERT_PROTECTED;
-
-	return 0;
-
-unmap_prot_cmd:
-	if (se_cmd->t_prot_sg)
-		isert_unmap_data_buf(isert_conn, &isert_cmd->prot);
 
+err:
 	return ret;
 }
 
-static int
-isert_reg_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
+static void
+isert_handle_prot_cmd(struct isert_conn *isert_conn,
+		      struct isert_cmd *isert_cmd)
 {
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_conn *isert_conn = conn->context;
-	struct fast_reg_descriptor *fr_desc = NULL;
-	struct ib_rdma_wr *rdma_wr;
-	struct ib_sge *ib_sg;
-	u32 offset;
-	int ret = 0;
-	unsigned long flags;
-
-	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
-			cmd->write_data_done : 0;
-	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
-				 se_cmd->t_data_nents, se_cmd->data_length,
-				 offset, isert_cmd->iser_ib_op,
-				 &isert_cmd->data);
-	if (ret)
-		return ret;
-
-	if (isert_cmd->data.dma_nents != 1 ||
-	    isert_prot_cmd(isert_conn, se_cmd)) {
-		spin_lock_irqsave(&isert_conn->pool_lock, flags);
-		fr_desc = list_first_entry(&isert_conn->fr_pool,
-					   struct fast_reg_descriptor, list);
-		list_del(&fr_desc->list);
-		spin_unlock_irqrestore(&isert_conn->pool_lock, flags);
-		isert_cmd->fr_desc = fr_desc;
-	}
-
-	ret = isert_fast_reg_mr(isert_conn, fr_desc, &isert_cmd->data,
-				ISERT_DATA_KEY_VALID, &isert_cmd->ib_sg[DATA]);
-	if (ret)
-		goto unmap_cmd;
+	struct se_cmd *se_cmd = &isert_cmd->iscsi_cmd->se_cmd;
+	struct ib_sig_attrs sig_attrs;
+	struct ib_send_wr *first_wr, *bad_wr;
+	int ret;
 
-	if (isert_prot_cmd(isert_conn, se_cmd)) {
-		ret = isert_handle_prot_cmd(isert_conn, isert_cmd);
-		if (ret)
-			goto unmap_cmd;
+	/*
+	 * Nasty layering violation by poking deep into RW API internals in
+	 * terms of WR chaining.  Hopefully we can move this to the core
+	 * and/or clean it up.
+	 */
+	BUG_ON(isert_cmd->data.nr_ops > 1);
 
-		ib_sg = &isert_cmd->ib_sg[SIG];
-	} else {
-		ib_sg = &isert_cmd->ib_sg[DATA];
+	if (se_cmd->t_prot_sg) {
+		ret = rdma_rw_ctx_init(&isert_cmd->prot, isert_conn->qp,
+				isert_conn->cm_id->port_num, se_cmd->t_prot_sg,
+				se_cmd->t_prot_nents, se_cmd->prot_length,
+				0, 0, /* va and stag are unused */
+				DMA_TO_DEVICE, 0);
+		if (ret < 0)
+			return;
+
+		BUG_ON(isert_cmd->prot.nr_ops > 1);
+
+		isert_cmd->data.reg->reg_wr.wr.next = &isert_cmd->prot.reg->reg_wr.wr;
+		if (1 /* need_inval in MR */) {
+			isert_cmd->data.reg->inv_wr.next = &isert_cmd->prot.reg->inv_wr;
+			isert_cmd->prot.reg->inv_wr.next = &isert_cmd->data.reg->reg_wr.wr;
+		}
 	}
 
-	memcpy(&isert_cmd->s_ib_sge, ib_sg, sizeof(*ib_sg));
-	isert_cmd->ib_sge = &isert_cmd->s_ib_sge;
-	isert_cmd->rdma_wr_num = 1;
-	memset(&isert_cmd->s_rdma_wr, 0, sizeof(isert_cmd->s_rdma_wr));
-	isert_cmd->rdma_wr = &isert_cmd->s_rdma_wr;
-
-	rdma_wr = &isert_cmd->s_rdma_wr;
-	rdma_wr->wr.sg_list = &isert_cmd->s_ib_sge;
-	rdma_wr->wr.num_sge = 1;
-	rdma_wr->wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
-	if (isert_cmd->iser_ib_op == ISER_IB_RDMA_WRITE) {
-		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
+	isert_cmd->data.reg->reg_wr.wr.next = NULL;
 
-		rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
-		rdma_wr->remote_addr = isert_cmd->read_va;
-		rdma_wr->rkey = isert_cmd->read_stag;
-		rdma_wr->wr.send_flags = !isert_prot_cmd(isert_conn, se_cmd) ?
-				      0 : IB_SEND_SIGNALED;
-	} else {
-		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
+	if (1 /* need_inval in MR */)
+		first_wr = &isert_cmd->data.reg->inv_wr;
+	else
+		first_wr = &isert_cmd->data.reg->reg_wr.wr;
 
-		rdma_wr->wr.opcode = IB_WR_RDMA_READ;
-		rdma_wr->remote_addr = isert_cmd->write_va;
-		rdma_wr->rkey = isert_cmd->write_stag;
-		rdma_wr->wr.send_flags = IB_SEND_SIGNALED;
+	ret = ib_post_send(isert_conn->qp, first_wr, &bad_wr);
+	if (ret) {
+		isert_err("mem registration failed, ret:%d\n", ret);
+		return;
 	}
 
-	return 0;
+	ret = isert_set_sig_attrs(se_cmd, &sig_attrs);
+	if (ret)
+		return;
 
-unmap_cmd:
-	if (fr_desc) {
-		spin_lock_irqsave(&isert_conn->pool_lock, flags);
-		list_add_tail(&fr_desc->list, &isert_conn->fr_pool);
-		spin_unlock_irqrestore(&isert_conn->pool_lock, flags);
+	ret = isert_reg_sig_mr(isert_conn->qp, isert_cmd, &sig_attrs);
+	if (ret) {
+		isert_err("conn %p failed to fast reg mr\n",
+			  isert_conn);
+		return;
 	}
-	isert_unmap_data_buf(isert_conn, &isert_cmd->data);
-
-	return ret;
 }
 
 static int
@@ -2790,21 +2236,25 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
 	struct isert_conn *isert_conn = conn->context;
-	struct isert_device *device = isert_conn->device;
-	struct ib_send_wr *wr_failed;
 	int rc;
 
 	isert_dbg("Cmd: %p RDMA_WRITE data_length: %u\n",
 		 isert_cmd, se_cmd->data_length);
 
-	isert_cmd->iser_ib_op = ISER_IB_RDMA_WRITE;
-	rc = device->reg_rdma_mem(isert_cmd, conn);
-	if (rc) {
+	rc = rdma_rw_ctx_init(&isert_cmd->data, isert_conn->qp,
+			isert_conn->cm_id->port_num, se_cmd->t_data_sg,
+			se_cmd->t_data_nents, se_cmd->data_length,
+			isert_cmd->read_va, isert_cmd->read_stag,
+			DMA_TO_DEVICE, 0);
+	if (rc < 0) {
 		isert_err("Cmd: %p failed to prepare RDMA res\n", isert_cmd);
 		return rc;
 	}
 
-	if (!isert_prot_cmd(isert_conn, se_cmd)) {
+	if (isert_prot_cmd(isert_conn, se_cmd)) {
+		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
+		isert_handle_prot_cmd(isert_conn, isert_cmd);
+	} else {
 		/*
 		 * Build isert_conn->tx_desc for iSCSI response PDU and attach
 		 */
@@ -2815,27 +2265,21 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 		isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
 		isert_init_send_wr(isert_conn, isert_cmd,
 				   &isert_cmd->tx_desc.send_wr);
-		isert_cmd->s_rdma_wr.wr.next = &isert_cmd->tx_desc.send_wr;
-		isert_cmd->rdma_wr_num += 1;
 
 		rc = isert_post_recv(isert_conn, isert_cmd->rx_desc);
 		if (rc) {
 			isert_err("ib_post_recv failed with %d\n", rc);
 			return rc;
 		}
+
+		rc = rdma_rw_ctx_post(&isert_cmd->data, isert_conn->qp,
+				isert_conn->cm_id->port_num,
+				NULL, &isert_cmd->tx_desc.send_wr);
+		if (rc)
+			isert_warn("posting RDMA WRITE failed\n");
 	}
 
-	rc = ib_post_send(isert_conn->qp, &isert_cmd->rdma_wr->wr, &wr_failed);
-	if (rc)
-		isert_warn("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
-
-	if (!isert_prot_cmd(isert_conn, se_cmd))
-		isert_dbg("Cmd: %p posted RDMA_WRITE + Response for iSER Data "
-			 "READ\n", isert_cmd);
-	else
-		isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ\n",
-			 isert_cmd);
-
+	isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ\n", isert_cmd);
 	return 1;
 }
 
@@ -2845,22 +2289,31 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
 	struct isert_conn *isert_conn = conn->context;
-	struct isert_device *device = isert_conn->device;
-	struct ib_send_wr *wr_failed;
 	int rc;
 
 	isert_dbg("Cmd: %p RDMA_READ data_length: %u write_data_done: %u\n",
 		 isert_cmd, se_cmd->data_length, cmd->write_data_done);
-	isert_cmd->iser_ib_op = ISER_IB_RDMA_READ;
-	rc = device->reg_rdma_mem(isert_cmd, conn);
-	if (rc) {
+
+	rc = rdma_rw_ctx_init(&isert_cmd->data, isert_conn->qp,
+			isert_conn->cm_id->port_num, se_cmd->t_data_sg,
+			se_cmd->t_data_nents, se_cmd->data_length,
+			isert_cmd->write_va, isert_cmd->write_stag,
+			DMA_FROM_DEVICE, cmd->write_data_done);
+	if (rc < 0) {
 		isert_err("Cmd: %p failed to prepare RDMA res\n", isert_cmd);
 		return rc;
 	}
 
-	rc = ib_post_send(isert_conn->qp, &isert_cmd->rdma_wr->wr, &wr_failed);
-	if (rc)
-		isert_warn("ib_post_send() failed for IB_WR_RDMA_READ\n");
+	isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
+	if (isert_prot_cmd(isert_conn, se_cmd)) {
+		isert_handle_prot_cmd(isert_conn, isert_cmd);
+	} else {
+		rc = rdma_rw_ctx_post(&isert_cmd->data, isert_conn->qp,
+				isert_conn->cm_id->port_num,
+				&isert_cmd->tx_desc.tx_cqe, NULL);
+		if (rc)
+			isert_warn("posting RDMA READ failed\n");
+	}
 
 	isert_dbg("Cmd: %p posted RDMA_READ memory for ISER Data WRITE\n",
 		 isert_cmd);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 2614403..4e6af12 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -3,6 +3,7 @@
 #include <linux/in6.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
+#include <rdma/rw.h>
 #include <scsi/iser.h>
 
 
@@ -57,11 +58,12 @@
 
 #define ISERT_INFLIGHT_DATAOUTS	8
 
-#define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX *    \
-				(1 + ISERT_INFLIGHT_DATAOUTS) + \
+#define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX +    \
 				ISERT_MAX_TX_MISC_PDUS	+ \
 				ISERT_MAX_RX_MISC_PDUS)
 
+#define ISERT_QP_MAX_RDMA_OPS	(ISCSI_DEF_XMIT_CMDS_MAX * ISERT_INFLIGHT_DATAOUTS)
+
 #define ISER_RX_PAD_SIZE	(ISER_RECV_DATA_SEG_LEN + 4096 - \
 		(ISER_RX_PAYLOAD_SIZE + sizeof(u64) + sizeof(struct ib_sge) + \
 		 sizeof(struct ib_cqe)))
@@ -73,13 +75,6 @@ enum isert_desc_type {
 	ISCSI_TX_DATAIN
 };
 
-enum iser_ib_op_code {
-	ISER_IB_RECV,
-	ISER_IB_SEND,
-	ISER_IB_RDMA_WRITE,
-	ISER_IB_RDMA_READ,
-};
-
 enum iser_conn_state {
 	ISER_CONN_INIT,
 	ISER_CONN_UP,
@@ -109,41 +104,6 @@ struct iser_tx_desc {
 	struct ib_send_wr send_wr;
 } __packed;
 
-enum isert_indicator {
-	ISERT_PROTECTED		= 1 << 0,
-	ISERT_DATA_KEY_VALID	= 1 << 1,
-	ISERT_PROT_KEY_VALID	= 1 << 2,
-	ISERT_SIG_KEY_VALID	= 1 << 3,
-};
-
-struct pi_context {
-	struct ib_mr		       *prot_mr;
-	struct ib_mr		       *sig_mr;
-};
-
-struct fast_reg_descriptor {
-	struct list_head		list;
-	struct ib_mr		       *data_mr;
-	u8				ind;
-	struct pi_context	       *pi_ctx;
-};
-
-struct isert_data_buf {
-	struct scatterlist     *sg;
-	int			nents;
-	u32			sg_off;
-	u32			len; /* cur_rdma_length */
-	u32			offset;
-	unsigned int		dma_nents;
-	enum dma_data_direction dma_dir;
-};
-
-enum {
-	DATA = 0,
-	PROT = 1,
-	SIG = 2,
-};
-
 struct isert_cmd {
 	uint32_t		read_stag;
 	uint32_t		write_stag;
@@ -156,16 +116,9 @@ struct isert_cmd {
 	struct iscsi_cmd	*iscsi_cmd;
 	struct iser_tx_desc	tx_desc;
 	struct iser_rx_desc	*rx_desc;
-	enum iser_ib_op_code	iser_ib_op;
-	struct ib_sge		*ib_sge;
-	struct ib_sge		s_ib_sge;
-	int			rdma_wr_num;
-	struct ib_rdma_wr	*rdma_wr;
-	struct ib_rdma_wr	s_rdma_wr;
-	struct ib_sge		ib_sg[3];
-	struct isert_data_buf	data;
-	struct isert_data_buf	prot;
-	struct fast_reg_descriptor *fr_desc;
+	struct rdma_rw_ctx	data;
+	struct rdma_rw_ctx	prot;
+	struct ib_mr		*sig_mr;
 	struct work_struct	comp_work;
 	struct scatterlist	sg;
 };
@@ -198,10 +151,6 @@ struct isert_conn {
 	struct completion	wait;
 	struct completion	wait_comp_err;
 	struct kref		kref;
-	struct list_head	fr_pool;
-	int			fr_pool_size;
-	/* lock to protect fastreg pool */
-	spinlock_t		pool_lock;
 	struct work_struct	release_work;
 	struct ib_recv_wr       beacon;
 	bool                    logout_posted;
@@ -225,7 +174,6 @@ struct isert_comp {
 };
 
 struct isert_device {
-	int			use_fastreg;
 	bool			pi_capable;
 	int			refcount;
 	struct ib_device	*ib_device;
@@ -233,10 +181,6 @@ struct isert_device {
 	struct isert_comp	*comps;
 	int                     comps_used;
 	struct list_head	dev_node;
-	int			(*reg_rdma_mem)(struct isert_cmd *isert_cmd,
-						struct iscsi_conn *conn);
-	void			(*unreg_rdma_mem)(struct isert_cmd *isert_cmd,
-						  struct isert_conn *isert_conn);
 };
 
 struct isert_np {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 31+ messages in thread
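The conversion above boils each transfer down to an rdma_rw_ctx_init() / rdma_rw_ctx_post() / rdma_rw_ctx_destroy() lifecycle, replacing the open-coded WR-chain sizing the old isert_map_rdma() did. A toy model of that accounting follows; the types and function bodies are stand-ins, only the names and the DIV_ROUND_UP sizing mirror what the series proposes:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the proposed rdma_rw_ctx; not the kernel structure. */
struct toy_rw_ctx {
	uint32_t nr_ops;   /* number of RDMA READ/WRITE WRs needed */
	int      mapped;   /* nonzero while the DMA mapping is held */
};

#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Map the SG list and size the WR chain, as rdma_rw_ctx_init() would. */
static int toy_rw_ctx_init(struct toy_rw_ctx *ctx, uint32_t sg_nents,
			   uint32_t max_sge)
{
	if (!sg_nents || !max_sge)
		return -1;
	ctx->nr_ops = TOY_DIV_ROUND_UP(sg_nents, max_sge);
	ctx->mapped = 1;
	return 0;
}

/*
 * Posting chains ctx->nr_ops READ/WRITE WRs, optionally followed by a
 * response send WR (the DATA-IN + response case in isert_put_datain).
 */
static uint32_t toy_rw_ctx_post(const struct toy_rw_ctx *ctx, int chain_send)
{
	return ctx->nr_ops + (chain_send ? 1 : 0);
}

/* Destroy drops the mapping, as the completion and error paths do. */
static void toy_rw_ctx_destroy(struct toy_rw_ctx *ctx)
{
	ctx->mapped = 0;
	ctx->nr_ops = 0;
}
```

The point of the real API is that this bookkeeping (and the iWarp MR registration hidden behind it) no longer lives in every ULP.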

* Re: [PATCH 13/13] IB/isert: RW API WIP
  2016-02-27 18:10 ` [PATCH 13/13] IB/isert: RW API WIP Christoph Hellwig
@ 2016-02-28 13:57   ` Sagi Grimberg
  2016-02-28 16:04     ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Sagi Grimberg @ 2016-02-28 13:57 UTC (permalink / raw)
  To: Christoph Hellwig, linux-rdma; +Cc: swise, sagig, target-devel


>   drivers/infiniband/ulp/isert/ib_isert.c | 855 ++++++--------------------------
>   drivers/infiniband/ulp/isert/ib_isert.h |  70 +--
>   2 files changed, 161 insertions(+), 764 deletions(-)

Ha! 603 LOC deletion!

I plan to incorporate the signature stuff into the rdma_rw API
and then we can reduce it even further! It'll take me some time
though...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
       [not found]     ` <1456596631-19418-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-02-28 14:57       ` Sagi Grimberg
  2016-02-28 16:20         ` Christoph Hellwig
  2016-02-29 11:15         ` Christoph Hellwig
  0 siblings, 2 replies; 31+ messages in thread
From: Sagi Grimberg @ 2016-02-28 14:57 UTC (permalink / raw)
  To: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA


> @@ -1593,6 +1592,7 @@ EXPORT_SYMBOL(ib_map_mr_sg);
>    * @mr:            memory region
>    * @sgl:           dma mapped scatterlist
>    * @sg_nents:      number of entries in sg
> + * @sg_offset:     offset in bytes into sg
>    * @set_page:      driver page assignment function pointer
>    *
>    * Core service helper for drivers to convert the largest
> @@ -1603,10 +1603,8 @@ EXPORT_SYMBOL(ib_map_mr_sg);
>    * Returns the number of sg elements that were assigned to
>    * a page vector.
>    */
> -int ib_sg_to_pages(struct ib_mr *mr,
> -		   struct scatterlist *sgl,
> -		   int sg_nents,
> -		   int (*set_page)(struct ib_mr *, u64))
> +int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
> +		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64))
>   {
>   	struct scatterlist *sg;
>   	u64 last_end_dma_addr = 0;
> @@ -1618,8 +1616,8 @@ int ib_sg_to_pages(struct ib_mr *mr,
>   	mr->length = 0;
>
>   	for_each_sg(sgl, sg, sg_nents, i) {
> -		u64 dma_addr = sg_dma_address(sg);
> -		unsigned int dma_len = sg_dma_len(sg);
> +		u64 dma_addr = sg_dma_address(sg) + sg_offset;
> +		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
>   		u64 end_dma_addr = dma_addr + dma_len;
>   		u64 page_addr = dma_addr & page_mask;
>
> @@ -1652,6 +1650,8 @@ next_page:
>   		mr->length += dma_len;
>   		last_end_dma_addr = end_dma_addr;
>   		last_page_off = end_dma_addr & ~page_mask;
> +
> +		sg_offset = 0;
>   	}

This looks wrong...

The SG offset should not be taken into account in the address
translation vector that is given to the HW. The sg_offset needs to
be accounted in the mr->iova:

mr->iova = sg_dma_address(&sgl[0]) + sg_offset;

I think the page mappings should stay as they are.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] IB/core: generic RDMA READ/WRITE API
  2016-02-27 18:10   ` [PATCH 07/13] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
@ 2016-02-28 15:05     ` Sagi Grimberg
  0 siblings, 0 replies; 31+ messages in thread
From: Sagi Grimberg @ 2016-02-28 15:05 UTC (permalink / raw)
  To: Christoph Hellwig, linux-rdma; +Cc: swise, sagig, target-devel


> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 9a77bb8..1ef3a1a 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -48,6 +48,7 @@
>   #include <rdma/ib_verbs.h>
>   #include <rdma/ib_cache.h>
>   #include <rdma/ib_addr.h>
> +#include <rdma/rw.h>
>
>   #include "core_priv.h"
>
> @@ -751,6 +752,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
>   {
>   	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
>   	struct ib_qp *qp;
> +	int ret;
> +
> +	/*
> +	 * If the caller is using the RDMA API, calculate the resources
> +	 * needed for the RDMA READ/WRITE operations.
> +	 *
> +	 * Note that these callers need to pass in a port number.
> +	 */
> +	if (qp_init_attr->cap.max_rdma_ctxs)
> +		rdma_rw_init_qp(device, qp_init_attr);
>
>   	qp = device->create_qp(pd, qp_init_attr, NULL);
>   	if (IS_ERR(qp))
> @@ -764,6 +775,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
>   	atomic_set(&qp->usecnt, 0);
>   	qp->mrs_used = 0;
>   	spin_lock_init(&qp->mr_lock);
> +	INIT_LIST_HEAD(&qp->rdma_mrs);
>
>   	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
>   		return ib_create_xrc_qp(qp, qp_init_attr);
> @@ -787,6 +799,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
>
>   	atomic_inc(&pd->usecnt);
>   	atomic_inc(&qp_init_attr->send_cq->usecnt);
> +
> +	if (qp_init_attr->cap.max_rdma_ctxs) {
> +		ret = rdma_rw_init_mrs(qp, qp_init_attr);
> +		if (ret) {
> +			pr_err("failed to init MR pool ret= %d\n", ret);
> +			ib_destroy_qp(qp);
> +			qp = ERR_PTR(ret);
> +		}

This is new compared to what we did in fabrics.

Note that this is an ideal place to reserve send queue
entries for MR assisted rdmas (e.g. iWARP). This way the
ULP does not even need to be aware of the reservations!

> +	}
> +
>   	return qp;
>   }
>   EXPORT_SYMBOL(ib_create_qp);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/13] IB/core: add a need_inval flag to struct ib_mr
  2016-02-27 18:10 ` [PATCH 06/13] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
@ 2016-02-28 15:10   ` Sagi Grimberg
  2016-02-28 16:05     ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Sagi Grimberg @ 2016-02-28 15:10 UTC (permalink / raw)
  To: Christoph Hellwig, linux-rdma; +Cc: swise, sagig, target-devel, Steve Wise

> This is the first step toward moving MR invalidation decisions
> to the core.  It will be needed by the upcoming RW API.

This makes sense even before the rdma rw stuff. We can
add this bit and get rid of the iser flagging (and when
nfs/srp gain remote invalidate support they will use it too)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 13/13] IB/isert: RW API WIP
  2016-02-28 13:57   ` Sagi Grimberg
@ 2016-02-28 16:04     ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-28 16:04 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-rdma, swise, sagig, target-devel

On Sun, Feb 28, 2016 at 03:57:15PM +0200, Sagi Grimberg wrote:
>
>>   drivers/infiniband/ulp/isert/ib_isert.c | 855 ++++++--------------------------
>>   drivers/infiniband/ulp/isert/ib_isert.h |  70 +--
>>   2 files changed, 161 insertions(+), 764 deletions(-)
>
> Ha! 603 LOC deletion!
>
> I plan to incorporate the signature stuff into the rdma_rw API
> and then we can reduce it even further! It'll take me some time
> though...

The whole series including the CQ API actually removes more code
than it adds even with a single consumer.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/13] IB/core: add a need_inval flag to struct ib_mr
  2016-02-28 15:10   ` Sagi Grimberg
@ 2016-02-28 16:05     ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-28 16:05 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, linux-rdma, swise, sagig, target-devel,
	Steve Wise

On Sun, Feb 28, 2016 at 05:10:26PM +0200, Sagi Grimberg wrote:
>> This is the first step toward moving MR invalidation decisions
>> to the core.  It will be needed by the upcoming RW API.
>
> This makes sense even before the rdma rw stuff. We can
> add this bit and get rid of the iser flagging (and when
> nfs/srp gain remote invalidate support they will use it too)

If someone else has a use for it ASAP feel free to send it along
with that.  The R/W code currently is my priority project because
the NVMe target driver will need it.  Once that and a few other
urgent items are off my plate I'll happily help out with various
cleanups on the client side MR API again.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
  2016-02-28 14:57       ` Sagi Grimberg
@ 2016-02-28 16:20         ` Christoph Hellwig
  2016-02-28 17:50           ` Sagi Grimberg
  2016-02-29 11:15         ` Christoph Hellwig
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-28 16:20 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-rdma, swise, sagig, target-devel

On Sun, Feb 28, 2016 at 04:57:59PM +0200, Sagi Grimberg wrote:
> This looks wrong...
>
> The SG offset should not be taken into account in the address
> translation vector that is given to the HW. The sg_offset needs to
> be accounted in the mr->iova:
>
> mr->iova = sg_dma_address(&sgl[0]) + sg_offset;
>
> I think the page mappings should stay as they are.

I think you're right.  Do you have any good suggestion how to trigger
iSER first burst data, as apparently my normal testing didn't hit
this?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
  2016-02-28 16:20         ` Christoph Hellwig
@ 2016-02-28 17:50           ` Sagi Grimberg
       [not found]             ` <56D33356.1020707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Sagi Grimberg @ 2016-02-28 17:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma, swise, sagig, target-devel


> I think you're right.  Do you have any good suggestion how to trigger
> iSER first burst data, as apparently my normal testing didn't hit
> this?

The first burst should happen by default (against LIO, which supports
ImmediateData and UnsolicitedDataOut). But it won't make a difference
for the initiator, which registers the entire buffer, sends the first
burst and lets the target read the rest accordingly...

Perhaps if you change the iscsi tpg FirstBurstLength to be sub-page, say
3k (the default is 64k), you can get isert (when using MRs over iWARP)
to hit this...

Steve can you help?
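
For reference, something along these lines should do it on a LIO target; the configfs path and IQN below are placeholders and worth double-checking on your setup:

```shell
# Sketch: lower FirstBurstLength on an existing LIO iSCSI TPG so the
# unsolicited first burst ends mid-page.  The IQN is a placeholder.
echo 3072 > /sys/kernel/config/target/iscsi/iqn.2016-02.org.example:tgt/tpgt_1/param/FirstBurstLength

# Roughly equivalent via targetcli:
#   /iscsi/iqn.2016-02.org.example:tgt/tpg1> set parameter FirstBurstLength=3072
```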

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
  2016-02-28 14:57       ` Sagi Grimberg
  2016-02-28 16:20         ` Christoph Hellwig
@ 2016-02-29 11:15         ` Christoph Hellwig
       [not found]           ` <20160229111557.GA11499-jcswGhMUV9g@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-29 11:15 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-rdma, swise, sagig, target-devel

> This looks wrong...
>
> The SG offset should not be taken into account in the address
> translation vector that is given to the HW. The sg_offset needs to
> be accounted in the mr->iova:
>
> mr->iova = sg_dma_address(&sgl[0]) + sg_offset;
>
> I think the page mappings should stay as they are.

I looked at this in a bit more detail, and I think we need both.

For PAGE_SIZE or smaller SG entries you're correct, and we don't
need the offset for dma_addr.  But it doesn't harm either.  But
for larger SG entries we need it to calculate the correct
base address.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
       [not found]           ` <20160229111557.GA11499-jcswGhMUV9g@public.gmane.org>
@ 2016-02-29 11:35             ` Sagi Grimberg
  2016-02-29 11:56               ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Sagi Grimberg @ 2016-02-29 11:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagig-VPRAkNaXOzVWk0Htik3J/w, target-devel-u79uwXL29TY76Z2rM5mHXA


> I looked at this in a bit more detail, and I think we need both.
>
> For PAGE_SIZE or smaller SG entries you're correct, and we don't
> need the offset for dma_addr.  But it doesn't harm either.

I think it can harm us.

> But for larger SG entries we need it to calculate the correct
> base address.

I'm not sure if this is true either. Can you explain why?

The Memory region mapping is described by:
1. page vector: [addr0, addr1, addr2,...]
2. iova: the first byte offset
3. length: the total byte count of the mr
4. page_size: size of each page in the page vector

This means that the HCA assumes that each address in
the page vector covers page_size bytes; the region can also
start at some offset (iova - addr0), and it has some length.

So say the HCA wants to write 8k to the MR:
first page_size (4k) will be written starting from addr0, and
the next page_size (4k) will be written starting from addr1.
If you set addr0 = page_addr + offset then the HW will assume it can
access addr0 + page_size which is not what we want.

I think that the page vectors should contain page addresses and not
incorporate offsets.

Thoughts?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
  2016-02-29 11:35             ` Sagi Grimberg
@ 2016-02-29 11:56               ` Christoph Hellwig
  2016-02-29 12:08                 ` Sagi Grimberg
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-02-29 11:56 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Christoph Hellwig, linux-rdma, swise, sagig, target-devel

On Mon, Feb 29, 2016 at 01:35:44PM +0200, Sagi Grimberg wrote:
>> But for larger SG entries we need it to calculate the correct
>> base address.
>
> I'm not sure if this is true either. Can you explain why?

Assume we get a SG entry that is exactly 2 pages (8k) long.  But we
also have an offset of 6k into it, so we need to skip into the
second page to make the following work:

>
> The Memory region mapping is described by:
> 1. page vector: [addr0, addr1, addr2,...]
> 2. iova: the first byte offset
> 3. length: the total byte count of the mr
> 4. page_size: size of each page in the page vector
>
> This means that the HCA assumes that each address in
> the page vector covers page_size bytes; the region can also
> start at some offset (iova - addr0), and it has some length.

Exactly.

For the above case we don't need the page at sg_dma_address(), though.
We need the one after it, so we need to make sure the page address
is calculated for the second page in the SG entry.

If we add sg_offset to dma_addr as in my patch, we get the
right page address from this line:

		u64 page_addr = dma_addr & page_mask;

without that we'd get the address of the first page, which doesn't
actually contain any data we want to map.

> So say the HCA wants to write 8k to the MR:
> first page_size (4k) will be written starting from addr0, and
> the next page_size (4k) will be written starting from addr1.
> If you set addr0 = page_addr + offset then the HW will assume it can
> access addr0 + page_size which is not what we want.
>
> I think that the page vectors should contain page addresses and not
> incorporate offsets.

The

		u64 page_addr = dma_addr & page_mask;

line ensures we always have a page address.  But to get the page address
for the correct page we need to add the offset to dma_addr.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
  2016-02-29 11:56               ` Christoph Hellwig
@ 2016-02-29 12:08                 ` Sagi Grimberg
  0 siblings, 0 replies; 31+ messages in thread
From: Sagi Grimberg @ 2016-02-29 12:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma, swise, sagig, target-devel



On 29/02/2016 13:56, Christoph Hellwig wrote:
> On Mon, Feb 29, 2016 at 01:35:44PM +0200, Sagi Grimberg wrote:
>>> But for larger SG entries we need it to calculate the correct
>>> base address.
>>
>> I'm not sure if this is true either. Can you explain why?
>
> Assume we get a SG entry that is exactly 2 pages (8k) long.  But we
> also have an offset of 6k into it, so we need to skip into the
> second page to make the following work:
>
>>
>> The Memory region mapping is described by:
>> 1. page vector: [addr0, addr1, addr2,...]
>> 2. iova: the first byte offset
>> 3. length: the total byte count of the mr
>> 4. page_size: size of each page in the page vector
>>
>> This means that the HCA assumes that each address in
>> the page vector covers page_size bytes; the region can also
>> start at some offset (iova - addr0), and it has some length.
>
> Exactly.
>
> For the above case we don't need the page at sg_dma_address(), though.
> We need the one after it, so we need to make sure the page address
> is calculated for the second page in the SG entry.
>
> If we add sg_offset to dma_addr as in my patch, we get the
> right page address from this line:
>
> 		u64 page_addr = dma_addr & page_mask;
>
> without that we'd get the address of the first page, which doesn't
> actually contain any data we want to map.

Yea, makes sense...

I get it now! Thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
       [not found]             ` <56D33356.1020707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2016-02-29 22:22               ` Steve Wise
  0 siblings, 0 replies; 31+ messages in thread
From: Steve Wise @ 2016-02-29 22:22 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sagig-VPRAkNaXOzVWk0Htik3J/w,
	target-devel-u79uwXL29TY76Z2rM5mHXA



On 2/28/2016 11:50 AM, Sagi Grimberg wrote:
>
>> I think you're right.  Do you have any good suggestion how to trigger
>> iSER first burst data, as apparently my normal testing didn't hit
>> this?
>
> The first burst should happen by default (against LIO, which supports
> ImmediateData and UnsolicitedDataOut). But it won't make a difference
> for the initiator, which registers the entire buffer, sends the first
> burst and lets the target read the rest accordingly...
>
> Perhaps if you change the iscsi tpg FirstBurstLength to be sub-page, say
> 3k (the default is 64k), you can get isert (when using MRs over iWARP)
> to hit this...
>
> Steve can you help?

I can try this out.  Did this discussion result in needing a code change 
though?

Also, one acid test I do that produced non-zero initial offsets for 
the nfsrdma server was mounting a large ramdisk on the client and 
building the kernel with 'make -j 16 O=blah', with O= pointing to the 
mounted remote ramdisk.  nfsrdma is very different from iser/block 
protocols, but it's still a good (evil) test. :)



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/13] IB/core: add a simple MR pool
  2016-02-27 18:10   ` [PATCH 05/13] IB/core: add a simple MR pool Christoph Hellwig
@ 2016-03-02  2:48     ` Parav Pandit
  2016-03-02  9:15       ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2016-03-02  2:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-rdma, swise, Sagi Grimberg, target-devel

Hi Christoph,

On Sat, Feb 27, 2016 at 11:40 PM, Christoph Hellwig <hch@lst.de> wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 267f11e..1e68dae 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1408,6 +1408,10 @@ struct ib_qp {
>         struct ib_srq          *srq;
>         struct ib_xrcd         *xrcd; /* XRC TGT QPs only */
>         struct list_head        xrcd_list;
> +
> +       spinlock_t              mr_lock;
> +       int                     mrs_used;
> +

Can you please add the comment for mr_lock as requested by the
checkpatch script?

Also you might want to consider adding these fields after recv_cq so
that we find mr_lock and the used count in a single cache line along
with other data for the qp?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/13] IB/core: add a simple MR pool
  2016-03-02  2:48     ` Parav Pandit
@ 2016-03-02  9:15       ` Christoph Hellwig
  2016-03-02 15:22         ` Bart Van Assche
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-03-02  9:15 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Christoph Hellwig, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW, Sagi Grimberg,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 02, 2016 at 08:18:14AM +0530, Parav Pandit wrote:
> >         struct list_head        xrcd_list;
> > +
> > +       spinlock_t              mr_lock;
> > +       int                     mrs_used;
> > +
> 
> Can you please add the comment for mr_lock as requested by the
> checkpatch script?

No.  checkpatch is asking for silly things (and apparently this one
is so silly it doesn't even ask for it by default).

> Also you might want to consider adding these fields after recv_cq so
> that we find mr_lock and the used count in a single cache line along
> with other data for the qp?

That sounds useful, I'll look into it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/13] IB/core: add a simple MR pool
  2016-03-02  9:15       ` Christoph Hellwig
@ 2016-03-02 15:22         ` Bart Van Assche
       [not found]           ` <56D70543.1000506-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Bart Van Assche @ 2016-03-02 15:22 UTC (permalink / raw)
  To: Christoph Hellwig, Parav Pandit
  Cc: linux-rdma, swise, Sagi Grimberg, target-devel

On 03/02/16 01:15, Christoph Hellwig wrote:
> On Wed, Mar 02, 2016 at 08:18:14AM +0530, Parav Pandit wrote:
>>>          struct list_head        xrcd_list;
>>> +
>>> +       spinlock_t              mr_lock;
>>> +       int                     mrs_used;
>>> +
>>
>> Can you please add the comment for mr_lock as requested by the
>> checkpatch script?
>
> No.  checkpatch is asking for silly things (and apparently this one
> is so silly it doesn't even ask for it by default).
>
>> Also you might want to consider adding this field after recv_cq so
>> that we find mr_lock and used count in single cache line along with
>> other data for the qp?
>
> That sounds useful, I'll look into it.

Hello Christoph,

With the approach of V2 of this patch series, mr_lock and the MR pool 
list head exist in different structures, which is unfortunate. How about 
introducing a new structure for the MR pool list head and mr_lock? An 
additional advantage of this approach is that it would allow moving the 
initialization of both structure members into ib_mr_pool_init().

Bart.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/13] IB/core: add a simple MR pool
       [not found]           ` <56D70543.1000506-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-03-03  8:30             ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-03-03  8:30 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Parav Pandit,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW, Sagi Grimberg,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Wed, Mar 02, 2016 at 07:22:43AM -0800, Bart Van Assche wrote:
> Hello Christoph,
>
> With the approach of V2 of this patch series mr_lock and the MR pool list 
> head exist in different structures which is unfortunate. How about 
> introducing a new structure for the MR pool list head and mr_lock? An 
> additional advantage of this approach is that it would allow to move the 
> initialization of both structure members into ib_mr_pool_init().

Hi Bart,

I don't really understand what you mean.  Both the lock and the list_head
are in struct ib_qp, right next to each other.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2016-03-03  8:30 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-27 18:10 RFC: a first draft of a generic RDMA READ/WRITE API Christoph Hellwig
2016-02-27 18:10 ` [PATCH 04/13] IB/core: refactor ib_create_qp Christoph Hellwig
     [not found] ` <1456596631-19418-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-02-27 18:10   ` [PATCH 01/13] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
2016-02-27 18:10   ` [PATCH 02/13] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg Christoph Hellwig
     [not found]     ` <1456596631-19418-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-02-28 14:57       ` Sagi Grimberg
2016-02-28 16:20         ` Christoph Hellwig
2016-02-28 17:50           ` Sagi Grimberg
     [not found]             ` <56D33356.1020707-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2016-02-29 22:22               ` Steve Wise
2016-02-29 11:15         ` Christoph Hellwig
     [not found]           ` <20160229111557.GA11499-jcswGhMUV9g@public.gmane.org>
2016-02-29 11:35             ` Sagi Grimberg
2016-02-29 11:56               ` Christoph Hellwig
2016-02-29 12:08                 ` Sagi Grimberg
2016-02-27 18:10   ` [PATCH 03/13] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
2016-02-27 18:10   ` [PATCH 05/13] IB/core: add a simple MR pool Christoph Hellwig
2016-03-02  2:48     ` Parav Pandit
2016-03-02  9:15       ` Christoph Hellwig
2016-03-02 15:22         ` Bart Van Assche
     [not found]           ` <56D70543.1000506-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-03-03  8:30             ` Christoph Hellwig
2016-02-27 18:10   ` [PATCH 07/13] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
2016-02-28 15:05     ` Sagi Grimberg
2016-02-27 18:10   ` [PATCH 08/13] IB/isert: properly type the login buffer Christoph Hellwig
2016-02-27 18:10   ` [PATCH 09/13] IB/isert: convert to new CQ API Christoph Hellwig
2016-02-27 18:10   ` [PATCH 10/13] IB/isert: kill struct isert_rdma_wr Christoph Hellwig
2016-02-27 18:10   ` [PATCH 12/13] IB/core: add a MR pool for signature MRs Christoph Hellwig
2016-02-27 18:10 ` [PATCH 06/13] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
2016-02-28 15:10   ` Sagi Grimberg
2016-02-28 16:05     ` Christoph Hellwig
2016-02-27 18:10 ` [PATCH 11/13] IB/isert: kill the ->isert_cmd back pointer in the struct iser_tx_desc Christoph Hellwig
2016-02-27 18:10 ` [PATCH 13/13] IB/isert: RW API WIP Christoph Hellwig
2016-02-28 13:57   ` Sagi Grimberg
2016-02-28 16:04     ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox