* [PATCH v2 0/9] SRP initiator patches for kernel 3.16
From: Bart Van Assche @ 2014-05-13 14:38 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Changes compared to v1:
- Modified the FMR code such that one FMR pool is allocated per
connection instead of one pool per HCA.
- Dropped the patch "Make srp_alloc_req_data() reallocate request data".
- Moved introduction of the register_always kernel module parameter
into a separate patch.
- Removed the loop from around ib_create_fmr_pool() and
  srp_create_fr_pool(). max_pages_per_mr is now computed from
  max_mr_size and max_fast_reg_page_list_len (see the sketch after
  this list).
- Reduced fast registration pool size from 1024 to scsi_host->can_queue.
- Added a patch that should fix a crash reported by Sagi, although I
  have not yet been able to reproduce the crash myself.
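For reference, a minimal sketch of the new limit computation (variable
names follow patches 8 and 9; the clamp against
max_fast_reg_page_list_len only applies on the fast registration path,
and use_fast_reg stands in for the flag introduced in patch 9):

	/* Sketch: derive the per-MR limits from the HCA attributes. */
	mr_page_shift    = max(12, ffs(dev_attr->page_size_cap) - 1);
	mr_page_size     = 1 << mr_page_shift;
	max_pages_per_mr = min_t(u64, SRP_MAX_PAGES_PER_MR,
				 dev_attr->max_mr_size / mr_page_size);
	if (use_fast_reg)
		max_pages_per_mr = min_t(u32, max_pages_per_mr,
					 dev_attr->max_fast_reg_page_list_len);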
This patch series consists of the following nine patches:
0001-IB-srp-Fix-a-sporadic-crash-triggered-by-cable-pulli.patch
0002-IB-srp-Fix-kernel-doc-warnings.patch
0003-IB-srp-Introduce-an-additional-local-variable.patch
0004-IB-srp-Introduce-srp_map_fmr.patch
0005-IB-srp-Introduce-srp_finish_mapping.patch
0006-IB-srp-Introduce-the-register_always-kernel-module-p.patch
0007-IB-srp-One-FMR-pool-per-SRP-connection.patch
0008-IB-srp-Rename-FMR-related-variables.patch
0009-IB-srp-Add-fast-registration-support.patch
* [PATCH v2 1/9] IB/srp: Fix a sporadic crash triggered by cable pulling
From: Bart Van Assche @ 2014-05-13 14:39 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Ensure that srp_finish_req() cannot encounter a pointer in req->scmnd
to a SCSI command that is no longer associated with that request.
Without this change, invoking srp_finish_req() twice for a SCSI
command that is not in flight would cause srp_unmap_data() to call
ib_fmr_pool_unmap() with an invalid pointer argument, resulting in a
kernel oops.
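The invariant relied upon here is that req->scmnd acts as the
ownership marker for a request. A simplified sketch of how
srp_finish_req() depends on it (abbreviated relative to the actual
driver code):

	/*
	 * Simplified: only the caller that successfully claims the request
	 * completes and unmaps it; once req->scmnd has been cleared, a
	 * second invocation finds NULL and does nothing.
	 */
	scmnd = srp_claim_req(target, req, scmnd);
	if (scmnd) {
		srp_free_req(target, req, scmnd, 0);	/* unmaps exactly once */
		scmnd->result = result;
		scmnd->scsi_done(scmnd);
	}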
Reported-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: stable <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> # 3.13+ ("IB/srp: Use SRP transport layer error recovery")
---
drivers/infiniband/ulp/srp/ib_srp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 66a908b..427336a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1594,6 +1594,8 @@ err_unmap:
err_iu:
srp_put_tx_iu(target, iu, SRP_IU_CMD);
+ req->scmnd = NULL; /* for srp_finish_req() */
+
spin_lock_irqsave(&target->lock, flags);
list_add(&req->list, &target->free_reqs);
--
1.8.4.5
* [PATCH v2 2/9] IB/srp: Fix kernel-doc warnings
From: Bart Van Assche @ 2014-05-13 14:40 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Eliminate the warnings that the kernel-doc tool reports about missing
argument descriptions in the ib_srp.[ch] source files.
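For reference, these warnings can be reproduced directly from a kernel
source tree by running the kernel-doc script on the file, e.g.:

	scripts/kernel-doc -man drivers/infiniband/ulp/srp/ib_srp.c >/dev/null

The generated documentation goes to stdout and the missing-parameter
warnings to stderr.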
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 427336a..32955fa 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -813,6 +813,10 @@ static struct scsi_cmnd *srp_claim_req(struct srp_target_port *target,
/**
* srp_free_req() - Unmap data and add request to the free request list.
+ * @target: SRP target port.
+ * @req: Request to be freed.
+ * @scmnd: SCSI command associated with @req.
+ * @req_lim_delta: Amount to be added to @target->req_lim.
*/
static void srp_free_req(struct srp_target_port *target,
struct srp_request *req, struct scsi_cmnd *scmnd,
@@ -1455,6 +1459,7 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
/**
* srp_tl_err_work() - handle a transport layer error
+ * @work: Work structure embedded in an SRP target port.
*
* Note: This function may get invoked before the rport has been created,
* hence the target->rport test.
@@ -2312,6 +2317,8 @@ static struct class srp_class = {
/**
* srp_conn_unique() - check whether the connection to a target is unique
+ * @host: SRP host.
+ * @target: SRP target port.
*/
static bool srp_conn_unique(struct srp_host *host,
struct srp_target_port *target)
--
1.8.4.5
* [PATCH v2 3/9] IB/srp: Introduce an additional local variable
From: Bart Van Assche @ 2014-05-13 14:40 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Cache the target->srp_host->srp_dev pointer in a local variable in
srp_create_target_ib(). This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 32955fa..fa28ec3 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -290,6 +290,7 @@ static int srp_new_cm_id(struct srp_target_port *target)
static int srp_create_target_ib(struct srp_target_port *target)
{
+ struct srp_device *dev = target->srp_host->srp_dev;
struct ib_qp_init_attr *init_attr;
struct ib_cq *recv_cq, *send_cq;
struct ib_qp *qp;
@@ -299,16 +300,14 @@ static int srp_create_target_ib(struct srp_target_port *target)
if (!init_attr)
return -ENOMEM;
- recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
- srp_recv_completion, NULL, target,
+ recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
target->queue_size, target->comp_vector);
if (IS_ERR(recv_cq)) {
ret = PTR_ERR(recv_cq);
goto err;
}
- send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
- srp_send_completion, NULL, target,
+ send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
target->queue_size, target->comp_vector);
if (IS_ERR(send_cq)) {
ret = PTR_ERR(send_cq);
@@ -327,7 +326,7 @@ static int srp_create_target_ib(struct srp_target_port *target)
init_attr->send_cq = send_cq;
init_attr->recv_cq = recv_cq;
- qp = ib_create_qp(target->srp_host->srp_dev->pd, init_attr);
+ qp = ib_create_qp(dev->pd, init_attr);
if (IS_ERR(qp)) {
ret = PTR_ERR(qp);
goto err_send_cq;
--
1.8.4.5
* [PATCH v2 4/9] IB/srp: Introduce srp_map_fmr()
From: Bart Van Assche @ 2014-05-13 14:41 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Move the loop that maps scatterlist entries via FMR from
srp_map_data() into a new function, srp_map_fmr(). This patch does not
change any functionality.
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 77 ++++++++++++++++++++++---------------
1 file changed, 45 insertions(+), 32 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index fa28ec3..cc81c2f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1047,12 +1047,54 @@ static int srp_map_sg_entry(struct srp_map_state *state,
return ret;
}
+static void srp_map_fmr(struct srp_map_state *state,
+ struct srp_target_port *target, struct srp_request *req,
+ struct scatterlist *scat, int count)
+{
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct ib_device *ibdev = dev->dev;
+ struct scatterlist *sg;
+ int i, use_fmr;
+
+ state->desc = req->indirect_desc;
+ state->pages = req->map_page;
+ state->next_fmr = req->fmr_list;
+
+ use_fmr = dev->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+
+ for_each_sg(scat, sg, count, i) {
+ if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
+ /* FMR mapping failed, so backtrack to the first
+ * unmapped entry and continue on without using FMR.
+ */
+ dma_addr_t dma_addr;
+ unsigned int dma_len;
+
+backtrack:
+ sg = state->unmapped_sg;
+ i = state->unmapped_index;
+
+ dma_addr = ib_sg_dma_address(ibdev, sg);
+ dma_len = ib_sg_dma_len(ibdev, sg);
+ dma_len -= (state->unmapped_addr - dma_addr);
+ dma_addr = state->unmapped_addr;
+ use_fmr = SRP_MAP_NO_FMR;
+ srp_map_desc(state, dma_addr, dma_len, target->rkey);
+ }
+ }
+
+ if (use_fmr == SRP_MAP_ALLOW_FMR && srp_map_finish_fmr(state, target))
+ goto backtrack;
+
+ req->nfmr = state->nfmr;
+}
+
static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
struct srp_request *req)
{
- struct scatterlist *scat, *sg;
+ struct scatterlist *scat;
struct srp_cmd *cmd = req->cmd->buf;
- int i, len, nents, count, use_fmr;
+ int len, nents, count;
struct srp_device *dev;
struct ib_device *ibdev;
struct srp_map_state state;
@@ -1111,35 +1153,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
target->indirect_size, DMA_TO_DEVICE);
memset(&state, 0, sizeof(state));
- state.desc = req->indirect_desc;
- state.pages = req->map_page;
- state.next_fmr = req->fmr_list;
-
- use_fmr = dev->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
-
- for_each_sg(scat, sg, count, i) {
- if (srp_map_sg_entry(&state, target, sg, i, use_fmr)) {
- /* FMR mapping failed, so backtrack to the first
- * unmapped entry and continue on without using FMR.
- */
- dma_addr_t dma_addr;
- unsigned int dma_len;
-
-backtrack:
- sg = state.unmapped_sg;
- i = state.unmapped_index;
-
- dma_addr = ib_sg_dma_address(ibdev, sg);
- dma_len = ib_sg_dma_len(ibdev, sg);
- dma_len -= (state.unmapped_addr - dma_addr);
- dma_addr = state.unmapped_addr;
- use_fmr = SRP_MAP_NO_FMR;
- srp_map_desc(&state, dma_addr, dma_len, target->rkey);
- }
- }
-
- if (use_fmr == SRP_MAP_ALLOW_FMR && srp_map_finish_fmr(&state, target))
- goto backtrack;
+ srp_map_fmr(&state, target, req, scat, count);
/* We've mapped the request, now pull as much of the indirect
* descriptor table as we can into the command buffer. If this
@@ -1147,7 +1161,6 @@ backtrack:
* guaranteed to fit into the command, as the SCSI layer won't
* give us more S/G entries than we allow.
*/
- req->nfmr = state.nfmr;
if (state.ndesc == 1) {
/* FMR mapping was able to collapse this to one entry,
* so use a direct descriptor.
--
1.8.4.5
* [PATCH v2 5/9] IB/srp: Introduce srp_finish_mapping()
From: Bart Van Assche @ 2014-05-13 14:41 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Move the state->npages checks out of srp_map_finish_fmr() into a new
function, srp_finish_mapping(), which invokes srp_map_finish_fmr()
when needed. This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 42 ++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 15 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cc81c2f..c3c8333 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -935,16 +935,6 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
struct ib_pool_fmr *fmr;
u64 io_addr = 0;
- if (!state->npages)
- return 0;
-
- if (state->npages == 1) {
- srp_map_desc(state, state->base_dma_addr, state->fmr_len,
- target->rkey);
- state->npages = state->fmr_len = 0;
- return 0;
- }
-
fmr = ib_fmr_pool_map_phys(dev->fmr_pool, state->pages,
state->npages, io_addr);
if (IS_ERR(fmr))
@@ -954,10 +944,32 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
state->nfmr++;
srp_map_desc(state, 0, state->fmr_len, fmr->fmr->rkey);
- state->npages = state->fmr_len = 0;
+
return 0;
}
+static int srp_finish_mapping(struct srp_map_state *state,
+ struct srp_target_port *target)
+{
+ int ret = 0;
+
+ if (state->npages == 0)
+ return 0;
+
+ if (state->npages == 1)
+ srp_map_desc(state, state->base_dma_addr, state->fmr_len,
+ target->rkey);
+ else
+ ret = srp_map_finish_fmr(state, target);
+
+ if (ret == 0) {
+ state->npages = 0;
+ state->fmr_len = 0;
+ }
+
+ return ret;
+}
+
static void srp_map_update_start(struct srp_map_state *state,
struct scatterlist *sg, int sg_index,
dma_addr_t dma_addr)
@@ -998,7 +1010,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
* avoided using FMR on such page fragments.
*/
if (dma_addr & ~dev->fmr_page_mask || dma_len > dev->fmr_max_size) {
- ret = srp_map_finish_fmr(state, target);
+ ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1017,7 +1029,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
while (dma_len) {
if (state->npages == SRP_FMR_SIZE) {
- ret = srp_map_finish_fmr(state, target);
+ ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1040,7 +1052,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
*/
ret = 0;
if (len != dev->fmr_page_size) {
- ret = srp_map_finish_fmr(state, target);
+ ret = srp_finish_mapping(state, target);
if (!ret)
srp_map_update_start(state, NULL, 0, 0);
}
@@ -1083,7 +1095,7 @@ backtrack:
}
}
- if (use_fmr == SRP_MAP_ALLOW_FMR && srp_map_finish_fmr(state, target))
+ if (use_fmr == SRP_MAP_ALLOW_FMR && srp_finish_mapping(state, target))
goto backtrack;
req->nfmr = state->nfmr;
--
1.8.4.5
* [PATCH v2 6/9] IB/srp: Introduce the 'register_always' kernel module parameter
From: Bart Van Assche @ 2014-05-13 14:42 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Add a kernel module parameter that makes the SRP initiator use memory
registration even for SG-lists that could be processed without it.
This makes it easier for kernel developers to test the memory
registration code.
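For example, loading the driver with "modprobe ib_srp
register_always=1" enables this behavior. Since the parameter is
declared with mode 0444, it can be read but not changed at runtime via
/sys/module/ib_srp/parameters/register_always.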
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index c3c8333..eb88f80 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -66,6 +66,7 @@ static unsigned int srp_sg_tablesize;
static unsigned int cmd_sg_entries;
static unsigned int indirect_sg_entries;
static bool allow_ext_sg;
+static bool register_always;
static int topspin_workarounds = 1;
module_param(srp_sg_tablesize, uint, 0444);
@@ -87,6 +88,10 @@ module_param(topspin_workarounds, int, 0444);
MODULE_PARM_DESC(topspin_workarounds,
"Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
+module_param(register_always, bool, 0444);
+MODULE_PARM_DESC(register_always,
+ "Use memory registration even for contiguous memory regions");
+
static struct kernel_param_ops srp_tmo_ops;
static int srp_reconnect_delay = 10;
@@ -956,7 +961,7 @@ static int srp_finish_mapping(struct srp_map_state *state,
if (state->npages == 0)
return 0;
- if (state->npages == 1)
+ if (state->npages == 1 && !register_always)
srp_map_desc(state, state->base_dma_addr, state->fmr_len,
target->rkey);
else
@@ -1138,7 +1143,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
fmt = SRP_DATA_DESC_DIRECT;
len = sizeof (struct srp_cmd) + sizeof (struct srp_direct_buf);
- if (count == 1) {
+ if (count == 1 && !register_always) {
/*
* The midlayer only generated a single gather/scatter
* entry, or DMA mapping coalesced everything to a
--
1.8.4.5
* [PATCH v2 7/9] IB/srp: One FMR pool per SRP connection
From: Bart Van Assche @ 2014-05-13 14:43 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Allocate one FMR pool per SRP connection instead of one FMR pool
per HCA. This improves the scalability of the SRP initiator.

Request the SCSI mid-layer to retry a SCSI command after a
temporary mapping failure (-ENOMEM), but not after a permanent
mapping failure. This prevents SCSI commands from being retried
indefinitely when a permanent memory mapping failure occurs.

Tell the SCSI mid-layer to reduce the queue depth temporarily in
the unlikely case where an application queues many requests with
more than max_pages_per_fmr sg-list elements.

For FMR pool allocation, base the max_pages_per_fmr parameter on
the HCA memory registration limit (dev_attr->max_mr_size).
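As an illustration of the resulting pool parameters (the numbers are
arbitrary): with a 4 KiB FMR page size and an HCA that reports
max_mr_size = 2 GiB, max_pages_per_fmr = min(512, 2^31 / 2^12) = 512,
so a single FMR covers at most 512 * 4 KiB = 2 MiB; with
scsi_host->can_queue = 64, srp_alloc_fmr_pool() then creates a pool of
64 FMRs with a dirty watermark of 64 / 4 = 16.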
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 135 +++++++++++++++++++++---------------
drivers/infiniband/ulp/srp/ib_srp.h | 6 +-
2 files changed, 82 insertions(+), 59 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index eb88f80..2bfc3dd 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -293,12 +293,31 @@ static int srp_new_cm_id(struct srp_target_port *target)
return 0;
}
+static struct ib_fmr_pool *srp_alloc_fmr_pool(struct srp_target_port *target)
+{
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct ib_fmr_pool_param fmr_param;
+
+ memset(&fmr_param, 0, sizeof(fmr_param));
+ fmr_param.pool_size = target->scsi_host->can_queue;
+ fmr_param.dirty_watermark = fmr_param.pool_size / 4;
+ fmr_param.cache = 1;
+ fmr_param.max_pages_per_fmr = dev->max_pages_per_fmr;
+ fmr_param.page_shift = ilog2(dev->fmr_page_size);
+ fmr_param.access = (IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_REMOTE_READ);
+
+ return ib_create_fmr_pool(dev->pd, &fmr_param);
+}
+
static int srp_create_target_ib(struct srp_target_port *target)
{
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_qp_init_attr *init_attr;
struct ib_cq *recv_cq, *send_cq;
struct ib_qp *qp;
+ struct ib_fmr_pool *fmr_pool = NULL;
int ret;
init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
@@ -341,6 +360,21 @@ static int srp_create_target_ib(struct srp_target_port *target)
if (ret)
goto err_qp;
+ if (!target->qp || target->fmr_pool) {
+ fmr_pool = srp_alloc_fmr_pool(target);
+ if (IS_ERR(fmr_pool)) {
+ ret = PTR_ERR(fmr_pool);
+ shost_printk(KERN_WARNING, target->scsi_host, PFX
+ "FMR pool allocation failed (%d)\n", ret);
+ if (target->qp)
+ goto err_qp;
+ fmr_pool = NULL;
+ }
+ if (target->fmr_pool)
+ ib_destroy_fmr_pool(target->fmr_pool);
+ target->fmr_pool = fmr_pool;
+ }
+
if (target->qp)
ib_destroy_qp(target->qp);
if (target->recv_cq)
@@ -377,6 +411,8 @@ static void srp_free_target_ib(struct srp_target_port *target)
{
int i;
+ if (target->fmr_pool)
+ ib_destroy_fmr_pool(target->fmr_pool);
ib_destroy_qp(target->qp);
ib_destroy_cq(target->send_cq);
ib_destroy_cq(target->recv_cq);
@@ -936,11 +972,10 @@ static void srp_map_desc(struct srp_map_state *state, dma_addr_t dma_addr,
static int srp_map_finish_fmr(struct srp_map_state *state,
struct srp_target_port *target)
{
- struct srp_device *dev = target->srp_host->srp_dev;
struct ib_pool_fmr *fmr;
u64 io_addr = 0;
- fmr = ib_fmr_pool_map_phys(dev->fmr_pool, state->pages,
+ fmr = ib_fmr_pool_map_phys(target->fmr_pool, state->pages,
state->npages, io_addr);
if (IS_ERR(fmr))
return PTR_ERR(fmr);
@@ -1077,7 +1112,7 @@ static void srp_map_fmr(struct srp_map_state *state,
state->pages = req->map_page;
state->next_fmr = req->fmr_list;
- use_fmr = dev->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+ use_fmr = target->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
for_each_sg(scat, sg, count, i) {
if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
@@ -1555,7 +1590,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
struct srp_cmd *cmd;
struct ib_device *dev;
unsigned long flags;
- int len, result;
+ int len, ret;
const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
/*
@@ -1567,12 +1602,9 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
if (in_scsi_eh)
mutex_lock(&rport->mutex);
- result = srp_chkready(target->rport);
- if (unlikely(result)) {
- scmnd->result = result;
- scmnd->scsi_done(scmnd);
- goto unlock_rport;
- }
+ scmnd->result = srp_chkready(target->rport);
+ if (unlikely(scmnd->result))
+ goto err;
spin_lock_irqsave(&target->lock, flags);
iu = __srp_get_tx_iu(target, SRP_IU_CMD);
@@ -1587,7 +1619,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
ib_dma_sync_single_for_cpu(dev, iu->dma, target->max_iu_len,
DMA_TO_DEVICE);
- scmnd->result = 0;
scmnd->host_scribble = (void *) req;
cmd = iu->buf;
@@ -1604,7 +1635,15 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
len = srp_map_data(scmnd, target, req);
if (len < 0) {
shost_printk(KERN_ERR, target->scsi_host,
- PFX "Failed to map data\n");
+ PFX "Failed to map data (%d)\n", len);
+ /*
+ * If we ran out of memory descriptors (-ENOMEM) because an
+ * application is queuing many requests with more than
+ * max_pages_per_fmr sg-list elements, tell the SCSI mid-layer
+ * to reduce queue depth temporarily.
+ */
+ scmnd->result = len == -ENOMEM ?
+ DID_OK << 16 | QUEUE_FULL << 1 : DID_ERROR << 16;
goto err_iu;
}
@@ -1616,11 +1655,13 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
goto err_unmap;
}
+ ret = 0;
+
unlock_rport:
if (in_scsi_eh)
mutex_unlock(&rport->mutex);
- return 0;
+ return ret;
err_unmap:
srp_unmap_data(scmnd, target, req);
@@ -1636,10 +1677,15 @@ err_iu:
err_unlock:
spin_unlock_irqrestore(&target->lock, flags);
- if (in_scsi_eh)
- mutex_unlock(&rport->mutex);
+err:
+ if (scmnd->result) {
+ scmnd->scsi_done(scmnd);
+ ret = 0;
+ } else {
+ ret = SCSI_MLQUEUE_HOST_BUSY;
+ }
- return SCSI_MLQUEUE_HOST_BUSY;
+ goto unlock_rport;
}
/*
@@ -2688,19 +2734,6 @@ static ssize_t srp_create_target(struct device *dev,
goto err;
}
- if (!host->srp_dev->fmr_pool && !target->allow_ext_sg &&
- target->cmd_sg_cnt < target->sg_tablesize) {
- pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
- target->sg_tablesize = target->cmd_sg_cnt;
- }
-
- target_host->sg_tablesize = target->sg_tablesize;
- target->indirect_size = target->sg_tablesize *
- sizeof (struct srp_direct_buf);
- target->max_iu_len = sizeof (struct srp_cmd) +
- sizeof (struct srp_indirect_buf) +
- target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
-
INIT_WORK(&target->tl_err_work, srp_tl_err_work);
INIT_WORK(&target->remove_work, srp_remove_work);
spin_lock_init(&target->lock);
@@ -2717,6 +2750,19 @@ static ssize_t srp_create_target(struct device *dev,
if (ret)
goto err_free_mem;
+ if (!target->fmr_pool && !target->allow_ext_sg &&
+ target->cmd_sg_cnt < target->sg_tablesize) {
+ pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
+ target->sg_tablesize = target->cmd_sg_cnt;
+ }
+
+ target_host->sg_tablesize = target->sg_tablesize;
+ target->indirect_size = target->sg_tablesize *
+ sizeof(struct srp_direct_buf);
+ target->max_iu_len = sizeof(struct srp_cmd) +
+ sizeof(struct srp_indirect_buf) +
+ target->cmd_sg_cnt * sizeof(struct srp_direct_buf);
+
ret = srp_new_cm_id(target);
if (ret)
goto err_free_ib;
@@ -2828,9 +2874,8 @@ static void srp_add_one(struct ib_device *device)
{
struct srp_device *srp_dev;
struct ib_device_attr *dev_attr;
- struct ib_fmr_pool_param fmr_param;
struct srp_host *host;
- int max_pages_per_fmr, fmr_page_shift, s, e, p;
+ int fmr_page_shift, s, e, p;
dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL);
if (!dev_attr)
@@ -2853,7 +2898,10 @@ static void srp_add_one(struct ib_device *device)
fmr_page_shift = max(12, ffs(dev_attr->page_size_cap) - 1);
srp_dev->fmr_page_size = 1 << fmr_page_shift;
srp_dev->fmr_page_mask = ~((u64) srp_dev->fmr_page_size - 1);
- srp_dev->fmr_max_size = srp_dev->fmr_page_size * SRP_FMR_SIZE;
+ srp_dev->max_pages_per_fmr = min_t(u64, SRP_FMR_SIZE,
+ dev_attr->max_mr_size / srp_dev->fmr_page_size);
+ srp_dev->fmr_max_size = srp_dev->fmr_page_size *
+ srp_dev->max_pages_per_fmr;
INIT_LIST_HEAD(&srp_dev->dev_list);
@@ -2869,27 +2917,6 @@ static void srp_add_one(struct ib_device *device)
if (IS_ERR(srp_dev->mr))
goto err_pd;
- for (max_pages_per_fmr = SRP_FMR_SIZE;
- max_pages_per_fmr >= SRP_FMR_MIN_SIZE;
- max_pages_per_fmr /= 2, srp_dev->fmr_max_size /= 2) {
- memset(&fmr_param, 0, sizeof fmr_param);
- fmr_param.pool_size = SRP_FMR_POOL_SIZE;
- fmr_param.dirty_watermark = SRP_FMR_DIRTY_SIZE;
- fmr_param.cache = 1;
- fmr_param.max_pages_per_fmr = max_pages_per_fmr;
- fmr_param.page_shift = fmr_page_shift;
- fmr_param.access = (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_WRITE |
- IB_ACCESS_REMOTE_READ);
-
- srp_dev->fmr_pool = ib_create_fmr_pool(srp_dev->pd, &fmr_param);
- if (!IS_ERR(srp_dev->fmr_pool))
- break;
- }
-
- if (IS_ERR(srp_dev->fmr_pool))
- srp_dev->fmr_pool = NULL;
-
if (device->node_type == RDMA_NODE_IB_SWITCH) {
s = 0;
e = 0;
@@ -2952,8 +2979,6 @@ static void srp_remove_one(struct ib_device *device)
kfree(host);
}
- if (srp_dev->fmr_pool)
- ib_destroy_fmr_pool(srp_dev->fmr_pool);
ib_dereg_mr(srp_dev->mr);
ib_dealloc_pd(srp_dev->pd);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index aad27b7..e45379f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -67,9 +67,6 @@ enum {
SRP_TAG_TSK_MGMT = 1U << 31,
SRP_FMR_SIZE = 512,
- SRP_FMR_MIN_SIZE = 128,
- SRP_FMR_POOL_SIZE = 1024,
- SRP_FMR_DIRTY_SIZE = SRP_FMR_POOL_SIZE / 4,
SRP_MAP_ALLOW_FMR = 0,
SRP_MAP_NO_FMR = 1,
@@ -91,10 +88,10 @@ struct srp_device {
struct ib_device *dev;
struct ib_pd *pd;
struct ib_mr *mr;
- struct ib_fmr_pool *fmr_pool;
u64 fmr_page_mask;
int fmr_page_size;
int fmr_max_size;
+ int max_pages_per_fmr;
};
struct srp_host {
@@ -131,6 +128,7 @@ struct srp_target_port {
struct ib_cq *send_cq ____cacheline_aligned_in_smp;
struct ib_cq *recv_cq;
struct ib_qp *qp;
+ struct ib_fmr_pool *fmr_pool;
u32 lkey;
u32 rkey;
enum srp_target_state state;
--
1.8.4.5
* [PATCH v2 8/9] IB/srp: Rename FMR-related variables
From: Bart Van Assche @ 2014-05-13 14:44 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
The next patch will cause the renamed variables to be shared between
the code for FMR and the code for fast registration (FR) memory
registration. Make the names of these variables independent of the
memory registration mode. This patch does not change any
functionality. The starting point for this patch was the change
generated by the following shell command:

sed -i.orig 's/SRP_FMR_SIZE/SRP_MAX_PAGES_PER_MR/g;s/fmr_page_mask/mr_page_mask/g;s/fmr_page_size/mr_page_size/g;s/fmr_page_shift/mr_page_shift/g;s/fmr_max_size/mr_max_size/g;s/max_pages_per_fmr/max_pages_per_mr/g;s/nfmr/nmdesc/g;s/fmr_len/dma_len/g' drivers/infiniband/ulp/srp/ib_srp.[ch]
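One quick, informal way to verify that such a rename is
behavior-neutral, assuming the object file built from the pre-rename
source was saved as ib_srp.o.orig, is to compare disassemblies:

	diff <(objdump -d ib_srp.o.orig) <(objdump -d ib_srp.o)

For a pure rename of local identifiers and macros this should produce
no output.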
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 48 ++++++++++++++++++-------------------
drivers/infiniband/ulp/srp/ib_srp.h | 16 ++++++-------
2 files changed, 32 insertions(+), 32 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 2bfc3dd..4fda7eb 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -302,8 +302,8 @@ static struct ib_fmr_pool *srp_alloc_fmr_pool(struct srp_target_port *target)
fmr_param.pool_size = target->scsi_host->can_queue;
fmr_param.dirty_watermark = fmr_param.pool_size / 4;
fmr_param.cache = 1;
- fmr_param.max_pages_per_fmr = dev->max_pages_per_fmr;
- fmr_param.page_shift = ilog2(dev->fmr_page_size);
+ fmr_param.max_pages_per_fmr = dev->max_pages_per_mr;
+ fmr_param.page_shift = ilog2(dev->mr_page_size);
fmr_param.access = (IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_READ);
@@ -659,7 +659,7 @@ static int srp_alloc_req_data(struct srp_target_port *target)
req = &target->req_ring[i];
req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
GFP_KERNEL);
- req->map_page = kmalloc(SRP_FMR_SIZE * sizeof(void *),
+ req->map_page = kmalloc(SRP_MAX_PAGES_PER_MR * sizeof(void *),
GFP_KERNEL);
req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
if (!req->fmr_list || !req->map_page || !req->indirect_desc)
@@ -812,7 +812,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
return;
pfmr = req->fmr_list;
- while (req->nfmr--)
+ while (req->nmdesc--)
ib_fmr_pool_unmap(*pfmr++);
ib_dma_unmap_sg(ibdev, scsi_sglist(scmnd), scsi_sg_count(scmnd),
@@ -981,9 +981,9 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
return PTR_ERR(fmr);
*state->next_fmr++ = fmr;
- state->nfmr++;
+ state->nmdesc++;
- srp_map_desc(state, 0, state->fmr_len, fmr->fmr->rkey);
+ srp_map_desc(state, 0, state->dma_len, fmr->fmr->rkey);
return 0;
}
@@ -997,14 +997,14 @@ static int srp_finish_mapping(struct srp_map_state *state,
return 0;
if (state->npages == 1 && !register_always)
- srp_map_desc(state, state->base_dma_addr, state->fmr_len,
+ srp_map_desc(state, state->base_dma_addr, state->dma_len,
target->rkey);
else
ret = srp_map_finish_fmr(state, target);
if (ret == 0) {
state->npages = 0;
- state->fmr_len = 0;
+ state->dma_len = 0;
}
return ret;
@@ -1049,7 +1049,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
* that were never quite defined, but went away when the initiator
* avoided using FMR on such page fragments.
*/
- if (dma_addr & ~dev->fmr_page_mask || dma_len > dev->fmr_max_size) {
+ if (dma_addr & ~dev->mr_page_mask || dma_len > dev->mr_max_size) {
ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1068,7 +1068,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
srp_map_update_start(state, sg, sg_index, dma_addr);
while (dma_len) {
- if (state->npages == SRP_FMR_SIZE) {
+ if (state->npages == SRP_MAX_PAGES_PER_MR) {
ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1076,12 +1076,12 @@ static int srp_map_sg_entry(struct srp_map_state *state,
srp_map_update_start(state, sg, sg_index, dma_addr);
}
- len = min_t(unsigned int, dma_len, dev->fmr_page_size);
+ len = min_t(unsigned int, dma_len, dev->mr_page_size);
if (!state->npages)
state->base_dma_addr = dma_addr;
state->pages[state->npages++] = dma_addr;
- state->fmr_len += len;
+ state->dma_len += len;
dma_addr += len;
dma_len -= len;
}
@@ -1091,7 +1091,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
* boundries.
*/
ret = 0;
- if (len != dev->fmr_page_size) {
+ if (len != dev->mr_page_size) {
ret = srp_finish_mapping(state, target);
if (!ret)
srp_map_update_start(state, NULL, 0, 0);
@@ -1138,7 +1138,7 @@ backtrack:
if (use_fmr == SRP_MAP_ALLOW_FMR && srp_finish_mapping(state, target))
goto backtrack;
- req->nfmr = state->nfmr;
+ req->nmdesc = state->nmdesc;
}
static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
@@ -1191,7 +1191,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
buf->key = cpu_to_be32(target->rkey);
buf->len = cpu_to_be32(ib_sg_dma_len(ibdev, scat));
- req->nfmr = 0;
+ req->nmdesc = 0;
goto map_complete;
}
@@ -1639,7 +1639,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
/*
* If we ran out of memory descriptors (-ENOMEM) because an
* application is queuing many requests with more than
- * max_pages_per_fmr sg-list elements, tell the SCSI mid-layer
+ * max_pages_per_mr sg-list elements, tell the SCSI mid-layer
* to reduce queue depth temporarily.
*/
scmnd->result = len == -ENOMEM ?
@@ -2875,7 +2875,7 @@ static void srp_add_one(struct ib_device *device)
struct srp_device *srp_dev;
struct ib_device_attr *dev_attr;
struct srp_host *host;
- int fmr_page_shift, s, e, p;
+ int mr_page_shift, s, e, p;
dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL);
if (!dev_attr)
@@ -2895,13 +2895,13 @@ static void srp_add_one(struct ib_device *device)
* minimum of 4096 bytes. We're unlikely to build large sglists
* out of smaller entries.
*/
- fmr_page_shift = max(12, ffs(dev_attr->page_size_cap) - 1);
- srp_dev->fmr_page_size = 1 << fmr_page_shift;
- srp_dev->fmr_page_mask = ~((u64) srp_dev->fmr_page_size - 1);
- srp_dev->max_pages_per_fmr = min_t(u64, SRP_FMR_SIZE,
- dev_attr->max_mr_size / srp_dev->fmr_page_size);
- srp_dev->fmr_max_size = srp_dev->fmr_page_size *
- srp_dev->max_pages_per_fmr;
+ mr_page_shift = max(12, ffs(dev_attr->page_size_cap) - 1);
+ srp_dev->mr_page_size = 1 << mr_page_shift;
+ srp_dev->mr_page_mask = ~((u64) srp_dev->mr_page_size - 1);
+ srp_dev->max_pages_per_mr = min_t(u64, SRP_MAX_PAGES_PER_MR,
+ dev_attr->max_mr_size / srp_dev->mr_page_size);
+ srp_dev->mr_max_size = srp_dev->mr_page_size *
+ srp_dev->max_pages_per_mr;
INIT_LIST_HEAD(&srp_dev->dev_list);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e45379f..fccf5df 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -66,7 +66,7 @@ enum {
SRP_TAG_NO_REQ = ~0U,
SRP_TAG_TSK_MGMT = 1U << 31,
- SRP_FMR_SIZE = 512,
+ SRP_MAX_PAGES_PER_MR = 512,
SRP_MAP_ALLOW_FMR = 0,
SRP_MAP_NO_FMR = 1,
@@ -88,10 +88,10 @@ struct srp_device {
struct ib_device *dev;
struct ib_pd *pd;
struct ib_mr *mr;
- u64 fmr_page_mask;
- int fmr_page_size;
- int fmr_max_size;
- int max_pages_per_fmr;
+ u64 mr_page_mask;
+ int mr_page_size;
+ int mr_max_size;
+ int max_pages_per_mr;
};
struct srp_host {
@@ -113,7 +113,7 @@ struct srp_request {
u64 *map_page;
struct srp_direct_buf *indirect_desc;
dma_addr_t indirect_dma_addr;
- short nfmr;
+ short nmdesc;
short index;
};
@@ -200,10 +200,10 @@ struct srp_map_state {
struct srp_direct_buf *desc;
u64 *pages;
dma_addr_t base_dma_addr;
- u32 fmr_len;
+ u32 dma_len;
u32 total_len;
unsigned int npages;
- unsigned int nfmr;
+ unsigned int nmdesc;
unsigned int ndesc;
struct scatterlist *unmapped_sg;
int unmapped_index;
--
1.8.4.5
* [PATCH v2 9/9] IB/srp: Add fast registration support
From: Bart Van Assche @ 2014-05-13 14:44 UTC
To: Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
ConnectX VF) support fast registration but not FMR. Hence add fast
registration support.
In srp_rport_reconnect(), move the srp_finish_req() loop from after
to before the srp_create_target_ib() call. This is needed to prevent
srp_finish_req() from queuing invalidation requests for rkeys
associated with the old queue pair on the newly allocated queue
pair. Invoking srp_finish_req() before the queue pair has been
reallocated is safe since srp_claim_req() correctly handles
completions that arrive after srp_finish_req() has been invoked.
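For testing on hardware that supports both registration schemes, fast
registration can be requested at module load time via the new
parameter introduced below, e.g. "modprobe ib_srp prefer_fr=Y". With
the parameter left at its default, the driver keeps using FMR when
both schemes are supported, in line with the parameter description.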
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
drivers/infiniband/ulp/srp/ib_srp.c | 457 +++++++++++++++++++++++++++++-------
drivers/infiniband/ulp/srp/ib_srp.h | 74 +++++-
2 files changed, 440 insertions(+), 91 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 4fda7eb..2e0bd4d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -66,6 +66,7 @@ static unsigned int srp_sg_tablesize;
static unsigned int cmd_sg_entries;
static unsigned int indirect_sg_entries;
static bool allow_ext_sg;
+static bool prefer_fr;
static bool register_always;
static int topspin_workarounds = 1;
@@ -88,6 +89,10 @@ module_param(topspin_workarounds, int, 0444);
MODULE_PARM_DESC(topspin_workarounds,
"Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
+module_param(prefer_fr, bool, 0444);
+MODULE_PARM_DESC(prefer_fr,
+"Whether to use fast registration if both FMR and fast registration are supported");
+
module_param(register_always, bool, 0444);
MODULE_PARM_DESC(register_always,
"Use memory registration even for contiguous memory regions");
@@ -311,6 +316,132 @@ static struct ib_fmr_pool *srp_alloc_fmr_pool(struct srp_target_port *target)
return ib_create_fmr_pool(dev->pd, &fmr_param);
}
+/**
+ * srp_destroy_fr_pool() - free the resources owned by a pool
+ * @pool: Fast registration pool to be destroyed.
+ */
+static void srp_destroy_fr_pool(struct srp_fr_pool *pool)
+{
+ int i;
+ struct srp_fr_desc *d;
+
+ if (!pool)
+ return;
+
+ for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+ if (d->frpl)
+ ib_free_fast_reg_page_list(d->frpl);
+ if (d->mr)
+ ib_dereg_mr(d->mr);
+ }
+ kfree(pool);
+}
+
+/**
+ * srp_create_fr_pool() - allocate and initialize a pool for fast registration
+ * @device: IB device to allocate fast registration descriptors for.
+ * @pd: Protection domain associated with the FR descriptors.
+ * @pool_size: Number of descriptors to allocate.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ */
+static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
+ struct ib_pd *pd, int pool_size,
+ int max_page_list_len)
+{
+ struct srp_fr_pool *pool;
+ struct srp_fr_desc *d;
+ struct ib_mr *mr;
+ struct ib_fast_reg_page_list *frpl;
+ int i, ret = -EINVAL;
+
+ if (pool_size <= 0)
+ goto err;
+ ret = -ENOMEM;
+ pool = kzalloc(sizeof(struct srp_fr_pool) +
+ pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
+ if (!pool)
+ goto err;
+ pool->size = pool_size;
+ pool->max_page_list_len = max_page_list_len;
+ spin_lock_init(&pool->lock);
+ INIT_LIST_HEAD(&pool->free_list);
+
+ for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
+ mr = ib_alloc_fast_reg_mr(pd, max_page_list_len);
+ if (IS_ERR(mr)) {
+ ret = PTR_ERR(mr);
+ goto destroy_pool;
+ }
+ d->mr = mr;
+ frpl = ib_alloc_fast_reg_page_list(device, max_page_list_len);
+ if (IS_ERR(frpl)) {
+ ret = PTR_ERR(frpl);
+ goto destroy_pool;
+ }
+ d->frpl = frpl;
+ list_add_tail(&d->entry, &pool->free_list);
+ }
+
+out:
+ return pool;
+
+destroy_pool:
+ srp_destroy_fr_pool(pool);
+
+err:
+ pool = ERR_PTR(ret);
+ goto out;
+}
+
+/**
+ * srp_fr_pool_get() - obtain a descriptor suitable for fast registration
+ * @pool: Pool to obtain descriptor from.
+ */
+static struct srp_fr_desc *srp_fr_pool_get(struct srp_fr_pool *pool)
+{
+ struct srp_fr_desc *d = NULL;
+ unsigned long flags;
+
+ spin_lock_irqsave(&pool->lock, flags);
+ if (!list_empty(&pool->free_list)) {
+ d = list_first_entry(&pool->free_list, typeof(*d), entry);
+ list_del(&d->entry);
+ }
+ spin_unlock_irqrestore(&pool->lock, flags);
+
+ return d;
+}
+
+/**
+ * srp_fr_pool_put() - put an FR descriptor back in the free list
+ * @pool: Pool the descriptor was allocated from.
+ * @desc: Pointer to an array of fast registration descriptor pointers.
+ * @n: Number of descriptors to put back.
+ *
+ * Note: The caller must already have queued an invalidation request for
+ * desc->mr->rkey before calling this function.
+ */
+static void srp_fr_pool_put(struct srp_fr_pool *pool, struct srp_fr_desc **desc,
+ int n)
+{
+ unsigned long flags;
+ int i;
+
+ spin_lock_irqsave(&pool->lock, flags);
+ for (i = 0; i < n; i++)
+ list_add(&desc[i]->entry, &pool->free_list);
+ spin_unlock_irqrestore(&pool->lock, flags);
+}
+
+static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
+{
+ struct srp_device *dev = target->srp_host->srp_dev;
+
+ return srp_create_fr_pool(dev->dev, dev->pd,
+ target->scsi_host->can_queue,
+ dev->max_pages_per_mr);
+}
+
static int srp_create_target_ib(struct srp_target_port *target)
{
struct srp_device *dev = target->srp_host->srp_dev;
@@ -318,6 +449,8 @@ static int srp_create_target_ib(struct srp_target_port *target)
struct ib_cq *recv_cq, *send_cq;
struct ib_qp *qp;
struct ib_fmr_pool *fmr_pool = NULL;
+ struct srp_fr_pool *fr_pool = NULL;
+ const int m = 1 + dev->use_fast_reg;
int ret;
init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
@@ -332,7 +465,7 @@ static int srp_create_target_ib(struct srp_target_port *target)
}
send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
- target->queue_size, target->comp_vector);
+ m * target->queue_size, target->comp_vector);
if (IS_ERR(send_cq)) {
ret = PTR_ERR(send_cq);
goto err_recv_cq;
@@ -341,11 +474,11 @@ static int srp_create_target_ib(struct srp_target_port *target)
ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP);
init_attr->event_handler = srp_qp_event;
- init_attr->cap.max_send_wr = target->queue_size;
+ init_attr->cap.max_send_wr = m * target->queue_size;
init_attr->cap.max_recv_wr = target->queue_size;
init_attr->cap.max_recv_sge = 1;
init_attr->cap.max_send_sge = 1;
- init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
+ init_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
init_attr->qp_type = IB_QPT_RC;
init_attr->send_cq = send_cq;
init_attr->recv_cq = recv_cq;
@@ -360,19 +493,38 @@ static int srp_create_target_ib(struct srp_target_port *target)
if (ret)
goto err_qp;
- if (!target->qp || target->fmr_pool) {
- fmr_pool = srp_alloc_fmr_pool(target);
- if (IS_ERR(fmr_pool)) {
- ret = PTR_ERR(fmr_pool);
- shost_printk(KERN_WARNING, target->scsi_host, PFX
- "FMR pool allocation failed (%d)\n", ret);
- if (target->qp)
- goto err_qp;
- fmr_pool = NULL;
+ if (dev->use_fast_reg) {
+ if (!target->qp || target->fr_pool) {
+ fr_pool = srp_alloc_fr_pool(target);
+ if (IS_ERR(fr_pool)) {
+ ret = PTR_ERR(fr_pool);
+ shost_printk(KERN_WARNING, target->scsi_host,
+ PFX "FR pool allocation failed (%d)\n",
+ ret);
+ if (target->qp)
+ goto err_qp;
+ fr_pool = NULL;
+ }
+ if (target->fr_pool)
+ srp_destroy_fr_pool(target->fr_pool);
+ target->fr_pool = fr_pool;
+ }
+ } else {
+ if (!target->qp || target->fmr_pool) {
+ fmr_pool = srp_alloc_fmr_pool(target);
+ if (IS_ERR(fmr_pool)) {
+ ret = PTR_ERR(fmr_pool);
+ shost_printk(KERN_WARNING, target->scsi_host,
+ PFX "FMR pool allocation failed (%d)\n",
+ ret);
+ if (target->qp)
+ goto err_qp;
+ fmr_pool = NULL;
+ }
+ if (target->fmr_pool)
+ ib_destroy_fmr_pool(target->fmr_pool);
+ target->fmr_pool = fmr_pool;
}
- if (target->fmr_pool)
- ib_destroy_fmr_pool(target->fmr_pool);
- target->fmr_pool = fmr_pool;
}
if (target->qp)
@@ -409,10 +561,16 @@ err:
*/
static void srp_free_target_ib(struct srp_target_port *target)
{
+ struct srp_device *dev = target->srp_host->srp_dev;
int i;
- if (target->fmr_pool)
- ib_destroy_fmr_pool(target->fmr_pool);
+ if (dev->use_fast_reg) {
+ if (target->fr_pool)
+ srp_destroy_fr_pool(target->fr_pool);
+ } else {
+ if (target->fmr_pool)
+ ib_destroy_fmr_pool(target->fmr_pool);
+ }
ib_destroy_qp(target->qp);
ib_destroy_cq(target->send_cq);
ib_destroy_cq(target->recv_cq);
@@ -617,7 +775,8 @@ static void srp_disconnect_target(struct srp_target_port *target)
static void srp_free_req_data(struct srp_target_port *target)
{
- struct ib_device *ibdev = target->srp_host->srp_dev->dev;
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct ib_device *ibdev = dev->dev;
struct srp_request *req;
int i;
@@ -626,7 +785,10 @@ static void srp_free_req_data(struct srp_target_port *target)
for (i = 0; i < target->req_ring_size; ++i) {
req = &target->req_ring[i];
- kfree(req->fmr_list);
+ if (dev->use_fast_reg)
+ kfree(req->fr_list);
+ else
+ kfree(req->fmr_list);
kfree(req->map_page);
if (req->indirect_dma_addr) {
ib_dma_unmap_single(ibdev, req->indirect_dma_addr,
@@ -645,6 +807,7 @@ static int srp_alloc_req_data(struct srp_target_port *target)
struct srp_device *srp_dev = target->srp_host->srp_dev;
struct ib_device *ibdev = srp_dev->dev;
struct srp_request *req;
+ void *mr_list;
dma_addr_t dma_addr;
int i, ret = -ENOMEM;
@@ -657,12 +820,20 @@ static int srp_alloc_req_data(struct srp_target_port *target)
for (i = 0; i < target->req_ring_size; ++i) {
req = &target->req_ring[i];
- req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
- GFP_KERNEL);
+ mr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
+ GFP_KERNEL);
+ if (!mr_list)
+ goto out;
+ if (srp_dev->use_fast_reg)
+ req->fr_list = mr_list;
+ else
+ req->fmr_list = mr_list;
req->map_page = kmalloc(SRP_MAX_PAGES_PER_MR * sizeof(void *),
GFP_KERNEL);
+ if (!req->map_page)
+ goto out;
req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
- if (!req->fmr_list || !req->map_page || !req->indirect_desc)
+ if (!req->indirect_desc)
goto out;
dma_addr = ib_dma_map_single(ibdev, req->indirect_desc,
@@ -799,21 +970,48 @@ static int srp_connect_target(struct srp_target_port *target)
}
}
+static int srp_inv_rkey(struct srp_target_port *target, u32 rkey)
+{
+ struct ib_send_wr *bad_wr;
+ struct ib_send_wr wr = {
+ .opcode = IB_WR_LOCAL_INV,
+ .wr_id = LOCAL_INV_WR_ID_MASK,
+ .next = NULL,
+ .num_sge = 0,
+ .send_flags = 0,
+ .ex.invalidate_rkey = rkey,
+ };
+
+ return ib_post_send(target->qp, &wr, &bad_wr);
+}
+
static void srp_unmap_data(struct scsi_cmnd *scmnd,
struct srp_target_port *target,
struct srp_request *req)
{
- struct ib_device *ibdev = target->srp_host->srp_dev->dev;
- struct ib_pool_fmr **pfmr;
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct ib_device *ibdev = dev->dev;
+ int i;
if (!scsi_sglist(scmnd) ||
(scmnd->sc_data_direction != DMA_TO_DEVICE &&
scmnd->sc_data_direction != DMA_FROM_DEVICE))
return;
- pfmr = req->fmr_list;
- while (req->nmdesc--)
- ib_fmr_pool_unmap(*pfmr++);
+ if (dev->use_fast_reg) {
+ struct srp_fr_desc **pfr;
+
+ for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++)
+ srp_inv_rkey(target, (*pfr)->mr->rkey);
+ if (req->nmdesc)
+ srp_fr_pool_put(target->fr_pool, req->fr_list,
+ req->nmdesc);
+ } else {
+ struct ib_pool_fmr **pfmr;
+
+ for (i = req->nmdesc, pfmr = req->fmr_list; i > 0; i--, pfmr++)
+ ib_fmr_pool_unmap(*pfmr);
+ }
ib_dma_unmap_sg(ibdev, scsi_sglist(scmnd), scsi_sg_count(scmnd),
scmnd->sc_data_direction);
@@ -926,21 +1124,19 @@ static int srp_rport_reconnect(struct srp_rport *rport)
* callbacks will have finished before a new QP is allocated.
*/
ret = srp_new_cm_id(target);
- /*
- * Whether or not creating a new CM ID succeeded, create a new
- * QP. This guarantees that all completion callback function
- * invocations have finished before request resetting starts.
- */
- if (ret == 0)
- ret = srp_create_target_ib(target);
- else
- srp_create_target_ib(target);
for (i = 0; i < target->req_ring_size; ++i) {
struct srp_request *req = &target->req_ring[i];
srp_finish_req(target, req, NULL, DID_RESET << 16);
}
+ /*
+ * Whether or not creating a new CM ID succeeded, create a new
+ * QP. This guarantees that all callback functions for the old QP have
+ * finished before any send requests are posted on the new QP.
+ */
+ ret += srp_create_target_ib(target);
+
INIT_LIST_HEAD(&target->free_tx);
for (i = 0; i < target->queue_size; ++i)
list_add(&target->tx_ring[i]->list, &target->free_tx);
@@ -988,6 +1184,47 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
return 0;
}
+static int srp_map_finish_fr(struct srp_map_state *state,
+ struct srp_target_port *target)
+{
+ struct srp_device *dev = target->srp_host->srp_dev;
+ struct ib_send_wr *bad_wr;
+ struct ib_send_wr wr;
+ struct srp_fr_desc *desc;
+ u32 rkey;
+
+ desc = srp_fr_pool_get(target->fr_pool);
+ if (!desc)
+ return -ENOMEM;
+
+ rkey = ib_inc_rkey(desc->mr->rkey);
+ ib_update_fast_reg_key(desc->mr, rkey);
+
+ memcpy(desc->frpl->page_list, state->pages,
+ sizeof(state->pages[0]) * state->npages);
+
+ memset(&wr, 0, sizeof(wr));
+ wr.opcode = IB_WR_FAST_REG_MR;
+ wr.wr_id = FAST_REG_WR_ID_MASK;
+ wr.wr.fast_reg.iova_start = state->base_dma_addr;
+ wr.wr.fast_reg.page_list = desc->frpl;
+ wr.wr.fast_reg.page_list_len = state->npages;
+ wr.wr.fast_reg.page_shift = ilog2(dev->mr_page_size);
+ wr.wr.fast_reg.length = state->dma_len;
+ wr.wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_READ |
+ IB_ACCESS_REMOTE_WRITE);
+ wr.wr.fast_reg.rkey = desc->mr->lkey;
+
+ *state->next_fr++ = desc;
+ state->nmdesc++;
+
+ srp_map_desc(state, state->base_dma_addr, state->dma_len,
+ desc->mr->rkey);
+
+ return ib_post_send(target->qp, &wr, &bad_wr);
+}
+
static int srp_finish_mapping(struct srp_map_state *state,
struct srp_target_port *target)
{
@@ -1000,7 +1237,9 @@ static int srp_finish_mapping(struct srp_map_state *state,
srp_map_desc(state, state->base_dma_addr, state->dma_len,
target->rkey);
else
- ret = srp_map_finish_fmr(state, target);
+ ret = target->srp_host->srp_dev->use_fast_reg ?
+ srp_map_finish_fr(state, target) :
+ srp_map_finish_fmr(state, target);
if (ret == 0) {
state->npages = 0;
@@ -1022,7 +1261,7 @@ static void srp_map_update_start(struct srp_map_state *state,
static int srp_map_sg_entry(struct srp_map_state *state,
struct srp_target_port *target,
struct scatterlist *sg, int sg_index,
- int use_fmr)
+ bool use_memory_registration)
{
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_device *ibdev = dev->dev;
@@ -1034,22 +1273,24 @@ static int srp_map_sg_entry(struct srp_map_state *state,
if (!dma_len)
return 0;
- if (use_fmr == SRP_MAP_NO_FMR) {
- /* Once we're in direct map mode for a request, we don't
- * go back to FMR mode, so no need to update anything
+ if (!use_memory_registration) {
+ /*
+ * Once we're in direct map mode for a request, we don't
+ * go back to FMR or FR mode, so no need to update anything
* other than the descriptor.
*/
srp_map_desc(state, dma_addr, dma_len, target->rkey);
return 0;
}
- /* If we start at an offset into the FMR page, don't merge into
- * the current FMR. Finish it out, and use the kernel's MR for this
- * sg entry. This is to avoid potential bugs on some SRP targets
- * that were never quite defined, but went away when the initiator
- * avoided using FMR on such page fragments.
+ /*
+ * Since not all RDMA HW drivers support non-zero page offsets for
+ * FMR, if we start at an offset into a page, don't merge into the
+ * current FMR mapping. Finish it out, and use the kernel's MR for
+ * this sg entry.
*/
- if (dma_addr & ~dev->mr_page_mask || dma_len > dev->mr_max_size) {
+ if ((!dev->use_fast_reg && dma_addr & ~dev->mr_page_mask) ||
+ dma_len > dev->mr_max_size) {
ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1059,16 +1300,18 @@ static int srp_map_sg_entry(struct srp_map_state *state,
return 0;
}
- /* If this is the first sg to go into the FMR, save our position.
- * We need to know the first unmapped entry, its index, and the
- * first unmapped address within that entry to be able to restart
- * mapping after an error.
+ /*
+ * If this is the first sg that will be mapped via FMR or via FR, save
+ * our position. We need to know the first unmapped entry, its index,
+ * and the first unmapped address within that entry to be able to
+ * restart mapping after an error.
*/
if (!state->unmapped_sg)
srp_map_update_start(state, sg, sg_index, dma_addr);
while (dma_len) {
- if (state->npages == SRP_MAX_PAGES_PER_MR) {
+ unsigned offset = dma_addr & ~dev->mr_page_mask;
+ if (state->npages == SRP_MAX_PAGES_PER_MR || offset != 0) {
ret = srp_finish_mapping(state, target);
if (ret)
return ret;
@@ -1076,17 +1319,18 @@ static int srp_map_sg_entry(struct srp_map_state *state,
srp_map_update_start(state, sg, sg_index, dma_addr);
}
- len = min_t(unsigned int, dma_len, dev->mr_page_size);
+ len = min_t(unsigned int, dma_len, dev->mr_page_size - offset);
if (!state->npages)
state->base_dma_addr = dma_addr;
- state->pages[state->npages++] = dma_addr;
+ state->pages[state->npages++] = dma_addr & dev->mr_page_mask;
state->dma_len += len;
dma_addr += len;
dma_len -= len;
}
- /* If the last entry of the FMR wasn't a full page, then we need to
+ /*
+ * If the last entry of the MR wasn't a full page, then we need to
* close it out and start a new one -- we can only merge at page
* boundaries.
*/
@@ -1099,25 +1343,33 @@ static int srp_map_sg_entry(struct srp_map_state *state,
return ret;
}
-static void srp_map_fmr(struct srp_map_state *state,
- struct srp_target_port *target, struct srp_request *req,
- struct scatterlist *scat, int count)
+static int srp_map_sg(struct srp_map_state *state,
+ struct srp_target_port *target, struct srp_request *req,
+ struct scatterlist *scat, int count)
{
struct srp_device *dev = target->srp_host->srp_dev;
struct ib_device *ibdev = dev->dev;
struct scatterlist *sg;
- int i, use_fmr;
+ int i;
+ bool use_memory_registration;
state->desc = req->indirect_desc;
state->pages = req->map_page;
- state->next_fmr = req->fmr_list;
-
- use_fmr = target->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
+ if (dev->use_fast_reg) {
+ state->next_fmr = req->fmr_list;
+ use_memory_registration = !!target->fr_pool;
+ } else {
+ state->next_fr = req->fr_list;
+ use_memory_registration = !!target->fmr_pool;
+ }
for_each_sg(scat, sg, count, i) {
- if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
- /* FMR mapping failed, so backtrack to the first
- * unmapped entry and continue on without using FMR.
+ if (srp_map_sg_entry(state, target, sg, i,
+ use_memory_registration)) {
+ /*
+ * Memory registration failed, so backtrack to the
+ * first unmapped entry and continue on without using
+ * memory registration.
*/
dma_addr_t dma_addr;
unsigned int dma_len;
@@ -1130,15 +1382,17 @@ backtrack:
dma_len = ib_sg_dma_len(ibdev, sg);
dma_len -= (state->unmapped_addr - dma_addr);
dma_addr = state->unmapped_addr;
- use_fmr = SRP_MAP_NO_FMR;
+ use_memory_registration = false;
srp_map_desc(state, dma_addr, dma_len, target->rkey);
}
}
- if (use_fmr == SRP_MAP_ALLOW_FMR && srp_finish_mapping(state, target))
+ if (use_memory_registration && srp_finish_mapping(state, target))
goto backtrack;
req->nmdesc = state->nmdesc;
+
+ return 0;
}
static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
@@ -1195,9 +1449,9 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
goto map_complete;
}
- /* We have more than one scatter/gather entry, so build our indirect
- * descriptor table, trying to merge as many entries with FMR as we
- * can.
+ /*
+ * We have more than one scatter/gather entry, so build our indirect
+ * descriptor table, trying to merge as many entries as we can.
*/
indirect_hdr = (void *) cmd->add_data;
@@ -1205,7 +1459,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
target->indirect_size, DMA_TO_DEVICE);
memset(&state, 0, sizeof(state));
- srp_map_fmr(&state, target, req, scat, count);
+ srp_map_sg(&state, target, req, scat, count);
/* We've mapped the request, now pull as much of the indirect
* descriptor table as we can into the command buffer. If this
@@ -1214,7 +1468,8 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
* give us more S/G entries than we allow.
*/
if (state.ndesc == 1) {
- /* FMR mapping was able to collapse this to one entry,
+ /*
+ * Memory registration collapsed the sg-list into one entry,
* so use a direct descriptor.
*/
struct srp_direct_buf *buf = (void *) cmd->add_data;
@@ -1537,14 +1792,24 @@ static void srp_tl_err_work(struct work_struct *work)
srp_start_tl_fail_timers(target->rport);
}
-static void srp_handle_qp_err(enum ib_wc_status wc_status, bool send_err,
- struct srp_target_port *target)
+static void srp_handle_qp_err(u64 wr_id, enum ib_wc_status wc_status,
+ bool send_err, struct srp_target_port *target)
{
if (target->connected && !target->qp_in_error) {
- shost_printk(KERN_ERR, target->scsi_host,
- PFX "failed %s status %d\n",
- send_err ? "send" : "receive",
- wc_status);
+ if (wr_id & LOCAL_INV_WR_ID_MASK) {
+ shost_printk(KERN_ERR, target->scsi_host,
+ "LOCAL_INV failed with status %d\n",
+ wc_status);
+ } else if (wr_id & FAST_REG_WR_ID_MASK) {
+ shost_printk(KERN_ERR, target->scsi_host,
+ "FAST_REG_MR failed status %d\n",
+ wc_status);
+ } else {
+ shost_printk(KERN_ERR, target->scsi_host,
+ PFX "failed %s status %d for iu %p\n",
+ send_err ? "send" : "receive",
+ wc_status, (void *)(uintptr_t)wr_id);
+ }
queue_work(system_long_wq, &target->tl_err_work);
}
target->qp_in_error = true;
@@ -1560,7 +1825,7 @@ static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
if (likely(wc.status == IB_WC_SUCCESS)) {
srp_handle_recv(target, &wc);
} else {
- srp_handle_qp_err(wc.status, false, target);
+ srp_handle_qp_err(wc.wr_id, wc.status, false, target);
}
}
}
@@ -1576,7 +1841,7 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
list_add(&iu->list, &target->free_tx);
} else {
- srp_handle_qp_err(wc.status, true, target);
+ srp_handle_qp_err(wc.wr_id, wc.status, true, target);
}
}
}
@@ -2689,7 +2954,8 @@ static ssize_t srp_create_target(struct device *dev,
container_of(dev, struct srp_host, dev);
struct Scsi_Host *target_host;
struct srp_target_port *target;
- struct ib_device *ibdev = host->srp_dev->dev;
+ struct srp_device *srp_dev = host->srp_dev;
+ struct ib_device *ibdev = srp_dev->dev;
int ret;
target_host = scsi_host_alloc(&srp_template,
@@ -2738,10 +3004,6 @@ static ssize_t srp_create_target(struct device *dev,
INIT_WORK(&target->remove_work, srp_remove_work);
spin_lock_init(&target->lock);
INIT_LIST_HEAD(&target->free_tx);
- ret = srp_alloc_req_data(target);
- if (ret)
- goto err_free_mem;
-
ret = ib_query_gid(ibdev, host->port, 0, &target->path.sgid);
if (ret)
goto err_free_mem;
@@ -2752,7 +3014,7 @@ static ssize_t srp_create_target(struct device *dev,
if (!target->fmr_pool && !target->allow_ext_sg &&
target->cmd_sg_cnt < target->sg_tablesize) {
- pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
+ pr_warn("No MR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
target->sg_tablesize = target->cmd_sg_cnt;
}
@@ -2763,6 +3025,10 @@ static ssize_t srp_create_target(struct device *dev,
sizeof(struct srp_indirect_buf) +
target->cmd_sg_cnt * sizeof(struct srp_direct_buf);
+ ret = srp_alloc_req_data(target);
+ if (ret)
+ goto err_free_mem;
+
ret = srp_new_cm_id(target);
if (ret)
goto err_free_ib;
@@ -2876,6 +3142,7 @@ static void srp_add_one(struct ib_device *device)
struct ib_device_attr *dev_attr;
struct srp_host *host;
int mr_page_shift, s, e, p;
+ bool have_fmr = false, have_fr = false;
dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL);
if (!dev_attr)
@@ -2890,6 +3157,19 @@ static void srp_add_one(struct ib_device *device)
if (!srp_dev)
goto free_attr;
+ if (device->alloc_fmr && device->dealloc_fmr && device->map_phys_fmr &&
+ device->unmap_fmr) {
+ have_fmr = true;
+ }
+ if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)
+ have_fr = true;
+ if (!have_fmr && !have_fr) {
+ dev_err(&device->dev, "neither FMR nor FR is supported\n");
+ goto free_dev;
+ }
+
+ srp_dev->use_fast_reg = have_fr && (!have_fmr || prefer_fr);
+
/*
* Use the smallest page size supported by the HCA, down to a
* minimum of 4096 bytes. We're unlikely to build large sglists
@@ -2900,6 +3180,11 @@ static void srp_add_one(struct ib_device *device)
srp_dev->mr_page_mask = ~((u64) srp_dev->mr_page_size - 1);
srp_dev->max_pages_per_mr = min_t(u64, SRP_MAX_PAGES_PER_MR,
dev_attr->max_mr_size / srp_dev->mr_page_size);
+ if (srp_dev->use_fast_reg) {
+ srp_dev->max_pages_per_mr =
+ min_t(u32, srp_dev->max_pages_per_mr,
+ dev_attr->max_fast_reg_page_list_len);
+ }
srp_dev->mr_max_size = srp_dev->mr_page_size *
srp_dev->max_pages_per_mr;
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index fccf5df..fb465fd 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -68,8 +68,8 @@ enum {
SRP_MAX_PAGES_PER_MR = 512,
- SRP_MAP_ALLOW_FMR = 0,
- SRP_MAP_NO_FMR = 1,
+ LOCAL_INV_WR_ID_MASK = 1,
+ FAST_REG_WR_ID_MASK = 2,
};
enum srp_target_state {
@@ -83,6 +83,12 @@ enum srp_iu_type {
SRP_IU_RSP,
};
+/*
+ * @mr_page_mask: HCA memory registration page mask.
+ * @mr_page_size: HCA memory registration page size.
+ * @mr_max_size: Maximum size in bytes of a single FMR / FR registration
+ * request.
+ */
struct srp_device {
struct list_head dev_list;
struct ib_device *dev;
@@ -92,6 +98,7 @@ struct srp_device {
int mr_page_size;
int mr_max_size;
int max_pages_per_mr;
+ bool use_fast_reg;
};
struct srp_host {
@@ -109,12 +116,15 @@ struct srp_request {
struct list_head list;
struct scsi_cmnd *scmnd;
struct srp_iu *cmd;
- struct ib_pool_fmr **fmr_list;
u64 *map_page;
struct srp_direct_buf *indirect_desc;
dma_addr_t indirect_dma_addr;
short nmdesc;
short index;
+ union {
+ struct ib_pool_fmr **fmr_list;
+ struct srp_fr_desc **fr_list;
+ };
};
struct srp_target_port {
@@ -128,7 +138,10 @@ struct srp_target_port {
struct ib_cq *send_cq ____cacheline_aligned_in_smp;
struct ib_cq *recv_cq;
struct ib_qp *qp;
- struct ib_fmr_pool *fmr_pool;
+ union {
+ struct ib_fmr_pool *fmr_pool;
+ struct srp_fr_pool *fr_pool;
+ };
u32 lkey;
u32 rkey;
enum srp_target_state state;
@@ -195,8 +208,59 @@ struct srp_iu {
enum dma_data_direction direction;
};
+/**
+ * struct srp_fr_desc - fast registration work request arguments
+ * @entry: Entry in srp_fr_pool.free_list.
+ * @mr: Memory region.
+ * @frpl: Fast registration page list.
+ */
+struct srp_fr_desc {
+ struct list_head entry;
+ struct ib_mr *mr;
+ struct ib_fast_reg_page_list *frpl;
+};
+
+/**
+ * struct srp_fr_pool - pool of fast registration descriptors
+ *
+ * An entry is available for allocation if and only if it occurs in @free_list.
+ *
+ * @size: Number of descriptors in this pool.
+ * @max_page_list_len: Maximum fast registration work request page list length.
+ * @lock: Protects free_list.
+ * @free_list: List of free descriptors.
+ * @desc: Fast registration descriptor pool.
+ */
+struct srp_fr_pool {
+ int size;
+ int max_page_list_len;
+ spinlock_t lock;
+ struct list_head free_list;
+ struct srp_fr_desc desc[0];
+};
+
+/**
+ * struct srp_map_state - per-request DMA memory mapping state
+ * @desc: Pointer to the element of the SRP buffer descriptor array
+ * that is being filled in.
+ * @pages: Array with DMA addresses of pages being considered for
+ * memory registration.
+ * @base_dma_addr: DMA address of the first page that has not yet been mapped.
+ * @dma_len: Number of bytes that will be registered with the next
+ * FMR or FR memory registration call.
+ * @total_len: Total number of bytes in the sg-list being mapped.
+ * @npages: Number of page addresses in the pages[] array.
+ * @nmdesc: Number of FMR or FR memory descriptors used for mapping.
+ * @ndesc: Number of SRP buffer descriptors that have been filled in.
+ * @unmapped_sg: First element of the sg-list that is mapped via FMR or FR.
+ * @unmapped_index: Index of the first element mapped via FMR or FR.
+ * @unmapped_addr: DMA address of the first element mapped via FMR or FR.
+ */
struct srp_map_state {
- struct ib_pool_fmr **next_fmr;
+ union {
+ struct ib_pool_fmr **next_fmr;
+ struct srp_fr_desc **next_fr;
+ };
struct srp_direct_buf *desc;
u64 *pages;
dma_addr_t base_dma_addr;
--
1.8.4.5
* Re: [PATCH v2 9/9] IB/srp: Add fast registration support
[not found] ` <53722FE2.4010808-HInyCGIudOg@public.gmane.org>
@ 2014-05-13 16:48 ` Sagi Grimberg
[not found] ` <53724CC9.6080509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2014-05-13 16:48 UTC (permalink / raw)
To: Bart Van Assche, Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
On 5/13/2014 5:44 PM, Bart Van Assche wrote:
> Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
> ConnectX VF) support fast registration but not FMR. Hence add fast
> registration support.
>
> In function srp_rport_reconnect(), move the srp_finish_req()
> loop from after to before the srp_create_target_ib() call. This is
> needed to prevent srp_finish_req() from queueing any invalidation
> requests for rkeys associated with the old queue pair on the newly
> allocated queue pair. Invoking srp_finish_req() before the queue
> pair has been reallocated is safe since srp_claim_req() correctly
> handles completions that arrive after srp_finish_req() has been
> invoked.
>
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> Cc: David Dillow <dave-i1Mk8JYDVaaSihdK6806/g@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> ---
> drivers/infiniband/ulp/srp/ib_srp.c | 457 +++++++++++++++++++++++++++++-------
> drivers/infiniband/ulp/srp/ib_srp.h | 74 +++++-
> 2 files changed, 440 insertions(+), 91 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 4fda7eb..2e0bd4d 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -66,6 +66,7 @@ static unsigned int srp_sg_tablesize;
> static unsigned int cmd_sg_entries;
> static unsigned int indirect_sg_entries;
> static bool allow_ext_sg;
> +static bool prefer_fr;
> static bool register_always;
> static int topspin_workarounds = 1;
>
> @@ -88,6 +89,10 @@ module_param(topspin_workarounds, int, 0444);
> MODULE_PARM_DESC(topspin_workarounds,
> "Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
>
> +module_param(prefer_fr, bool, 0444);
> +MODULE_PARM_DESC(prefer_fr,
> +"Whether to use fast registration if both FMR and fast registration are supported");
> +
> module_param(register_always, bool, 0444);
> MODULE_PARM_DESC(register_always,
> "Use memory registration even for contiguous memory regions");
> @@ -311,6 +316,132 @@ static struct ib_fmr_pool *srp_alloc_fmr_pool(struct srp_target_port *target)
> return ib_create_fmr_pool(dev->pd, &fmr_param);
> }
>
> +/**
> + * srp_destroy_fr_pool() - free the resources owned by a pool
> + * @pool: Fast registration pool to be destroyed.
> + */
> +static void srp_destroy_fr_pool(struct srp_fr_pool *pool)
> +{
> + int i;
> + struct srp_fr_desc *d;
> +
> + if (!pool)
> + return;
> +
> + for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> + if (d->frpl)
> + ib_free_fast_reg_page_list(d->frpl);
> + if (d->mr)
> + ib_dereg_mr(d->mr);
> + }
> + kfree(pool);
> +}
> +
> +/**
> + * srp_create_fr_pool() - allocate and initialize a pool for fast registration
> + * @device: IB device to allocate fast registration descriptors for.
> + * @pd: Protection domain associated with the FR descriptors.
> + * @pool_size: Number of descriptors to allocate.
> + * @max_page_list_len: Maximum fast registration work request page list length.
> + */
> +static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
> + struct ib_pd *pd, int pool_size,
> + int max_page_list_len)
> +{
> + struct srp_fr_pool *pool;
> + struct srp_fr_desc *d;
> + struct ib_mr *mr;
> + struct ib_fast_reg_page_list *frpl;
> + int i, ret = -EINVAL;
> +
> + if (pool_size <= 0)
> + goto err;
> + ret = -ENOMEM;
> + pool = kzalloc(sizeof(struct srp_fr_pool) +
> + pool_size * sizeof(struct srp_fr_desc), GFP_KERNEL);
> + if (!pool)
> + goto err;
> + pool->size = pool_size;
> + pool->max_page_list_len = max_page_list_len;
> + spin_lock_init(&pool->lock);
> + INIT_LIST_HEAD(&pool->free_list);
> +
> + for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
> + mr = ib_alloc_fast_reg_mr(pd, max_page_list_len);
> + if (IS_ERR(mr)) {
> + ret = PTR_ERR(mr);
> + goto destroy_pool;
> + }
> + d->mr = mr;
> + frpl = ib_alloc_fast_reg_page_list(device, max_page_list_len);
> + if (IS_ERR(frpl)) {
> + ret = PTR_ERR(frpl);
> + goto destroy_pool;
> + }
> + d->frpl = frpl;
> + list_add_tail(&d->entry, &pool->free_list);
> + }
> +
> +out:
> + return pool;
> +
> +destroy_pool:
> + srp_destroy_fr_pool(pool);
> +
> +err:
> + pool = ERR_PTR(ret);
> + goto out;
> +}
> +
> +/**
> + * srp_fr_pool_get() - obtain a descriptor suitable for fast registration
> + * @pool: Pool to obtain descriptor from.
> + */
> +static struct srp_fr_desc *srp_fr_pool_get(struct srp_fr_pool *pool)
> +{
> + struct srp_fr_desc *d = NULL;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pool->lock, flags);
> + if (!list_empty(&pool->free_list)) {
> + d = list_first_entry(&pool->free_list, typeof(*d), entry);
> + list_del(&d->entry);
> + }
> + spin_unlock_irqrestore(&pool->lock, flags);
> +
> + return d;
> +}
> +
> +/**
> + * srp_fr_pool_put() - put an FR descriptor back in the free list
> + * @pool: Pool the descriptor was allocated from.
> + * @desc: Pointer to an array of fast registration descriptor pointers.
> + * @n: Number of descriptors to put back.
> + *
> + * Note: The caller must already have queued an invalidation request for
> + * desc->mr->rkey before calling this function.
> + */
> +static void srp_fr_pool_put(struct srp_fr_pool *pool, struct srp_fr_desc **desc,
> + int n)
> +{
> + unsigned long flags;
> + int i;
> +
> + spin_lock_irqsave(&pool->lock, flags);
> + for (i = 0; i < n; i++)
> + list_add(&desc[i]->entry, &pool->free_list);
> + spin_unlock_irqrestore(&pool->lock, flags);
> +}
> +
> +static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
> +{
> + struct srp_device *dev = target->srp_host->srp_dev;
> +
> + return srp_create_fr_pool(dev->dev, dev->pd,
> + target->scsi_host->can_queue,
> + dev->max_pages_per_mr);
> +}
> +
> static int srp_create_target_ib(struct srp_target_port *target)
> {
> struct srp_device *dev = target->srp_host->srp_dev;
> @@ -318,6 +449,8 @@ static int srp_create_target_ib(struct srp_target_port *target)
> struct ib_cq *recv_cq, *send_cq;
> struct ib_qp *qp;
> struct ib_fmr_pool *fmr_pool = NULL;
> + struct srp_fr_pool *fr_pool = NULL;
> + const int m = 1 + dev->use_fast_reg;
> int ret;
>
> init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
> @@ -332,7 +465,7 @@ static int srp_create_target_ib(struct srp_target_port *target)
> }
>
> send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
> - target->queue_size, target->comp_vector);
> + m * target->queue_size, target->comp_vector);
So it is correct to expand the send CQ and QP send queue, but why not x3?
For fast registration we do:
- FASTREG
- RDMA
- LOCAL_INV
I'm aware we are suppressing the completions, but I think we need to
reserve room for FLUSH errors in case the QP goes to the error state
while the send queue is packed.
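To make the question concrete, the per-command send-queue traffic in
fast-registration mode looks roughly like this (a sketch of the flow,
not code from the patch; signaling follows the IB_SIGNAL_REQ_WR setup
above):

        /*
         * Per SCSI command with fast registration:
         *
         *   ib_post_send(qp, <FAST_REG_MR wr>);   (unsignaled)
         *   ib_post_send(qp, <SRP_CMD send wr>);  (signaled)
         *   ... target performs the RDMA and sends the SRP response ...
         *   ib_post_send(qp, <LOCAL_INV wr>);     (unsignaled, at unmap)
         *
         * Three WR types per command, hence the x2-vs-x3 question for
         * both max_send_wr and the send CQ size.
         */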
> if (IS_ERR(send_cq)) {
> ret = PTR_ERR(send_cq);
> goto err_recv_cq;
> @@ -341,11 +474,11 @@ static int srp_create_target_ib(struct srp_target_port *target)
> ib_req_notify_cq(recv_cq, IB_CQ_NEXT_COMP);
>
> init_attr->event_handler = srp_qp_event;
> - init_attr->cap.max_send_wr = target->queue_size;
> + init_attr->cap.max_send_wr = m * target->queue_size;
> init_attr->cap.max_recv_wr = target->queue_size;
> init_attr->cap.max_recv_sge = 1;
> init_attr->cap.max_send_sge = 1;
> - init_attr->sq_sig_type = IB_SIGNAL_ALL_WR;
> + init_attr->sq_sig_type = IB_SIGNAL_REQ_WR;
> init_attr->qp_type = IB_QPT_RC;
> init_attr->send_cq = send_cq;
> init_attr->recv_cq = recv_cq;
> @@ -360,19 +493,38 @@ static int srp_create_target_ib(struct srp_target_port *target)
> if (ret)
> goto err_qp;
>
> - if (!target->qp || target->fmr_pool) {
> - fmr_pool = srp_alloc_fmr_pool(target);
> - if (IS_ERR(fmr_pool)) {
> - ret = PTR_ERR(fmr_pool);
> - shost_printk(KERN_WARNING, target->scsi_host, PFX
> - "FMR pool allocation failed (%d)\n", ret);
> - if (target->qp)
> - goto err_qp;
> - fmr_pool = NULL;
> + if (dev->use_fast_reg) {
> + if (!target->qp || target->fr_pool) {
> + fr_pool = srp_alloc_fr_pool(target);
> + if (IS_ERR(fr_pool)) {
> + ret = PTR_ERR(fr_pool);
> + shost_printk(KERN_WARNING, target->scsi_host,
> + PFX "FR pool allocation failed (%d)\n",
> + ret);
> + if (target->qp)
> + goto err_qp;
> + fr_pool = NULL;
> + }
> + if (target->fr_pool)
> + srp_destroy_fr_pool(target->fr_pool);
> + target->fr_pool = fr_pool;
> + }
> + } else {
> + if (!target->qp || target->fmr_pool) {
> + fmr_pool = srp_alloc_fmr_pool(target);
> + if (IS_ERR(fmr_pool)) {
> + ret = PTR_ERR(fmr_pool);
> + shost_printk(KERN_WARNING, target->scsi_host,
> + PFX "FMR pool allocation failed (%d)\n",
> + ret);
> + if (target->qp)
> + goto err_qp;
> + fmr_pool = NULL;
> + }
> + if (target->fmr_pool)
> + ib_destroy_fmr_pool(target->fmr_pool);
> + target->fmr_pool = fmr_pool;
> }
> - if (target->fmr_pool)
> - ib_destroy_fmr_pool(target->fmr_pool);
> - target->fmr_pool = fmr_pool;
> }
>
> if (target->qp)
> @@ -409,10 +561,16 @@ err:
> */
> static void srp_free_target_ib(struct srp_target_port *target)
> {
> + struct srp_device *dev = target->srp_host->srp_dev;
> int i;
>
> - if (target->fmr_pool)
> - ib_destroy_fmr_pool(target->fmr_pool);
> + if (dev->use_fast_reg) {
> + if (target->fr_pool)
> + srp_destroy_fr_pool(target->fr_pool);
> + } else {
> + if (target->fmr_pool)
> + ib_destroy_fmr_pool(target->fmr_pool);
> + }
> ib_destroy_qp(target->qp);
> ib_destroy_cq(target->send_cq);
> ib_destroy_cq(target->recv_cq);
> @@ -617,7 +775,8 @@ static void srp_disconnect_target(struct srp_target_port *target)
>
> static void srp_free_req_data(struct srp_target_port *target)
> {
> - struct ib_device *ibdev = target->srp_host->srp_dev->dev;
> + struct srp_device *dev = target->srp_host->srp_dev;
> + struct ib_device *ibdev = dev->dev;
> struct srp_request *req;
> int i;
>
> @@ -626,7 +785,10 @@ static void srp_free_req_data(struct srp_target_port *target)
>
> for (i = 0; i < target->req_ring_size; ++i) {
> req = &target->req_ring[i];
> - kfree(req->fmr_list);
> + if (dev->use_fast_reg)
> + kfree(req->fr_list);
> + else
> + kfree(req->fmr_list);
> kfree(req->map_page);
> if (req->indirect_dma_addr) {
> ib_dma_unmap_single(ibdev, req->indirect_dma_addr,
> @@ -645,6 +807,7 @@ static int srp_alloc_req_data(struct srp_target_port *target)
> struct srp_device *srp_dev = target->srp_host->srp_dev;
> struct ib_device *ibdev = srp_dev->dev;
> struct srp_request *req;
> + void *mr_list;
> dma_addr_t dma_addr;
> int i, ret = -ENOMEM;
>
> @@ -657,12 +820,20 @@ static int srp_alloc_req_data(struct srp_target_port *target)
>
> for (i = 0; i < target->req_ring_size; ++i) {
> req = &target->req_ring[i];
> - req->fmr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
> - GFP_KERNEL);
> + mr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
> + GFP_KERNEL);
> + if (!mr_list)
> + goto out;
> + if (srp_dev->use_fast_reg)
> + req->fr_list = mr_list;
> + else
> + req->fmr_list = mr_list;
> req->map_page = kmalloc(SRP_MAX_PAGES_PER_MR * sizeof(void *),
> GFP_KERNEL);
> + if (!req->map_page)
> + goto out;
> req->indirect_desc = kmalloc(target->indirect_size, GFP_KERNEL);
> - if (!req->fmr_list || !req->map_page || !req->indirect_desc)
> + if (!req->indirect_desc)
> goto out;
>
> dma_addr = ib_dma_map_single(ibdev, req->indirect_desc,
> @@ -799,21 +970,48 @@ static int srp_connect_target(struct srp_target_port *target)
> }
> }
>
> +static int srp_inv_rkey(struct srp_target_port *target, u32 rkey)
> +{
> + struct ib_send_wr *bad_wr;
> + struct ib_send_wr wr = {
> + .opcode = IB_WR_LOCAL_INV,
> + .wr_id = LOCAL_INV_WR_ID_MASK,
> + .next = NULL,
> + .num_sge = 0,
> + .send_flags = 0,
> + .ex.invalidate_rkey = rkey,
> + };
> +
> + return ib_post_send(target->qp, &wr, &bad_wr);
> +}
> +
> static void srp_unmap_data(struct scsi_cmnd *scmnd,
> struct srp_target_port *target,
> struct srp_request *req)
> {
> - struct ib_device *ibdev = target->srp_host->srp_dev->dev;
> - struct ib_pool_fmr **pfmr;
> + struct srp_device *dev = target->srp_host->srp_dev;
> + struct ib_device *ibdev = dev->dev;
> + int i;
>
> if (!scsi_sglist(scmnd) ||
> (scmnd->sc_data_direction != DMA_TO_DEVICE &&
> scmnd->sc_data_direction != DMA_FROM_DEVICE))
> return;
>
> - pfmr = req->fmr_list;
> - while (req->nmdesc--)
> - ib_fmr_pool_unmap(*pfmr++);
> + if (dev->use_fast_reg) {
> + struct srp_fr_desc **pfr;
> +
> + for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++)
> + srp_inv_rkey(target, (*pfr)->mr->rkey);
No check on the return code here? I assume we should react if we fail
to post a work request, right?
> + if (req->nmdesc)
> + srp_fr_pool_put(target->fr_pool, req->fr_list,
> + req->nmdesc);
> + } else {
> + struct ib_pool_fmr **pfmr;
> +
> + for (i = req->nmdesc, pfmr = req->fmr_list; i > 0; i--, pfmr++)
> + ib_fmr_pool_unmap(*pfmr);
> + }
>
> ib_dma_unmap_sg(ibdev, scsi_sglist(scmnd), scsi_sg_count(scmnd),
> scmnd->sc_data_direction);
> @@ -926,21 +1124,19 @@ static int srp_rport_reconnect(struct srp_rport *rport)
> * callbacks will have finished before a new QP is allocated.
> */
> ret = srp_new_cm_id(target);
> - /*
> - * Whether or not creating a new CM ID succeeded, create a new
> - * QP. This guarantees that all completion callback function
> - * invocations have finished before request resetting starts.
> - */
> - if (ret == 0)
> - ret = srp_create_target_ib(target);
> - else
> - srp_create_target_ib(target);
>
> for (i = 0; i < target->req_ring_size; ++i) {
> struct srp_request *req = &target->req_ring[i];
> srp_finish_req(target, req, NULL, DID_RESET << 16);
> }
>
> + /*
> + * Whether or not creating a new CM ID succeeded, create a new
> + * QP. This guarantees that all callback functions for the old QP have
> + * finished before any send requests are posted on the new QP.
> + */
> + ret += srp_create_target_ib(target);
> +
> INIT_LIST_HEAD(&target->free_tx);
> for (i = 0; i < target->queue_size; ++i)
> list_add(&target->tx_ring[i]->list, &target->free_tx);
> @@ -988,6 +1184,47 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
> return 0;
> }
>
> +static int srp_map_finish_fr(struct srp_map_state *state,
> + struct srp_target_port *target)
> +{
> + struct srp_device *dev = target->srp_host->srp_dev;
> + struct ib_send_wr *bad_wr;
> + struct ib_send_wr wr;
> + struct srp_fr_desc *desc;
> + u32 rkey;
> +
> + desc = srp_fr_pool_get(target->fr_pool);
> + if (!desc)
> + return -ENOMEM;
> +
> + rkey = ib_inc_rkey(desc->mr->rkey);
> + ib_update_fast_reg_key(desc->mr, rkey);
> +
> + memcpy(desc->frpl->page_list, state->pages,
> + sizeof(state->pages[0]) * state->npages);
I would really like to avoid this memcpy. Any chance we can map directly
to frpl->page_list instead?
> +
> + memset(&wr, 0, sizeof(wr));
> + wr.opcode = IB_WR_FAST_REG_MR;
> + wr.wr_id = FAST_REG_WR_ID_MASK;
> + wr.wr.fast_reg.iova_start = state->base_dma_addr;
> + wr.wr.fast_reg.page_list = desc->frpl;
> + wr.wr.fast_reg.page_list_len = state->npages;
> + wr.wr.fast_reg.page_shift = ilog2(dev->mr_page_size);
> + wr.wr.fast_reg.length = state->dma_len;
> + wr.wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE |
> + IB_ACCESS_REMOTE_READ |
> + IB_ACCESS_REMOTE_WRITE);
> + wr.wr.fast_reg.rkey = desc->mr->lkey;
> +
> + *state->next_fr++ = desc;
> + state->nmdesc++;
> +
> + srp_map_desc(state, state->base_dma_addr, state->dma_len,
> + desc->mr->rkey);
> +
> + return ib_post_send(target->qp, &wr, &bad_wr);
> +}
> +
> static int srp_finish_mapping(struct srp_map_state *state,
> struct srp_target_port *target)
> {
> @@ -1000,7 +1237,9 @@ static int srp_finish_mapping(struct srp_map_state *state,
> srp_map_desc(state, state->base_dma_addr, state->dma_len,
> target->rkey);
> else
> - ret = srp_map_finish_fmr(state, target);
> + ret = target->srp_host->srp_dev->use_fast_reg ?
> + srp_map_finish_fr(state, target) :
> + srp_map_finish_fmr(state, target);
>
> if (ret == 0) {
> state->npages = 0;
> @@ -1022,7 +1261,7 @@ static void srp_map_update_start(struct srp_map_state *state,
> static int srp_map_sg_entry(struct srp_map_state *state,
> struct srp_target_port *target,
> struct scatterlist *sg, int sg_index,
> - int use_fmr)
> + bool use_memory_registration)
> {
> struct srp_device *dev = target->srp_host->srp_dev;
> struct ib_device *ibdev = dev->dev;
> @@ -1034,22 +1273,24 @@ static int srp_map_sg_entry(struct srp_map_state *state,
> if (!dma_len)
> return 0;
>
> - if (use_fmr == SRP_MAP_NO_FMR) {
> - /* Once we're in direct map mode for a request, we don't
> - * go back to FMR mode, so no need to update anything
> + if (!use_memory_registration) {
> + /*
> + * Once we're in direct map mode for a request, we don't
> + * go back to FMR or FR mode, so no need to update anything
> * other than the descriptor.
> */
> srp_map_desc(state, dma_addr, dma_len, target->rkey);
> return 0;
> }
>
> - /* If we start at an offset into the FMR page, don't merge into
> - * the current FMR. Finish it out, and use the kernel's MR for this
> - * sg entry. This is to avoid potential bugs on some SRP targets
> - * that were never quite defined, but went away when the initiator
> - * avoided using FMR on such page fragments.
> + /*
> + * Since not all RDMA HW drivers support non-zero page offsets for
> + * FMR, if we start at an offset into a page, don't merge into the
> + * current FMR mapping. Finish it out, and use the kernel's MR for
> + * this sg entry.
> */
> - if (dma_addr & ~dev->mr_page_mask || dma_len > dev->mr_max_size) {
> + if ((!dev->use_fast_reg && dma_addr & ~dev->mr_page_mask) ||
> + dma_len > dev->mr_max_size) {
> ret = srp_finish_mapping(state, target);
> if (ret)
> return ret;
> @@ -1059,16 +1300,18 @@ static int srp_map_sg_entry(struct srp_map_state *state,
> return 0;
> }
>
> - /* If this is the first sg to go into the FMR, save our position.
> - * We need to know the first unmapped entry, its index, and the
> - * first unmapped address within that entry to be able to restart
> - * mapping after an error.
> + /*
> + * If this is the first sg that will be mapped via FMR or via FR, save
> + * our position. We need to know the first unmapped entry, its index,
> + * and the first unmapped address within that entry to be able to
> + * restart mapping after an error.
> */
> if (!state->unmapped_sg)
> srp_map_update_start(state, sg, sg_index, dma_addr);
>
> while (dma_len) {
> - if (state->npages == SRP_MAX_PAGES_PER_MR) {
> + unsigned offset = dma_addr & ~dev->mr_page_mask;
> + if (state->npages == SRP_MAX_PAGES_PER_MR || offset != 0) {
> ret = srp_finish_mapping(state, target);
> if (ret)
> return ret;
> @@ -1076,17 +1319,18 @@ static int srp_map_sg_entry(struct srp_map_state *state,
> srp_map_update_start(state, sg, sg_index, dma_addr);
> }
>
> - len = min_t(unsigned int, dma_len, dev->mr_page_size);
> + len = min_t(unsigned int, dma_len, dev->mr_page_size - offset);
>
> if (!state->npages)
> state->base_dma_addr = dma_addr;
> - state->pages[state->npages++] = dma_addr;
> + state->pages[state->npages++] = dma_addr & dev->mr_page_mask;
> state->dma_len += len;
> dma_addr += len;
> dma_len -= len;
> }
>
> - /* If the last entry of the FMR wasn't a full page, then we need to
> + /*
> + * If the last entry of the MR wasn't a full page, then we need to
> * close it out and start a new one -- we can only merge at page
> * boundaries.
> */
> @@ -1099,25 +1343,33 @@ static int srp_map_sg_entry(struct srp_map_state *state,
> return ret;
> }
>
> -static void srp_map_fmr(struct srp_map_state *state,
> - struct srp_target_port *target, struct srp_request *req,
> - struct scatterlist *scat, int count)
> +static int srp_map_sg(struct srp_map_state *state,
> + struct srp_target_port *target, struct srp_request *req,
> + struct scatterlist *scat, int count)
> {
> struct srp_device *dev = target->srp_host->srp_dev;
> struct ib_device *ibdev = dev->dev;
> struct scatterlist *sg;
> - int i, use_fmr;
> + int i;
> + bool use_memory_registration;
>
> state->desc = req->indirect_desc;
> state->pages = req->map_page;
> - state->next_fmr = req->fmr_list;
> -
> - use_fmr = target->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
> + if (dev->use_fast_reg) {
> + state->next_fmr = req->fmr_list;
Is this correct? Shouldn't this be state->next_fr = req->fr_list
(dev->use_fast_reg == true)? Or did I misunderstand?
> + use_memory_registration = !!target->fr_pool;
> + } else {
> + state->next_fr = req->fr_list;
Same here?
> + use_memory_registration = !!target->fmr_pool;
> + }
>
> for_each_sg(scat, sg, count, i) {
> - if (srp_map_sg_entry(state, target, sg, i, use_fmr)) {
> - /* FMR mapping failed, so backtrack to the first
> - * unmapped entry and continue on without using FMR.
> + if (srp_map_sg_entry(state, target, sg, i,
> + use_memory_registration)) {
> + /*
> + * Memory registration failed, so backtrack to the
> + * first unmapped entry and continue on without using
> + * memory registration.
> */
> dma_addr_t dma_addr;
> unsigned int dma_len;
> @@ -1130,15 +1382,17 @@ backtrack:
> dma_len = ib_sg_dma_len(ibdev, sg);
> dma_len -= (state->unmapped_addr - dma_addr);
> dma_addr = state->unmapped_addr;
> - use_fmr = SRP_MAP_NO_FMR;
> + use_memory_registration = false;
> srp_map_desc(state, dma_addr, dma_len, target->rkey);
> }
> }
>
> - if (use_fmr == SRP_MAP_ALLOW_FMR && srp_finish_mapping(state, target))
> + if (use_memory_registration && srp_finish_mapping(state, target))
> goto backtrack;
>
> req->nmdesc = state->nmdesc;
> +
> + return 0;
> }
>
> static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
> @@ -1195,9 +1449,9 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
> goto map_complete;
> }
>
> - /* We have more than one scatter/gather entry, so build our indirect
> - * descriptor table, trying to merge as many entries with FMR as we
> - * can.
> + /*
> + * We have more than one scatter/gather entry, so build our indirect
> + * descriptor table, trying to merge as many entries as we can.
> */
> indirect_hdr = (void *) cmd->add_data;
>
> @@ -1205,7 +1459,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
> target->indirect_size, DMA_TO_DEVICE);
>
> memset(&state, 0, sizeof(state));
> - srp_map_fmr(&state, target, req, scat, count);
> + srp_map_sg(&state, target, req, scat, count);
>
> /* We've mapped the request, now pull as much of the indirect
> * descriptor table as we can into the command buffer. If this
> @@ -1214,7 +1468,8 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
> * give us more S/G entries than we allow.
> */
> if (state.ndesc == 1) {
> - /* FMR mapping was able to collapse this to one entry,
> + /*
> + * Memory registration collapsed the sg-list into one entry,
> * so use a direct descriptor.
> */
> struct srp_direct_buf *buf = (void *) cmd->add_data;
> @@ -1537,14 +1792,24 @@ static void srp_tl_err_work(struct work_struct *work)
> srp_start_tl_fail_timers(target->rport);
> }
>
> -static void srp_handle_qp_err(enum ib_wc_status wc_status, bool send_err,
> - struct srp_target_port *target)
> +static void srp_handle_qp_err(u64 wr_id, enum ib_wc_status wc_status,
> + bool send_err, struct srp_target_port *target)
> {
> if (target->connected && !target->qp_in_error) {
> - shost_printk(KERN_ERR, target->scsi_host,
> - PFX "failed %s status %d\n",
> - send_err ? "send" : "receive",
> - wc_status);
> + if (wr_id & LOCAL_INV_WR_ID_MASK) {
> + shost_printk(KERN_ERR, target->scsi_host,
> + "LOCAL_INV failed with status %d\n",
> + wc_status);
> + } else if (wr_id & FAST_REG_WR_ID_MASK) {
> + shost_printk(KERN_ERR, target->scsi_host,
> + "FAST_REG_MR failed status %d\n",
> + wc_status);
> + } else {
> + shost_printk(KERN_ERR, target->scsi_host,
> + PFX "failed %s status %d for iu %p\n",
> + send_err ? "send" : "receive",
> + wc_status, (void *)(uintptr_t)wr_id);
> + }
> queue_work(system_long_wq, &target->tl_err_work);
> }
> target->qp_in_error = true;
> @@ -1560,7 +1825,7 @@ static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
> if (likely(wc.status == IB_WC_SUCCESS)) {
> srp_handle_recv(target, &wc);
> } else {
> - srp_handle_qp_err(wc.status, false, target);
> + srp_handle_qp_err(wc.wr_id, wc.status, false, target);
> }
> }
> }
> @@ -1576,7 +1841,7 @@ static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
> iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
> list_add(&iu->list, &target->free_tx);
> } else {
> - srp_handle_qp_err(wc.status, true, target);
> + srp_handle_qp_err(wc.wr_id, wc.status, true, target);
> }
> }
> }
> @@ -2689,7 +2954,8 @@ static ssize_t srp_create_target(struct device *dev,
> container_of(dev, struct srp_host, dev);
> struct Scsi_Host *target_host;
> struct srp_target_port *target;
> - struct ib_device *ibdev = host->srp_dev->dev;
> + struct srp_device *srp_dev = host->srp_dev;
> + struct ib_device *ibdev = srp_dev->dev;
> int ret;
>
> target_host = scsi_host_alloc(&srp_template,
> @@ -2738,10 +3004,6 @@ static ssize_t srp_create_target(struct device *dev,
> INIT_WORK(&target->remove_work, srp_remove_work);
> spin_lock_init(&target->lock);
> INIT_LIST_HEAD(&target->free_tx);
> - ret = srp_alloc_req_data(target);
> - if (ret)
> - goto err_free_mem;
> -
> ret = ib_query_gid(ibdev, host->port, 0, &target->path.sgid);
> if (ret)
> goto err_free_mem;
> @@ -2752,7 +3014,7 @@ static ssize_t srp_create_target(struct device *dev,
>
> if (!target->fmr_pool && !target->allow_ext_sg &&
> target->cmd_sg_cnt < target->sg_tablesize) {
> - pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
> + pr_warn("No MR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
> target->sg_tablesize = target->cmd_sg_cnt;
> }
>
> @@ -2763,6 +3025,10 @@ static ssize_t srp_create_target(struct device *dev,
> sizeof(struct srp_indirect_buf) +
> target->cmd_sg_cnt * sizeof(struct srp_direct_buf);
>
> + ret = srp_alloc_req_data(target);
> + if (ret)
> + goto err_free_mem;
> +
> ret = srp_new_cm_id(target);
> if (ret)
> goto err_free_ib;
> @@ -2876,6 +3142,7 @@ static void srp_add_one(struct ib_device *device)
> struct ib_device_attr *dev_attr;
> struct srp_host *host;
> int mr_page_shift, s, e, p;
> + bool have_fmr = false, have_fr = false;
>
> dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL);
> if (!dev_attr)
> @@ -2890,6 +3157,19 @@ static void srp_add_one(struct ib_device *device)
> if (!srp_dev)
> goto free_attr;
>
> + if (device->alloc_fmr && device->dealloc_fmr && device->map_phys_fmr &&
> + device->unmap_fmr) {
> + have_fmr = true;
> + }
> + if (dev_attr->device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS)
> + have_fr = true;
> + if (!have_fmr && !have_fr) {
> + dev_err(&device->dev, "neither FMR nor FR is supported\n");
> + goto free_dev;
> + }
> +
> + srp_dev->use_fast_reg = have_fr && (!have_fmr || prefer_fr);
> +
> /*
> * Use the smallest page size supported by the HCA, down to a
> * minimum of 4096 bytes. We're unlikely to build large sglists
> @@ -2900,6 +3180,11 @@ static void srp_add_one(struct ib_device *device)
> srp_dev->mr_page_mask = ~((u64) srp_dev->mr_page_size - 1);
> srp_dev->max_pages_per_mr = min_t(u64, SRP_MAX_PAGES_PER_MR,
> dev_attr->max_mr_size / srp_dev->mr_page_size);
> + if (srp_dev->use_fast_reg) {
> + srp_dev->max_pages_per_mr =
> + min_t(u32, srp_dev->max_pages_per_mr,
> + dev_attr->max_fast_reg_page_list_len);
> + }
> srp_dev->mr_max_size = srp_dev->mr_page_size *
> srp_dev->max_pages_per_mr;
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index fccf5df..fb465fd 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -68,8 +68,8 @@ enum {
>
> SRP_MAX_PAGES_PER_MR = 512,
>
> - SRP_MAP_ALLOW_FMR = 0,
> - SRP_MAP_NO_FMR = 1,
> + LOCAL_INV_WR_ID_MASK = 1,
> + FAST_REG_WR_ID_MASK = 2,
> };
>
> enum srp_target_state {
> @@ -83,6 +83,12 @@ enum srp_iu_type {
> SRP_IU_RSP,
> };
>
> +/*
> + * @mr_page_mask: HCA memory registration page mask.
> + * @mr_page_size: HCA memory registration page size.
> + * @mr_max_size: Maximum size in bytes of a single FMR / FR registration
> + * request.
> + */
> struct srp_device {
> struct list_head dev_list;
> struct ib_device *dev;
> @@ -92,6 +98,7 @@ struct srp_device {
> int mr_page_size;
> int mr_max_size;
> int max_pages_per_mr;
> + bool use_fast_reg;
> };
>
> struct srp_host {
> @@ -109,12 +116,15 @@ struct srp_request {
> struct list_head list;
> struct scsi_cmnd *scmnd;
> struct srp_iu *cmd;
> - struct ib_pool_fmr **fmr_list;
> u64 *map_page;
> struct srp_direct_buf *indirect_desc;
> dma_addr_t indirect_dma_addr;
> short nmdesc;
> short index;
> + union {
> + struct ib_pool_fmr **fmr_list;
> + struct srp_fr_desc **fr_list;
> + };
> };
>
> struct srp_target_port {
> @@ -128,7 +138,10 @@ struct srp_target_port {
> struct ib_cq *send_cq ____cacheline_aligned_in_smp;
> struct ib_cq *recv_cq;
> struct ib_qp *qp;
> - struct ib_fmr_pool *fmr_pool;
> + union {
> + struct ib_fmr_pool *fmr_pool;
> + struct srp_fr_pool *fr_pool;
> + };
> u32 lkey;
> u32 rkey;
> enum srp_target_state state;
> @@ -195,8 +208,59 @@ struct srp_iu {
> enum dma_data_direction direction;
> };
>
> +/**
> + * struct srp_fr_desc - fast registration work request arguments
> + * @entry: Entry in srp_fr_pool.free_list.
> + * @mr: Memory region.
> + * @frpl: Fast registration page list.
> + */
> +struct srp_fr_desc {
> + struct list_head entry;
> + struct ib_mr *mr;
> + struct ib_fast_reg_page_list *frpl;
> +};
> +
> +/**
> + * struct srp_fr_pool - pool of fast registration descriptors
> + *
> + * An entry is available for allocation if and only if it occurs in @free_list.
> + *
> + * @size: Number of descriptors in this pool.
> + * @max_page_list_len: Maximum fast registration work request page list length.
> + * @lock: Protects free_list.
> + * @free_list: List of free descriptors.
> + * @desc: Fast registration descriptor pool.
> + */
> +struct srp_fr_pool {
> + int size;
> + int max_page_list_len;
> + spinlock_t lock;
> + struct list_head free_list;
> + struct srp_fr_desc desc[0];
> +};
> +
> +/**
> + * struct srp_map_state - per-request DMA memory mapping state
> + * @desc: Pointer to the element of the SRP buffer descriptor array
> + * that is being filled in.
> + * @pages: Array with DMA addresses of pages being considered for
> + * memory registration.
> + * @base_dma_addr: DMA address of the first page that has not yet been mapped.
> + * @dma_len: Number of bytes that will be registered with the next
> + * FMR or FR memory registration call.
> + * @total_len: Total number of bytes in the sg-list being mapped.
> + * @npages: Number of page addresses in the pages[] array.
> + * @nmdesc: Number of FMR or FR memory descriptors used for mapping.
> + * @ndesc: Number of SRP buffer descriptors that have been filled in.
> + * @unmapped_sg: First element of the sg-list that is mapped via FMR or FR.
> + * @unmapped_index: Index of the first element mapped via FMR or FR.
> + * @unmapped_addr: DMA address of the first element mapped via FMR or FR.
> + */
> struct srp_map_state {
> - struct ib_pool_fmr **next_fmr;
> + union {
> + struct ib_pool_fmr **next_fmr;
> + struct srp_fr_desc **next_fr;
> + };
> struct srp_direct_buf *desc;
> u64 *pages;
> dma_addr_t base_dma_addr;
* Re: [PATCH v2 0/9] SRP initiator patches for kernel 3.16
[not found] ` <53722E4F.7070709-HInyCGIudOg@public.gmane.org>
` (8 preceding siblings ...)
2014-05-13 14:44 ` [PATCH v2 9/9] IB/srp: Add fast registration support Bart Van Assche
@ 2014-05-13 16:50 ` Sagi Grimberg
9 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2014-05-13 16:50 UTC (permalink / raw)
To: Bart Van Assche, Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
On 5/13/2014 5:38 PM, Bart Van Assche wrote:
> Changes compared to v1:
> - Modified the FMR code such that one FMR pool is allocated per
> connection instead of one pool per HCA.
> - Dropped the patch "Make srp_alloc_req_data() reallocate request data".
> - Moved introduction of the register_always kernel module parameter
> into a separate patch.
> - Removed the loop from around ib_create_fmr_pool() and
> srp_create_fr_pool(). max_pages_per_mr is now computed from
> max_mr_size and max_fast_reg_page_list_len.
> - Reduced fast registration pool size from 1024 to scsi_host->can_queue.
> - Added a patch that should fix a crash that had been reported by Sagi
> but that I have not yet been able to reproduce myself.
>
> This patch series consists of the following nine patches:
>
> 0001-IB-srp-Fix-a-sporadic-crash-triggered-by-cable-pulli.patch
> 0002-IB-srp-Fix-kernel-doc-warnings.patch
> 0003-IB-srp-Introduce-an-additional-local-variable.patch
> 0004-IB-srp-Introduce-srp_map_fmr.patch
> 0005-IB-srp-Introduce-srp_finish_mapping.patch
> 0006-IB-srp-Introduce-the-register_always-kernel-module-p.patch
> 0007-IB-srp-One-FMR-pool-per-SRP-connection.patch
> 0008-IB-srp-Rename-FMR-related-variables.patch
> 0009-IB-srp-Add-fast-registration-support.patch
Hey Bart,
Thanks for the quick re-spin.
I had a look; the first 8 patches seem OK to me.
I posted some comments on patch 9/9.
Cheers,
Sagi.
* Re: [PATCH v2 9/9] IB/srp: Add fast registration support
[not found] ` <53724CC9.6080509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-05-14 7:05 ` Bart Van Assche
[not found] ` <537315CC.1090001-HInyCGIudOg@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2014-05-14 7:05 UTC (permalink / raw)
To: Sagi Grimberg, Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
On 05/13/14 18:48, Sagi Grimberg wrote:
> On 5/13/2014 5:44 PM, Bart Van Assche wrote:
>> static int srp_create_target_ib(struct srp_target_port *target)
>> {
>> struct srp_device *dev = target->srp_host->srp_dev;
>> @@ -318,6 +449,8 @@ static int srp_create_target_ib(struct
>> srp_target_port *target)
>> struct ib_cq *recv_cq, *send_cq;
>> struct ib_qp *qp;
>> struct ib_fmr_pool *fmr_pool = NULL;
>> + struct srp_fr_pool *fr_pool = NULL;
>> + const int m = 1 + dev->use_fast_reg;
>> int ret;
>> init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
>> @@ -332,7 +465,7 @@ static int srp_create_target_ib(struct
>> srp_target_port *target)
>> }
>> send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL,
>> target,
>> - target->queue_size, target->comp_vector);
>> + m * target->queue_size, target->comp_vector);
>
> So it is correct to expand the send CQ and QP send queue, but why not x3?
> For fast registration we do:
> - FASTREG
> - RDMA
> - LOCAL_INV
>
> I'm aware we are suppressing the completions, but I think we need to
> reserve room for FLUSH errors in case the QP goes to the error state
> while the send queue is packed.
The first FLUSH error triggers a call to srp_tl_err_work(). The second
and later FLUSH errors are ignored by srp_handle_qp_err(). Do you think
multiplying target->queue_size by a factor of 3 instead of 2 would make
a difference in fast registration mode?
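The relevant logic from the srp_handle_qp_err() hunk in this patch,
condensed:

        if (target->connected && !target->qp_in_error) {
                /* Log the error and schedule the transport-layer
                 * error handler exactly once ... */
                queue_work(system_long_wq, &target->tl_err_work);
        }
        /* ... all later (FLUSH) completions are silently dropped. */
        target->qp_in_error = true;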
>> static void srp_unmap_data(struct scsi_cmnd *scmnd,
>> struct srp_target_port *target,
>> struct srp_request *req)
>> {
>> - struct ib_device *ibdev = target->srp_host->srp_dev->dev;
>> - struct ib_pool_fmr **pfmr;
>> + struct srp_device *dev = target->srp_host->srp_dev;
>> + struct ib_device *ibdev = dev->dev;
>> + int i;
>> if (!scsi_sglist(scmnd) ||
>> (scmnd->sc_data_direction != DMA_TO_DEVICE &&
>> scmnd->sc_data_direction != DMA_FROM_DEVICE))
>> return;
>> - pfmr = req->fmr_list;
>> - while (req->nmdesc--)
>> - ib_fmr_pool_unmap(*pfmr++);
>> + if (dev->use_fast_reg) {
>> + struct srp_fr_desc **pfr;
>> +
>> + for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++)
>> + srp_inv_rkey(target, (*pfr)->mr->rkey);
>
> No check on the return code here? I assume we should react if we fail
> to post a work request, right?
OK, I will add a check for the srp_inv_rkey() return code.
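Probably something along these lines (a sketch; the variable name and
the exact reaction are placeholders):

        res = srp_inv_rkey(target, (*pfr)->mr->rkey);
        if (res < 0)
                shost_printk(KERN_ERR, target->scsi_host, PFX
                             "queueing LOCAL_INV WR for rkey %#x failed (%d)\n",
                             (*pfr)->mr->rkey, res);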
>> +static int srp_map_finish_fr(struct srp_map_state *state,
>> + struct srp_target_port *target)
>> +{
>> + struct srp_device *dev = target->srp_host->srp_dev;
>> + struct ib_send_wr *bad_wr;
>> + struct ib_send_wr wr;
>> + struct srp_fr_desc *desc;
>> + u32 rkey;
>> +
>> + desc = srp_fr_pool_get(target->fr_pool);
>> + if (!desc)
>> + return -ENOMEM;
>> +
>> + rkey = ib_inc_rkey(desc->mr->rkey);
>> + ib_update_fast_reg_key(desc->mr, rkey);
>> +
>> + memcpy(desc->frpl->page_list, state->pages,
>> + sizeof(state->pages[0]) * state->npages);
>
> I would really like to avoid this memcpy. Any chance we can map directly
> to frpl->page_list instead?
Avoiding this memcpy() would be relatively easy if all memory holding
data for a SCSI command were always registered. However, with
register_always == false the fast registration algorithm in this patch
only allocates an rkey when needed (npages > 1), and how many pages a
mapping descriptor contains is only known after state->pages[] has been
filled in, so eliminating that memcpy would be challenging.
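For reference, the restructuring would have to look something like this
(hypothetical, not part of this series):

        /*
         * Hypothetical: take the FR descriptor before mapping, so that
         * srp_map_sg_entry() can fill desc->frpl->page_list directly
         * instead of state->pages[]. The catch: with
         * register_always == false and npages == 1 no rkey is needed,
         * so the descriptor would have been taken from the pool for
         * nothing.
         */
        desc = srp_fr_pool_get(target->fr_pool);
        if (!desc)
                return -ENOMEM;
        state->pages = desc->frpl->page_list;   /* fill in place */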
>> - state->next_fmr = req->fmr_list;
>> -
>> - use_fmr = target->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
>> + if (dev->use_fast_reg) {
>> + state->next_fmr = req->fmr_list;
>
> Is this correct? Shouldn't this be state->next_fr = req->fr_list
> (dev->use_fast_reg == true)? Or did I misunderstand?
>
>> + use_memory_registration = !!target->fr_pool;
>> + } else {
>> + state->next_fr = req->fr_list;
>
> Same here?
req->fmr_list and req->fr_list point at the same memory location since
both pointers are members of the same union. This also holds for
state->next_fmr and state->next_fr. So swapping these two statements as
proposed wouldn't change the generated code. But I agree that this would
make the code easier to read.
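A minimal userspace analogue of that aliasing (only the anonymous union
is taken from ib_srp.h; the rest is scaffolding):

        #include <assert.h>

        struct req {
                union {
                        void **fmr_list;
                        void **fr_list;
                };
        };

        int main(void)
        {
                struct req r;

                /* Both members name the same storage, so assigning
                 * through either one stores to the same location. */
                assert((void *)&r.fmr_list == (void *)&r.fr_list);
                return 0;
        }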
Bart.
* Re: [PATCH v2 9/9] IB/srp: Add fast registration support
[not found] ` <537315CC.1090001-HInyCGIudOg@public.gmane.org>
@ 2014-05-14 8:18 ` Sagi Grimberg
[not found] ` <537326BF.3010706-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2014-05-14 8:18 UTC (permalink / raw)
To: Bart Van Assche, Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
On 5/14/2014 10:05 AM, Bart Van Assche wrote:
> On 05/13/14 18:48, Sagi Grimberg wrote:
>> On 5/13/2014 5:44 PM, Bart Van Assche wrote:
>>> static int srp_create_target_ib(struct srp_target_port *target)
>>> {
>>> struct srp_device *dev = target->srp_host->srp_dev;
>>> @@ -318,6 +449,8 @@ static int srp_create_target_ib(struct
>>> srp_target_port *target)
>>> struct ib_cq *recv_cq, *send_cq;
>>> struct ib_qp *qp;
>>> struct ib_fmr_pool *fmr_pool = NULL;
>>> + struct srp_fr_pool *fr_pool = NULL;
>>> + const int m = 1 + dev->use_fast_reg;
>>> int ret;
>>> init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL);
>>> @@ -332,7 +465,7 @@ static int srp_create_target_ib(struct
>>> srp_target_port *target)
>>> }
>>> send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL,
>>> target,
>>> - target->queue_size, target->comp_vector);
>>> + m * target->queue_size, target->comp_vector);
>> So it is correct to expand the send CQ and QP send queue, but why not x3?
>> For fast registration we do:
>> - FASTREG
>> - RDMA
>> - LOCAL_INV
>>
>> I'm aware we are suppressing the completions, but I think we need to
>> reserve room for FLUSH errors in case the QP goes to the error state
>> while the send queue is packed.
> The first FLUSH error triggers a call to srp_tl_err_work(). The second
> and later FLUSH errors are ignored by srp_handle_qp_err(). Do you think
> multiplying target->queue_size by a factor of 3 instead of 2 would make
> a difference in fast registration mode?
Rethinking this: we post LOCAL_INV only after the command is done, which
means the FASTREG+SEND have already completed successfully and the
consumer index has been incremented. So I guess it is safe to reserve
room for 2x WRs in the SQ (*although I'm not entirely conclusive on
this*).
WRT the send CQ, an overrun may happen even before the FLUSH errors are
consumed (in case the HW attempts to put a completion on a full queue).
But this is easy: the send CQ size should match the QP send queue size,
so if we settle for 2x, the send CQ size should be 2x as well.
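Spelled out as a helper (hypothetical; the patch open-codes the same
rule via the local variable m in srp_create_target_ib()):

        /* FASTREG + SEND can be outstanding per command; LOCAL_INV is
         * only posted after the command has completed, so it reuses a
         * freed slot rather than requiring a third. */
        static int srp_send_queue_size(const struct srp_device *dev,
                                       int queue_size)
        {
                return (1 + dev->use_fast_reg) * queue_size;
        }

with the send CQ then sized to match the send queue.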
>>> static void srp_unmap_data(struct scsi_cmnd *scmnd,
>>> struct srp_target_port *target,
>>> struct srp_request *req)
>>> {
>>> - struct ib_device *ibdev = target->srp_host->srp_dev->dev;
>>> - struct ib_pool_fmr **pfmr;
>>> + struct srp_device *dev = target->srp_host->srp_dev;
>>> + struct ib_device *ibdev = dev->dev;
>>> + int i;
>>> if (!scsi_sglist(scmnd) ||
>>> (scmnd->sc_data_direction != DMA_TO_DEVICE &&
>>> scmnd->sc_data_direction != DMA_FROM_DEVICE))
>>> return;
>>> - pfmr = req->fmr_list;
>>> - while (req->nmdesc--)
>>> - ib_fmr_pool_unmap(*pfmr++);
>>> + if (dev->use_fast_reg) {
>>> + struct srp_fr_desc **pfr;
>>> +
>>> + for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++)
>>> + srp_inv_rkey(target, (*pfr)->mr->rkey);
>> No check of the return code here? I assume we should react here if we
>> failed to post a work request, right?
> OK, I will add a check for the srp_inv_rkey() return code.
>
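A minimal sketch of such a check in srp_unmap_data() (the log message
and the queue_work() reaction are assumptions, not the final code):

	if (dev->use_fast_reg) {
		struct srp_fr_desc **pfr;
		int res;

		for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++) {
			res = srp_inv_rkey(target, (*pfr)->mr->rkey);
			if (res < 0) {
				shost_printk(KERN_ERR, target->scsi_host, PFX
					     "invalidating rkey %#x failed (%d)\n",
					     (*pfr)->mr->rkey, res);
				/* Trigger transport layer error recovery. */
				queue_work(system_long_wq,
					   &target->tl_err_work);
			}
		}
	}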
>>> +static int srp_map_finish_fr(struct srp_map_state *state,
>>> + struct srp_target_port *target)
>>> +{
>>> + struct srp_device *dev = target->srp_host->srp_dev;
>>> + struct ib_send_wr *bad_wr;
>>> + struct ib_send_wr wr;
>>> + struct srp_fr_desc *desc;
>>> + u32 rkey;
>>> +
>>> + desc = srp_fr_pool_get(target->fr_pool);
>>> + if (!desc)
>>> + return -ENOMEM;
>>> +
>>> + rkey = ib_inc_rkey(desc->mr->rkey);
>>> + ib_update_fast_reg_key(desc->mr, rkey);
>>> +
>>> + memcpy(desc->frpl->page_list, state->pages,
>>> + sizeof(state->pages[0]) * state->npages);
>> I would really like to avoid this memcpy. Any chance we can map directly
>> to frpl->page_list instead?
> Avoiding this memcpy() would be relatively easy if all memory holding
> data for a SCSI command were always registered. However, since with
> register_always == false the fast registration algorithm in this patch
> only allocates an rkey when needed (npages > 1), and since the number
> of pages in a mapping descriptor is only known after state->pages[]
> has been filled in, eliminating that memcpy would be challenging.
Yes, I see. But since the only constraint that forces us to do this
memcpy is the current code logic, I would like to see a /* TODO */ here
to remind us to rethink how to avoid this.
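For instance, a minimal sketch of what that could look like (the
comment wording is illustrative):

	/*
	 * TODO: fill desc->frpl->page_list directly instead of copying
	 * from state->pages[] once it is known up front whether a
	 * mapping will be registered (see the register_always logic).
	 */
	memcpy(desc->frpl->page_list, state->pages,
	       sizeof(state->pages[0]) * state->npages);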
>>> - state->next_fmr = req->fmr_list;
>>> -
>>> - use_fmr = target->fmr_pool ? SRP_MAP_ALLOW_FMR : SRP_MAP_NO_FMR;
>>> + if (dev->use_fast_reg) {
>>> + state->next_fmr = req->fmr_list;
>> Is this correct?
>> Shouldn't this be state->next_fr = req->fr_list (dev->use_fast_reg ==
>> true)?
>> Or did I misunderstand?
>>
>>> + use_memory_registration = !!target->fr_pool;
>>> + } else {
>>> + state->next_fr = req->fr_list;
>> Same here?
> req->fmr_list and req->fr_list point at the same memory location since
> both pointers are members of the same union. This also holds for
> state->next_fmr and state->next_fr. So swapping these two statements as
> proposed wouldn't change the generated code. But I agree that this would
> make the code easier to read.
I didn't think you had posted a patch with such an obvious bug...
I just asked about the semantics, and as you said, the readability can
improve.
Cheers,
Sagi.
* Re: [PATCH v2 9/9] IB/srp: Add fast registration support
[not found] ` <537326BF.3010706-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-05-14 8:51 ` Bart Van Assche
[not found] ` <53732EAE.9010207-HInyCGIudOg@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2014-05-14 8:51 UTC (permalink / raw)
To: Sagi Grimberg, Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
On 05/14/14 10:18, Sagi Grimberg wrote:
> On 5/14/2014 10:05 AM, Bart Van Assche wrote:
>> On 05/13/14 18:48, Sagi Grimberg wrote:
>>> On 5/13/2014 5:44 PM, Bart Van Assche wrote:
>>>> + memcpy(desc->frpl->page_list, state->pages,
>>>> + sizeof(state->pages[0]) * state->npages);
>>> I would really like to avoid this memcpy. Any chance we can map directly
>>> to frpl->page_list instead?
>> Avoiding this memcpy() would be relatively easy if all memory holding
>> data for a SCSI command were always registered. However, since with
>> register_always == false the fast registration algorithm in this patch
>> only allocates an rkey when needed (npages > 1), and since the number
>> of pages in a mapping descriptor is only known after state->pages[]
>> has been filled in, eliminating that memcpy would be challenging.
>
> Yes, I see. But since the only constraint that forces us to do this
> memcpy is the current code logic, I would like to see a /* TODO */
> here to remind us to rethink how to avoid this.
Can you explain why you consider eliminating that memcpy() statement so
important? If e.g. the page cache submits an sg-list with 255 pages,
each 4 KB in size, and all these pages are scattered, then 255 * 4096 =
1044480 data bytes have to be transferred from main memory to the HCA.
In that case the above memcpy() statement copies an additional 255 * 8
= 2040 bytes. That's an overhead of about 0.2%.
Bart.
* Re: [PATCH v2 9/9] IB/srp: Add fast registration support
[not found] ` <53732EAE.9010207-HInyCGIudOg@public.gmane.org>
@ 2014-05-14 10:13 ` Sagi Grimberg
0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2014-05-14 10:13 UTC (permalink / raw)
To: Bart Van Assche, Roland Dreier
Cc: Sagi Grimberg, Vu Pham, David Dillow, Sebastian Parschauer,
linux-rdma
On 5/14/2014 11:51 AM, Bart Van Assche wrote:
> On 05/14/14 10:18, Sagi Grimberg wrote:
>> On 5/14/2014 10:05 AM, Bart Van Assche wrote:
>>> On 05/13/14 18:48, Sagi Grimberg wrote:
>>>> On 5/13/2014 5:44 PM, Bart Van Assche wrote:
>>>>> + memcpy(desc->frpl->page_list, state->pages,
>>>>> + sizeof(state->pages[0]) * state->npages);
>>>> I would really like to avoid this memcpy. Any chance we can map directly
>>>> to frpl->page_list instead?
>>> Avoiding this memcpy() would be relatively easy if all memory holding
>>> data for a SCSI command were always registered. However, since with
>>> register_always == false the fast registration algorithm in this patch
>>> only allocates an rkey when needed (npages > 1), and since the number
>>> of pages in a mapping descriptor is only known after state->pages[]
>>> has been filled in, eliminating that memcpy would be challenging.
>> Yes, I see. But since the only constraint that forces us to do this
>> memcpy is the current code logic, I would like to see a /* TODO */
>> here to remind us to rethink how to avoid this.
> Can you explain why you consider eliminating that memcpy() statement so
> important? If e.g. the page cache submits an sg-list with 255 pages,
> each 4 KB in size, and all these pages are scattered, then 255 * 4096 =
> 1044480 data bytes have to be transferred from main memory to the HCA.
> In that case the above memcpy() statement copies an additional 255 * 8
> = 2040 bytes. That's an overhead of about 0.2%.
The data transfer is offloaded and doesn't require any CPU cycles;
today's performance bottlenecks lie in SW CPU utilization. I just wanted
to point out that this patch introduces unnecessary extra CPU cycles. I
guess I'm fine with leaving it as is if no one else is bothered by it.
Sagi.
end of thread, other threads: [~2014-05-14 10:13 UTC | newest]
Thread overview: 16+ messages
2014-05-13 14:38 [PATCH v2 0/9] SRP initiator patches for kernel 3.16 Bart Van Assche
[not found] ` <53722E4F.7070709-HInyCGIudOg@public.gmane.org>
2014-05-13 14:39 ` [PATCH v2 1/9] IB/srp: Fix a sporadic crash triggered by cable pulling Bart Van Assche
2014-05-13 14:40 ` [PATCH v2 2/9] IB/srp: Fix kernel-doc warnings Bart Van Assche
2014-05-13 14:40 ` [PATCH v2 3/9] IB/srp: Introduce an additional local variable Bart Van Assche
2014-05-13 14:41 ` [PATCH v2 4/9] IB/srp: Introduce srp_map_fmr() Bart Van Assche
2014-05-13 14:41 ` [PATCH v2 5/9] IB/srp: Introduce srp_finish_mapping() Bart Van Assche
2014-05-13 14:42 ` [PATCH v2 6/9] IB/srp: Introduce the 'register_always' kernel module parameter Bart Van Assche
2014-05-13 14:43 ` [PATCH v2 7/9] IB/srp: One FMR pool per SRP connection Bart Van Assche
2014-05-13 14:44 ` [PATCH v2 8/9] IB/srp: Rename FMR-related variables Bart Van Assche
2014-05-13 14:44 ` [PATCH v2 9/9] IB/srp: Add fast registration support Bart Van Assche
[not found] ` <53722FE2.4010808-HInyCGIudOg@public.gmane.org>
2014-05-13 16:48 ` Sagi Grimberg
[not found] ` <53724CC9.6080509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-05-14 7:05 ` Bart Van Assche
[not found] ` <537315CC.1090001-HInyCGIudOg@public.gmane.org>
2014-05-14 8:18 ` Sagi Grimberg
[not found] ` <537326BF.3010706-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-05-14 8:51 ` Bart Van Assche
[not found] ` <53732EAE.9010207-HInyCGIudOg@public.gmane.org>
2014-05-14 10:13 ` Sagi Grimberg
2014-05-13 16:50 ` [PATCH v2 0/9] SRP initiator patches for kernel 3.16 Sagi Grimberg