Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* [PATCH v3 01/10] svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
From: Chuck Lever @ 2016-11-29 16:04 UTC (permalink / raw)
  To: bfields-uC3wQj2KruNg9hUCZPvPmw
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129155521.4477.53561.stgit-Hs+gFlyCn65vLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Logic copied from xs_setup_bc_tcp().

Fixes: 39a9beab5acb ('rpc: share one xps between all backchannels')
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 20027f8..6035c5a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -359,6 +359,7 @@ static int svc_rdma_bc_sendto(struct svcxprt_rdma *rdma,
 out_fail:
 	xprt_rdma_free_addresses(xprt);
 	args->bc_xprt->xpt_bc_xprt = NULL;
+	args->bc_xprt->xpt_bc_xps = NULL;
 	xprt_put(xprt);
 	xprt_free(xprt);
 	return ERR_PTR(-EINVAL);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 00/10] Server-side NFS/RDMA patches proposed for v4.10
From: Chuck Lever @ 2016-11-29 16:04 UTC (permalink / raw)
  To: bfields-uC3wQj2KruNg9hUCZPvPmw
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

Hi Bruce-

These patches are ready for you to consider for v4.10. Changes proposed
for v4.10 include:

- Drop connection on GSS sequence window overflow
- Remove unnecessary spin lock in the svc_rdma_send path
- A number of minor clean-ups

Available in the "nfsd-rdma-for-4.10" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git


Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.10


Changes since v2:
- Rebased on v4.9-rc7


Changes since v1:
- Rebased on v4.9-rc5
- Address checkpatch.pl warnings
- Address compiler warnings
- Remove a dprintk in svc_rdma_get_inv_rkey()
- Patches reordered so fixes come before clean-ups

---

Chuck Lever (10):
      svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
      svcauth_gss: Close connection when dropping an incoming message
      svcrdma: Renovate sendto chunk list parsing
      svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
      svcrdma: Remove DMA map accounting
      svcrdma: Remove svc_rdma_op_ctxt::wc_status
      svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
      svcrdma: Remove unused variable in rdma_copy_tail()
      svcrdma: Break up dprintk format in svc_rdma_accept()
      svcrdma: Further clean-up of svc_rdma_get_inv_rkey()


 include/linux/sunrpc/svc_rdma.h            |    7 --
 net/sunrpc/auth_gss/svcauth_gss.c          |    2 
 net/sunrpc/svc.c                           |   14 ++-
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |    5 -
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c    |   21 ++++-
 net/sunrpc/xprtrdma/svc_rdma_sendto.c      |  116 ++++++++--------------------
 net/sunrpc/xprtrdma/svc_rdma_transport.c   |   94 +++++++----------------
 7 files changed, 94 insertions(+), 165 deletions(-)

--
Chuck Lever
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH v3 12/12] xprtrdma: Relocate connection helper functions
From: Chuck Lever @ 2016-11-29 15:53 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Clean up: Disentangle connection helpers from RPC-over-RDMA reply
decoding functions.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/rpc_rdma.c  |   34 ----------------------------------
 net/sunrpc/xprtrdma/transport.c |   28 ++++++++++++++++++++++++++++
 net/sunrpc/xprtrdma/xprt_rdma.h |   10 +++-------
 3 files changed, 31 insertions(+), 41 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 01f5cbc..c52e0f2 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -906,28 +906,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 	return fixup_copy_count;
 }
 
-void
-rpcrdma_connect_worker(struct work_struct *work)
-{
-	struct rpcrdma_ep *ep =
-		container_of(work, struct rpcrdma_ep, rep_connect_worker.work);
-	struct rpcrdma_xprt *r_xprt =
-		container_of(ep, struct rpcrdma_xprt, rx_ep);
-	struct rpc_xprt *xprt = &r_xprt->rx_xprt;
-
-	spin_lock_bh(&xprt->transport_lock);
-	if (++xprt->connect_cookie == 0)	/* maintain a reserved value */
-		++xprt->connect_cookie;
-	if (ep->rep_connected > 0) {
-		if (!xprt_test_and_set_connected(xprt))
-			xprt_wake_pending_tasks(xprt, 0);
-	} else {
-		if (xprt_test_and_clear_connected(xprt))
-			xprt_wake_pending_tasks(xprt, -ENOTCONN);
-	}
-	spin_unlock_bh(&xprt->transport_lock);
-}
-
 #if defined(CONFIG_SUNRPC_BACKCHANNEL)
 /* By convention, backchannel calls arrive via rdma_msg type
  * messages, and never populate the chunk lists. This makes
@@ -959,18 +937,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 }
 #endif	/* CONFIG_SUNRPC_BACKCHANNEL */
 
-/*
- * This function is called when an async event is posted to
- * the connection which changes the connection state. All it
- * does at this point is mark the connection up/down, the rpc
- * timers do the rest.
- */
-void
-rpcrdma_conn_func(struct rpcrdma_ep *ep)
-{
-	schedule_delayed_work(&ep->rep_connect_worker, 0);
-}
-
 /* Process received RPC/RDMA messages.
  *
  * Errors must result in the RPC task either being awakened, or
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 545d3fc..534c178 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -219,6 +219,34 @@
 		}
 }
 
+void
+rpcrdma_conn_func(struct rpcrdma_ep *ep)
+{
+	schedule_delayed_work(&ep->rep_connect_worker, 0);
+}
+
+void
+rpcrdma_connect_worker(struct work_struct *work)
+{
+	struct rpcrdma_ep *ep =
+		container_of(work, struct rpcrdma_ep, rep_connect_worker.work);
+	struct rpcrdma_xprt *r_xprt =
+		container_of(ep, struct rpcrdma_xprt, rx_ep);
+	struct rpc_xprt *xprt = &r_xprt->rx_xprt;
+
+	spin_lock_bh(&xprt->transport_lock);
+	if (++xprt->connect_cookie == 0)	/* maintain a reserved value */
+		++xprt->connect_cookie;
+	if (ep->rep_connected > 0) {
+		if (!xprt_test_and_set_connected(xprt))
+			xprt_wake_pending_tasks(xprt, 0);
+	} else {
+		if (xprt_test_and_clear_connected(xprt))
+			xprt_wake_pending_tasks(xprt, -ENOTCONN);
+	}
+	spin_unlock_bh(&xprt->transport_lock);
+}
+
 static void
 xprt_rdma_connect_worker(struct work_struct *work)
 {
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index b8424fa..e35efd4 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -490,6 +490,7 @@ int rpcrdma_ep_create(struct rpcrdma_ep *, struct rpcrdma_ia *,
 				struct rpcrdma_create_data_internal *);
 void rpcrdma_ep_destroy(struct rpcrdma_ep *, struct rpcrdma_ia *);
 int rpcrdma_ep_connect(struct rpcrdma_ep *, struct rpcrdma_ia *);
+void rpcrdma_conn_func(struct rpcrdma_ep *ep);
 void rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);
 
 int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
@@ -549,13 +550,6 @@ struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(size_t, enum dma_data_direction,
 }
 
 /*
- * RPC/RDMA connection management calls - xprtrdma/rpc_rdma.c
- */
-void rpcrdma_connect_worker(struct work_struct *);
-void rpcrdma_conn_func(struct rpcrdma_ep *);
-void rpcrdma_reply_handler(struct work_struct *);
-
-/*
  * RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
  */
 
@@ -572,12 +566,14 @@ bool rpcrdma_prepare_send_sges(struct rpcrdma_ia *, struct rpcrdma_req *,
 void rpcrdma_unmap_sges(struct rpcrdma_ia *, struct rpcrdma_req *);
 int rpcrdma_marshal_req(struct rpc_rqst *);
 void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *);
+void rpcrdma_reply_handler(struct work_struct *work);
 
 /* RPC/RDMA module init - xprtrdma/transport.c
  */
 extern unsigned int xprt_rdma_max_inline_read;
 void xprt_rdma_format_addresses(struct rpc_xprt *xprt, struct sockaddr *sap);
 void xprt_rdma_free_addresses(struct rpc_xprt *xprt);
+void rpcrdma_connect_worker(struct work_struct *work);
 void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq);
 int xprt_rdma_init(void);
 void xprt_rdma_cleanup(void);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 11/12] xprtrdma: Update dprintk in rpcrdma_count_chunks
From: Chuck Lever @ 2016-11-29 15:53 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Clean up: offset and handle should be zero-filled, just like in the
chunk encoders.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/rpc_rdma.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index d987c2d..01f5cbc 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -786,7 +786,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
 		ifdebug(FACILITY) {
 			u64 off;
 			xdr_decode_hyper((__be32 *)&seg->rs_offset, &off);
-			dprintk("RPC:       %s: chunk %d@0x%llx:0x%x\n",
+			dprintk("RPC:       %s: chunk %d@0x%016llx:0x%08x\n",
 				__func__,
 				be32_to_cpu(seg->rs_length),
 				(unsigned long long)off,

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 10/12] xprtrdma: Shorten QP access error message
From: Chuck Lever @ 2016-11-29 15:53 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Clean up: The convention for this type of warning message is not to
show the function name or "RPC:       ".

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/verbs.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9f66fd8..11d0774 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -103,9 +103,9 @@
 {
 	struct rpcrdma_ep *ep = context;
 
-	pr_err("RPC:       %s: %s on device %s ep %p\n",
-	       __func__, ib_event_msg(event->event),
-		event->device->name, context);
+	pr_err("rpcrdma: %s on device %s ep %p\n",
+	       ib_event_msg(event->event), event->device->name, context);
+
 	if (ep->rep_connected == 1) {
 		ep->rep_connected = -EIO;
 		rpcrdma_conn_func(ep);

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 09/12] xprtrdma: Squelch "max send, max recv" messages at connect time
From: Chuck Lever @ 2016-11-29 15:53 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Clean up: This message was intended to be a dprintk, as it is on the
server-side.

Fixes: 87cfb9a0c85c ('xprtrdma: Client-side support for ...')
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/verbs.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cbb1885..9f66fd8 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -223,8 +223,8 @@
 		cdata->inline_rsize = rsize;
 	if (wsize < cdata->inline_wsize)
 		cdata->inline_wsize = wsize;
-	pr_info("rpcrdma: max send %u, max recv %u\n",
-		cdata->inline_wsize, cdata->inline_rsize);
+	dprintk("RPC:       %s: max send %u, max recv %u\n",
+		__func__, cdata->inline_wsize, cdata->inline_rsize);
 	rpcrdma_set_max_header_sizes(r_xprt);
 }
 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 08/12] xprtrdma: Update documenting comment
From: Chuck Lever @ 2016-11-29 15:53 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Clean up: If reset fails, FRMRs are no longer abandoned, rather
they are released immediately. Update the comment to reflect this.

Fixes: 2ffc871a574d ('xprtrdma: Release orphaned MRs immediately')
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/frwr_ops.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 900dc40..47bed53 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -171,10 +171,6 @@
 }
 
 /* Reset of a single FRMR. Generate a fresh rkey by replacing the MR.
- *
- * There's no recovery if this fails. The FRMR is abandoned, but
- * remains in rb_all. It will be cleaned up when the transport is
- * destroyed.
  */
 static void
 frwr_op_recover_mr(struct rpcrdma_mw *mw)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 07/12] xprtrdma: Refactor FRMR invalidation
From: Chuck Lever @ 2016-11-29 15:52 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Clean up: After some recent updates, clarifications can be made to
the FRMR invalidation logic.

- Both the remote and local invalidation case mark the frmr INVALID,
  so make that a common path.

- Manage the WR list more "tastefully" by replacing the conditional
  that discriminates between the list head and ->next pointers.

- Use mw->mw_handle in all cases, since that has the same value as
  f->fr_mr->rkey, and is already in cache.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/frwr_ops.c |   57 +++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index e99bf61..900dc40 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -457,26 +457,6 @@
 	return -ENOTCONN;
 }
 
-static struct ib_send_wr *
-__frwr_prepare_linv_wr(struct rpcrdma_mw *mw)
-{
-	struct rpcrdma_frmr *f = &mw->frmr;
-	struct ib_send_wr *invalidate_wr;
-
-	dprintk("RPC:       %s: invalidating frmr %p\n", __func__, f);
-
-	f->fr_state = FRMR_IS_INVALID;
-	invalidate_wr = &f->fr_invwr;
-
-	memset(invalidate_wr, 0, sizeof(*invalidate_wr));
-	f->fr_cqe.done = frwr_wc_localinv;
-	invalidate_wr->wr_cqe = &f->fr_cqe;
-	invalidate_wr->opcode = IB_WR_LOCAL_INV;
-	invalidate_wr->ex.invalidate_rkey = f->fr_mr->rkey;
-
-	return invalidate_wr;
-}
-
 /* Invalidate all memory regions that were registered for "req".
  *
  * Sleeps until it is safe for the host CPU to access the
@@ -487,7 +467,7 @@
 static void
 frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
 {
-	struct ib_send_wr *invalidate_wrs, *pos, *prev, *bad_wr;
+	struct ib_send_wr *first, **prev, *last, *bad_wr;
 	struct rpcrdma_rep *rep = req->rl_reply;
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct rpcrdma_mw *mw, *tmp;
@@ -503,23 +483,28 @@
 	 */
 	f = NULL;
 	count = 0;
-	invalidate_wrs = pos = prev = NULL;
+	prev = &first;
 	list_for_each_entry(mw, &req->rl_registered, mw_list) {
+		mw->frmr.fr_state = FRMR_IS_INVALID;
+
 		if ((rep->rr_wc_flags & IB_WC_WITH_INVALIDATE) &&
-		    (mw->mw_handle == rep->rr_inv_rkey)) {
-			mw->frmr.fr_state = FRMR_IS_INVALID;
+		    (mw->mw_handle == rep->rr_inv_rkey))
 			continue;
-		}
 
-		pos = __frwr_prepare_linv_wr(mw);
+		f = &mw->frmr;
+		dprintk("RPC:       %s: invalidating frmr %p\n",
+			__func__, f);
+
+		f->fr_cqe.done = frwr_wc_localinv;
+		last = &f->fr_invwr;
+		memset(last, 0, sizeof(*last));
+		last->wr_cqe = &f->fr_cqe;
+		last->opcode = IB_WR_LOCAL_INV;
+		last->ex.invalidate_rkey = mw->mw_handle;
 		count++;
 
-		if (!invalidate_wrs)
-			invalidate_wrs = pos;
-		else
-			prev->next = pos;
-		prev = pos;
-		f = &mw->frmr;
+		*prev = last;
+		prev = &last->next;
 	}
 	if (!f)
 		goto unmap;
@@ -528,7 +513,7 @@
 	 * last WR in the chain completes, all WRs in the chain
 	 * are complete.
 	 */
-	f->fr_invwr.send_flags = IB_SEND_SIGNALED;
+	last->send_flags = IB_SEND_SIGNALED;
 	f->fr_cqe.done = frwr_wc_localinv_wake;
 	reinit_completion(&f->fr_linv_done);
 
@@ -543,7 +528,7 @@
 	 * unless ri_id->qp is a valid pointer.
 	 */
 	r_xprt->rx_stats.local_inv_needed++;
-	rc = ib_post_send(ia->ri_id->qp, invalidate_wrs, &bad_wr);
+	rc = ib_post_send(ia->ri_id->qp, first, &bad_wr);
 	if (rc)
 		goto reset_mrs;
 
@@ -554,7 +539,7 @@
 	 */
 unmap:
 	list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
-		dprintk("RPC:       %s: unmapping frmr %p\n",
+		dprintk("RPC:       %s: DMA unmapping frmr %p\n",
 			__func__, &mw->frmr);
 		list_del_init(&mw->mw_list);
 		ib_dma_unmap_sg(ia->ri_device,
@@ -572,7 +557,7 @@
 	 */
 	list_for_each_entry(mw, &req->rl_registered, mw_list) {
 		f = &mw->frmr;
-		if (mw->frmr.fr_mr->rkey == bad_wr->ex.invalidate_rkey) {
+		if (mw->mw_handle == bad_wr->ex.invalidate_rkey) {
 			__frwr_reset_mr(ia, mw);
 			bad_wr = bad_wr->next;
 		}

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 06/12] xprtrdma: Avoid calls to ro_unmap_safe()
From: Chuck Lever @ 2016-11-29 15:52 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Micro-optimization: Most of the time, calls to ro_unmap_safe are
expensive no-ops. Call only when there is work to do.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/transport.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index ed5e285..545d3fc 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -621,7 +621,8 @@
 
 	dprintk("RPC:       %s: called on 0x%p\n", __func__, req->rl_reply);
 
-	ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task));
+	if (unlikely(!list_empty(&req->rl_registered)))
+		ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task));
 	rpcrdma_unmap_sges(ia, req);
 	rpcrdma_buffer_put(req);
 }
@@ -657,7 +658,8 @@
 	int rc = 0;
 
 	/* On retransmit, remove any previously registered chunks */
-	r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
+	if (unlikely(!list_empty(&req->rl_registered)))
+		r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
 
 	rc = rpcrdma_marshal_req(rqst);
 	if (rc < 0)

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 05/12] xprtrdma: Address coverity complaint about wait_for_completion()
From: Chuck Lever @ 2016-11-29 15:52 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

> ** CID 114101:  Error handling issues  (CHECKED_RETURN)
> /net/sunrpc/xprtrdma/verbs.c: 355 in rpcrdma_create_id()

Commit 5675add36e76 ("RPC/RDMA: harden connection logic against
missing/late rdma_cm upcalls.") replaced wait_for_completion() calls
with these two call sites.

The original wait_for_completion() calls were added in the initial
commit of verbs.c, which was commit c56c65fb67d6 ("RPCRDMA: rpc rdma
verbs interface implementation"), but these returned void.

rpcrdma_create_id() is called by the RDMA connect worker, which
probably won't ever be interrupted. It is also called by
rpcrdma_ia_open which is in the synchronous mount path, and ^C is
possible there.

Add a bit of logic at those two call sites to return if the waits
return ERESTARTSYS.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/verbs.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 451f5f2..cbb1885 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -331,6 +331,7 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 rpcrdma_create_id(struct rpcrdma_xprt *xprt,
 			struct rpcrdma_ia *ia, struct sockaddr *addr)
 {
+	unsigned long wtimeout = msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1;
 	struct rdma_cm_id *id;
 	int rc;
 
@@ -352,8 +353,12 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 			__func__, rc);
 		goto out;
 	}
-	wait_for_completion_interruptible_timeout(&ia->ri_done,
-				msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
+	rc = wait_for_completion_interruptible_timeout(&ia->ri_done, wtimeout);
+	if (rc < 0) {
+		dprintk("RPC:       %s: wait() exited: %i\n",
+			__func__, rc);
+		goto out;
+	}
 
 	/* FIXME:
 	 * Until xprtrdma supports DEVICE_REMOVAL, the provider must
@@ -376,8 +381,12 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 			__func__, rc);
 		goto put;
 	}
-	wait_for_completion_interruptible_timeout(&ia->ri_done,
-				msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
+	rc = wait_for_completion_interruptible_timeout(&ia->ri_done, wtimeout);
+	if (rc < 0) {
+		dprintk("RPC:       %s: wait() exited: %i\n",
+			__func__, rc);
+		goto put;
+	}
 	rc = ia->ri_async_rc;
 	if (rc)
 		goto put;

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 04/12] SUNRPC: Proper metric accounting when RPC is not transmitted
From: Chuck Lever @ 2016-11-29 15:52 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

I noticed recently that during an xfstests on a krb5i mount, the
retransmit count for certain operations had gone negative, and the
backlog value became unreasonably large. I recall that Andy has
pointed this out to me in the past.

When call_refresh fails to find a valid credential for an RPC, the
RPC exits immediately without sending anything on the wire. This
leaves rq_ntrans, rq_xtime, and rq_rtt set to zero.

The solution for om_queue is to not add the to RPC's running backlog
queue total whenever rq_xtime is zero.

For om_ntrans, it's a bit more difficult. A zero rq_ntrans causes
om_ops to become larger than om_ntrans. The design of the RPC
metrics API assumes that ntrans will always be equal to or larger
than the ops count. The result is that when an RPC fails to find
credentials, the RPC operation's reported retransmit count, which is
computed in user space as the difference between ops and ntrans,
goes negative.

Ideally the kernel API should report a separate retransmit and
"exited before initial transmission" metric, so that user space can
sort out the difference properly.

To avoid kernel API changes and changes to the way rq_ntrans is used
when performing transport locking, account for untransmitted RPCs
so that om_ntrans keeps up with om_ops: always add one or more to
om_ntrans.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/stats.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index 2ecb994..caeb01a 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -157,15 +157,17 @@ void rpc_count_iostats_metrics(const struct rpc_task *task,
 	spin_lock(&op_metrics->om_lock);
 
 	op_metrics->om_ops++;
-	op_metrics->om_ntrans += req->rq_ntrans;
+	/* kernel API: om_ops must never become larger than om_ntrans */
+	op_metrics->om_ntrans += max(req->rq_ntrans, 1);
 	op_metrics->om_timeouts += task->tk_timeouts;
 
 	op_metrics->om_bytes_sent += req->rq_xmit_bytes_sent;
 	op_metrics->om_bytes_recv += req->rq_reply_bytes_recvd;
 
-	delta = ktime_sub(req->rq_xtime, task->tk_start);
-	op_metrics->om_queue = ktime_add(op_metrics->om_queue, delta);
-
+	if (ktime_to_ns(req->rq_xtime)) {
+		delta = ktime_sub(req->rq_xtime, task->tk_start);
+		op_metrics->om_queue = ktime_add(op_metrics->om_queue, delta);
+	}
 	op_metrics->om_rtt = ktime_add(op_metrics->om_rtt, req->rq_rtt);
 
 	delta = ktime_sub(now, task->tk_start);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 03/12] xprtrdma: Support for SG_GAP devices
From: Chuck Lever @ 2016-11-29 15:52 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Some devices (such as the Mellanox CX-4) can register, under a
single R_key, a set of memory regions that are not contiguous. When
this is done, all the segments in a Reply list, say, can then be
invalidated in a single LocalInv Work Request (or via Remote
Invalidation, which can invalidate exactly one R_key when completing
a Receive).

This means a single FastReg WR is used to register, and one or zero
LocalInv WRs can invalidate, the memory involved with RDMA transfers
on behalf of an RPC.

In addition, xprtrdma constructs some Reply chunks from three or
more segments. By registering them with SG_GAP, only one segment
is needed for the Reply chunk, allowing the whole chunk to be
invalidated remotely.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/frwr_ops.c  |   20 +++++++++++++-------
 net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index adbf52c..e99bf61 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -101,7 +101,7 @@
 	struct rpcrdma_frmr *f = &r->frmr;
 	int rc;
 
-	f->fr_mr = ib_alloc_mr(ia->ri_pd, IB_MR_TYPE_MEM_REG, depth);
+	f->fr_mr = ib_alloc_mr(ia->ri_pd, ia->ri_mrtype, depth);
 	if (IS_ERR(f->fr_mr))
 		goto out_mr_err;
 
@@ -157,7 +157,7 @@
 		return rc;
 	}
 
-	f->fr_mr = ib_alloc_mr(ia->ri_pd, IB_MR_TYPE_MEM_REG,
+	f->fr_mr = ib_alloc_mr(ia->ri_pd, ia->ri_mrtype,
 			       ia->ri_max_frmr_depth);
 	if (IS_ERR(f->fr_mr)) {
 		pr_warn("rpcrdma: ib_alloc_mr status %ld, frwr %p orphaned\n",
@@ -210,11 +210,16 @@
 frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
 	     struct rpcrdma_create_data_internal *cdata)
 {
+	struct ib_device_attr *attrs = &ia->ri_device->attrs;
 	int depth, delta;
 
+	ia->ri_mrtype = IB_MR_TYPE_MEM_REG;
+	if (attrs->device_cap_flags & IB_DEVICE_SG_GAPS_REG)
+		ia->ri_mrtype = IB_MR_TYPE_SG_GAPS;
+
 	ia->ri_max_frmr_depth =
 			min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-			      ia->ri_device->attrs.max_fast_reg_page_list_len);
+			      attrs->max_fast_reg_page_list_len);
 	dprintk("RPC:       %s: device's max FR page list len = %u\n",
 		__func__, ia->ri_max_frmr_depth);
 
@@ -241,8 +246,8 @@
 	}
 
 	ep->rep_attr.cap.max_send_wr *= depth;
-	if (ep->rep_attr.cap.max_send_wr > ia->ri_device->attrs.max_qp_wr) {
-		cdata->max_requests = ia->ri_device->attrs.max_qp_wr / depth;
+	if (ep->rep_attr.cap.max_send_wr > attrs->max_qp_wr) {
+		cdata->max_requests = attrs->max_qp_wr / depth;
 		if (!cdata->max_requests)
 			return -EINVAL;
 		ep->rep_attr.cap.max_send_wr = cdata->max_requests *
@@ -348,6 +353,7 @@
 	    int nsegs, bool writing, struct rpcrdma_mw **out)
 {
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+	bool holes_ok = ia->ri_mrtype == IB_MR_TYPE_SG_GAPS;
 	struct rpcrdma_mw *mw;
 	struct rpcrdma_frmr *frmr;
 	struct ib_mr *mr;
@@ -383,8 +389,8 @@
 
 		++seg;
 		++i;
-
-		/* Check for holes */
+		if (holes_ok)
+			continue;
 		if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
 		    offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
 			break;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index f6ae1b2..b8424fa 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -75,6 +75,7 @@ struct rpcrdma_ia {
 	unsigned int		ri_max_inline_write;
 	unsigned int		ri_max_inline_read;
 	bool			ri_reminv_expected;
+	enum ib_mr_type		ri_mrtype;
 	struct ib_qp_attr	ri_qp_attr;
 	struct ib_qp_init_attr	ri_qp_init_attr;
 };

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 02/12] xprtrdma: Make FRWR send queue entry accounting more accurate
From: Chuck Lever @ 2016-11-29 15:52 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

Verbs providers may perform house-keeping on the Send Queue during
each signaled send completion. It is necessary therefore for a verbs
consumer (like xprtrdma) to occasionally force a signaled send
completion if it runs unsignaled most of the time.

xprtrdma does not require signaled completions for Send or FastReg
Work Requests, but does signal some LocalInv Work Requests. To
ensure that Send Queue house-keeping can run before the Send Queue
is more than half-consumed, xprtrdma forces a signaled completion
on occasion by counting the number of Send Queue Entries it
consumes. It currently does this by counting each ib_post_send as
one Entry.

Commit c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
introduced the ability for frwr_op_unmap_sync to post more than one
Work Request with a single post_send. Thus the underlying assumption
of one Send Queue Entry per ib_post_send is no longer true.

Also, FastReg Work Requests are currently never signaled. They
should be signaled once in a while, just as Send is, to keep the
accounting of consumed SQEs accurate.

While we're here, convert the CQCOUNT macros to the currently
preferred kernel coding style, which is inline functions.

Fixes: c9918ff56dfb ("xprtrdma: Add ro_unmap_sync method for FRWR")
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/frwr_ops.c  |   13 ++++++++++---
 net/sunrpc/xprtrdma/verbs.c     |   10 ++--------
 net/sunrpc/xprtrdma/xprt_rdma.h |   20 ++++++++++++++++++--
 3 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 26b26be..adbf52c 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -421,7 +421,7 @@
 			 IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
 			 IB_ACCESS_REMOTE_READ;
 
-	DECR_CQCOUNT(&r_xprt->rx_ep);
+	rpcrdma_set_signaled(&r_xprt->rx_ep, &reg_wr->wr);
 	rc = ib_post_send(ia->ri_id->qp, &reg_wr->wr, &bad_wr);
 	if (rc)
 		goto out_senderr;
@@ -486,7 +486,7 @@
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct rpcrdma_mw *mw, *tmp;
 	struct rpcrdma_frmr *f;
-	int rc;
+	int count, rc;
 
 	dprintk("RPC:       %s: req %p\n", __func__, req);
 
@@ -496,6 +496,7 @@
 	 * a single ib_post_send() call.
 	 */
 	f = NULL;
+	count = 0;
 	invalidate_wrs = pos = prev = NULL;
 	list_for_each_entry(mw, &req->rl_registered, mw_list) {
 		if ((rep->rr_wc_flags & IB_WC_WITH_INVALIDATE) &&
@@ -505,6 +506,7 @@
 		}
 
 		pos = __frwr_prepare_linv_wr(mw);
+		count++;
 
 		if (!invalidate_wrs)
 			invalidate_wrs = pos;
@@ -523,7 +525,12 @@
 	f->fr_invwr.send_flags = IB_SEND_SIGNALED;
 	f->fr_cqe.done = frwr_wc_localinv_wake;
 	reinit_completion(&f->fr_linv_done);
-	INIT_CQCOUNT(&r_xprt->rx_ep);
+
+	/* Initialize CQ count, since there is always a signaled
+	 * WR being posted here.  The new cqcount depends on how
+	 * many SQEs are about to be consumed.
+	 */
+	rpcrdma_init_cqcount(&r_xprt->rx_ep, count);
 
 	/* Transport disconnect drains the receive CQ before it
 	 * replaces the QP. The RPC reply handler won't call us
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index ec74289..451f5f2 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -532,7 +532,7 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
 	ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 - 1;
 	if (ep->rep_cqinit <= 2)
 		ep->rep_cqinit = 0;	/* always signal? */
-	INIT_CQCOUNT(ep);
+	rpcrdma_init_cqcount(ep, 0);
 	init_waitqueue_head(&ep->rep_connect_wait);
 	INIT_DELAYED_WORK(&ep->rep_connect_worker, rpcrdma_connect_worker);
 
@@ -1311,13 +1311,7 @@ struct rpcrdma_regbuf *
 	dprintk("RPC:       %s: posting %d s/g entries\n",
 		__func__, send_wr->num_sge);
 
-	if (DECR_CQCOUNT(ep) > 0)
-		send_wr->send_flags = 0;
-	else { /* Provider must take a send completion every now and then */
-		INIT_CQCOUNT(ep);
-		send_wr->send_flags = IB_SEND_SIGNALED;
-	}
-
+	rpcrdma_set_signaled(ep, send_wr);
 	rc = ib_post_send(ia->ri_id->qp, send_wr, &send_wr_fail);
 	if (rc)
 		goto out_postsend_err;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 6e1bba3..f6ae1b2 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -95,8 +95,24 @@ struct rpcrdma_ep {
 	struct delayed_work	rep_connect_worker;
 };
 
-#define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
-#define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)
+static inline void
+rpcrdma_init_cqcount(struct rpcrdma_ep *ep, int count)
+{
+	atomic_set(&ep->rep_cqcount, ep->rep_cqinit - count);
+}
+
+/* To update send queue accounting, provider must take a
+ * send completion every now and then.
+ */
+static inline void
+rpcrdma_set_signaled(struct rpcrdma_ep *ep, struct ib_send_wr *send_wr)
+{
+	send_wr->send_flags = 0;
+	if (unlikely(atomic_sub_return(1, &ep->rep_cqcount) <= 0)) {
+		rpcrdma_init_cqcount(ep, 0);
+		send_wr->send_flags = IB_SEND_SIGNALED;
+	}
+}
 
 /* Pre-allocate extra Work Requests for handling backward receives
  * and sends. This is a fixed value because the Work Queues are

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 01/12] xprtrdma: Cap size of callback buffer resources
From: Chuck Lever @ 2016-11-29 15:52 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129154845.23061.29385.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>

When the inline threshold size is set to large values (say, 32KB)
any NFSv4.1 CB request from the server gets a reply with status
NFS4ERR_RESOURCE.

Looks like there are some upper layer assumptions about the maximum
size of a reply (for example, in process_op). Cap the size of the
NFSv4 client's reply resources at a page.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/backchannel.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
index 2c472e1..24fedd4 100644
--- a/net/sunrpc/xprtrdma/backchannel.c
+++ b/net/sunrpc/xprtrdma/backchannel.c
@@ -55,7 +55,8 @@ static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
 	if (IS_ERR(rb))
 		goto out_fail;
 	req->rl_sendbuf = rb;
-	xdr_buf_init(&rqst->rq_snd_buf, rb->rg_base, size);
+	xdr_buf_init(&rqst->rq_snd_buf, rb->rg_base,
+		     min_t(size_t, size, PAGE_SIZE));
 	rpcrdma_set_xprtdata(rqst, req);
 	return 0;
 
@@ -191,6 +192,7 @@ size_t xprt_rdma_bc_maxpayload(struct rpc_xprt *xprt)
 	size_t maxmsg;
 
 	maxmsg = min_t(unsigned int, cdata->inline_rsize, cdata->inline_wsize);
+	maxmsg = min_t(unsigned int, maxmsg, PAGE_SIZE);
 	return maxmsg - RPCRDMA_HDRLEN_MIN;
 }
 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH v3 00/12] client-side NFS/RDMA patches proposed for v4.10
From: Chuck Lever @ 2016-11-29 15:51 UTC (permalink / raw)
  To: anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA

Hi Anna-

These are ready for you to consider for v4.10. This patch series makes
the following changes:

- Support for devices that support SG_GAP
- Several bug fixes
- A number of clean-ups and optimizations

SG_GAP devices allow the client transport implementation to register
non-contiguous memory regions using one offset and handle. Scatter
and gather from these regions are then managed entirely in hardware.

Normally a Reply chunk for a READDIR or GETACL is built of multiple
RDMA segments. Now, when our client wants to build a Reply chunk out
of the xdr_buf's head, page list, and tail, it appears to the server
as a single contiguous RDMA segment if the device supports SG_GAP.

Instead of a separate Write WR for each of several RDMA segments,
the server can then post a single Write WR to transmit the whole RPC
Reply (assuming the Reply is smaller than the server RNIC's max_sge)
and can invalidate the whole Reply chunk remotely.

SG_GAP is currently supported by mlx5 devices when using the FRWR
registration mechanism.


Available in the "nfs-rdma-for-4.10" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git


Or for browsing:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfs-rdma-for-4.10


Changes since v2:
- Rebased on v4.9-rc7


Changes since v1:
- Rebased on v4.9-rc5
- "xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect" was merged
- "xprtrdma: Simplify synopsis of rpcrdma_ep_connect()" was postponed
- Clean-ups re-ordered to the end of the series
- Editorial changes to patch descriptions

---

Chuck Lever (12):
      xprtrdma: Cap size of callback buffer resources
      xprtrdma: Make FRWR send queue entry accounting more accurate
      xprtrdma: Support for SG_GAP devices
      SUNRPC: Proper metric accounting when RPC is not transmitted
      xprtrdma: Address coverity complaint about wait_for_completion()
      xprtrdma: Avoid calls to ro_unmap_safe()
      xprtrdma: Refactor FRMR invalidation
      xprtrdma: Update documenting comment
      xprtrdma: Squelch "max send, max recv" messages at connect time
      xprtrdma: Shorten QP access error message
      xprtrdma: Update dprintk in rpcrdma_count_chunks
      xprtrdma: Relocate connection helper functions


 net/sunrpc/stats.c                |   10 ++--
 net/sunrpc/xprtrdma/backchannel.c |    4 +-
 net/sunrpc/xprtrdma/frwr_ops.c    |   94 +++++++++++++++++--------------------
 net/sunrpc/xprtrdma/rpc_rdma.c    |   36 --------------
 net/sunrpc/xprtrdma/transport.c   |   34 +++++++++++++
 net/sunrpc/xprtrdma/verbs.c       |   37 ++++++++-------
 net/sunrpc/xprtrdma/xprt_rdma.h   |   31 +++++++++---
 7 files changed, 128 insertions(+), 118 deletions(-)

--
Chuck Lever
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [bug report] net/mlx5: E-Switch, Add control for inline mode
From: Roi Dayan @ 2016-11-29 15:06 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: roid-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161129105258.GA31063@mwanda>



On 29/11/2016 12:52, Dan Carpenter wrote:
> Hello Roi Dayan,
>
> The patch bffaa916588e: "net/mlx5: E-Switch, Add control for inline
> mode" from Nov 22, 2016, leads to the following static checker
> warning:
>
> 	drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:950 mlx5_eswitch_inline_mode_get()
> 	error: uninitialized symbol 'prev_mlx5_mode'.
>
> drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
>     932  int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
>     933  {
>     934          struct mlx5_core_dev *dev = esw->dev;
>     935          int vport;
>     936          u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2;
>                     ^^^^^^^^^^^^^^
>     937
>     938          if (!MLX5_CAP_GEN(dev, vport_group_manager))
>     939                  return -EOPNOTSUPP;
>     940
>     941          if (esw->mode == SRIOV_NONE)
>     942                  return -EOPNOTSUPP;
>     943
>     944          if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
>     945              MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
>     946                  return -EOPNOTSUPP;
>     947
>     948          for (vport = 1; vport <= nvfs; vport++) {
>     949                  mlx5_query_nic_vport_min_inline(dev, vport, &mlx5_mode);
>     950                  if (vport > 1 && prev_mlx5_mode != mlx5_mode)
>                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> No idea what prev_mlx5_mode is supposed to be for.
>
>     951                          return -EINVAL;
>     952                  prev_mlx5_mode = mlx5_mode;
>     953          }
>     954
>     955          *mode = mlx5_mode;
>     956          return 0;
>     957  }
>
> regards,
> dan carpenter

Hi Dan,

I really didn't see this. sparse and smatch didn't catch this. I use 
FC24 and gcc 6.2.1 for compilation and it doesn't give me any warning.
Did another check with an old rhel machine with gcc 4.4.6 and I got a 
warning about it there.
Can you tell me which tool you used?

This is a false positive because of the first check if vport > 1, we 
actually check prev_mlx5_mode from the second for-loop
iteration and on the first loop iteration we set a value (in line 952).
It is used to check if all vports are set with the same inline mode.

Thanks,
Roi

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/2] SRP transport: Move queuecommand() wait code to SCSI core
From: Bart Van Assche @ 2016-11-29 14:59 UTC (permalink / raw)
  To: Max Gurtovoy, James Bottomley, Martin K. Petersen
  Cc: Doug Ledford, Christoph Hellwig, Sagi Grimberg,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <0187d160-5985-32c4-95c0-466c4f09bdf9@mellanox.com>

On 11/25/16 15:21, Max Gurtovoy wrote:
> On 11/23/2016 2:17 AM, Bart Van Assche wrote:
>> +
>> +/**
>> + * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
>> + * @shost: SCSI host pointer.
>
> the arg is sdev.

Good catch.

Martin, do you want me to repost this patch or do you want me to post a 
fix-up patch?

Bart.


^ permalink raw reply

* Re: [PATCH] IB/iser: extended CDB support
From: Max Gurtovoy @ 2016-11-29 13:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vladimir Neyelov, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sagi-NQWnxTmZq1alnMjI0IkVqw
In-Reply-To: <20161129080232.GA11368-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>



On 11/29/2016 10:02 AM, Christoph Hellwig wrote:
> On Tue, Nov 29, 2016 at 09:43:12AM +0200, Max Gurtovoy wrote:
>> Sagi/Christopth,
>>
>> any thoughts on this onw ?
>>
>> On 11/23/2016 10:52 AM, Vladimir Neyelov wrote:
>>> Support of vendor specific CDBs in iser.
>
> This command surely is wrong.
>
> Except for that: what's the use case?  The only use case for gigantic
> CDBs in iSCSI was the T10 OSD code, which hopefully is on it's way out.
>
> Or do you just need 32 byte CDBs, in whіch case things could probably
> be done much simpler.
>

we needed to support 22 byte CDB (in the case we encountered the 
problem) but we wanted to be aligned with the tcp implementation that 
support SCSI_MAX_VARLEN_CDB_SIZE.
notice that on the wire we send only the actual CDB size and not all the 
SCSI_MAX_VARLEN_CDB_SIZE.
I guess you're concern about allocating extra 244B for each task, right ?
AFAIK we didn't see difference in perf with this patch but it worth to 
re-check it and share numbers with the community.

regarding the simplicity, I guess the support for 32B CDB will be pretty 
much similar to this commit, unless you see some simpler solution.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/2] IB/rxe: Wait for tasklets to finish before tearing down QP
From: Yuval Shaia @ 2016-11-29 12:49 UTC (permalink / raw)
  To: Andrew Boyer
  Cc: monis-VPRAkNaXOzVWk0Htik3J/w, yonatanc-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1480343844-8381-2-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>

On Mon, Nov 28, 2016 at 09:37:23AM -0500, Andrew Boyer wrote:
> The system may crash when a malformed request is received and
> the error is detected by the responder.
> 
> NodeA: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 50000
> NodeB: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 1024 <NodeA_ip>
> 
> The responder generates a receive error on node B since the incoming
> SEND is oversized. If the client tears down the QP before the responder
> or the completer finish running, a page fault may occur.
> 
> The fix makes the destroy operation spin until the tasks complete, which
> appears to be original intent of the design.
> 
> Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
> ---
>  drivers/infiniband/sw/rxe/rxe_task.c | 19 +++++++++++++++++++
>  drivers/infiniband/sw/rxe/rxe_task.h |  1 +
>  2 files changed, 20 insertions(+)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
> index 1e19bf8..1e9a28f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.c
> +++ b/drivers/infiniband/sw/rxe/rxe_task.c
> @@ -121,6 +121,7 @@ int rxe_init_task(void *obj, struct rxe_task *task,
>  	task->arg	= arg;
>  	task->func	= func;
>  	snprintf(task->name, sizeof(task->name), "%s", name);
> +	task->destroyed	= false;
>  
>  	tasklet_init(&task->tasklet, rxe_do_task, (unsigned long)task);
>  
> @@ -132,11 +133,29 @@ int rxe_init_task(void *obj, struct rxe_task *task,
>  
>  void rxe_cleanup_task(struct rxe_task *task)
>  {
> +	unsigned long flags;
> +	bool idle = false;

The above initialization is not needed

Besides that:
Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

> +
> +	/*
> +	 * Mark the task, then wait for it to finish. It might be
> +	 * running in a non-tasklet (direct call) context.
> +	 */
> +	task->destroyed = true;
> +
> +	do {
> +		spin_lock_irqsave(&task->state_lock, flags);
> +		idle = (task->state == TASK_STATE_START);
> +		spin_unlock_irqrestore(&task->state_lock, flags);
> +	} while (!idle);
> +
>  	tasklet_kill(&task->tasklet);
>  }
>  
>  void rxe_run_task(struct rxe_task *task, int sched)
>  {
> +	if (task->destroyed)
> +		return;
> +
>  	if (sched)
>  		tasklet_schedule(&task->tasklet);
>  	else
> diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
> index d14aa6d..08ff42d 100644
> --- a/drivers/infiniband/sw/rxe/rxe_task.h
> +++ b/drivers/infiniband/sw/rxe/rxe_task.h
> @@ -54,6 +54,7 @@ struct rxe_task {
>  	int			(*func)(void *arg);
>  	int			ret;
>  	char			name[16];
> +	bool			destroyed;
>  };
>  
>  /*
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [bug report] net/mlx5: E-Switch, Add control for inline mode
From: Dan Carpenter @ 2016-11-29 10:52 UTC (permalink / raw)
  To: roid-VPRAkNaXOzVWk0Htik3J/w; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hello Roi Dayan,

The patch bffaa916588e: "net/mlx5: E-Switch, Add control for inline
mode" from Nov 22, 2016, leads to the following static checker
warning:

	drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:950 mlx5_eswitch_inline_mode_get()
	error: uninitialized symbol 'prev_mlx5_mode'.

drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
   932  int mlx5_eswitch_inline_mode_get(struct mlx5_eswitch *esw, int nvfs, u8 *mode)
   933  {
   934          struct mlx5_core_dev *dev = esw->dev;
   935          int vport;
   936          u8 prev_mlx5_mode, mlx5_mode = MLX5_INLINE_MODE_L2;
                   ^^^^^^^^^^^^^^
   937  
   938          if (!MLX5_CAP_GEN(dev, vport_group_manager))
   939                  return -EOPNOTSUPP;
   940  
   941          if (esw->mode == SRIOV_NONE)
   942                  return -EOPNOTSUPP;
   943  
   944          if (MLX5_CAP_ETH(dev, wqe_inline_mode) !=
   945              MLX5_CAP_INLINE_MODE_VPORT_CONTEXT)
   946                  return -EOPNOTSUPP;
   947  
   948          for (vport = 1; vport <= nvfs; vport++) {
   949                  mlx5_query_nic_vport_min_inline(dev, vport, &mlx5_mode);
   950                  if (vport > 1 && prev_mlx5_mode != mlx5_mode)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
No idea what prev_mlx5_mode is supposed to be for.

   951                          return -EINVAL;
   952                  prev_mlx5_mode = mlx5_mode;
   953          }
   954  
   955          *mode = mlx5_mode;
   956          return 0;
   957  }

regards,
dan carpenter
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Sergei Shtylyov @ 2016-11-29  9:28 UTC (permalink / raw)
  To: Souptick Joarder, yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sahu.rameshwar73-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <20161129065931.GA3245@gnr743-HP-ZBook-15>

Hello.

On 11/29/2016 9:59 AM, Souptick Joarder wrote:

> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
> replaced by pci_pool_zalloc().

    One more nit since you're going to send it again...

> Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> v2:
>   - Address comment from sergei
>     Alignment was not proper
>
>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> index e36bebc..96cdf9a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>  	if (!mailbox)
>  		return ERR_PTR(-ENOMEM);
>
> -	mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> -				      &mailbox->dma);
> +	mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> +				       &mailbox->dma);
>  	if (!mailbox->buf) {
>  		kfree(mailbox);
>  		return ERR_PTR(-ENOMEM);
>  	}
>
> -	memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
>

    Remove empty line here -- one is enough.

>  	return mailbox;
>  }

MBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  8:25 UTC (permalink / raw)
  To: Sergei Shtylyov, yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Rameshwar Sahu
In-Reply-To: <CAFqt6zYzUyZ__7iVH3LwhGzeU7RN59-md4rcmwngRZW_ahDeAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 1847 bytes --]

While sending the patch using "mutt" , alignment is getting changed.
In original patch file alignment is proper and
I have tested the patch with "checkpatch.pl". Attach is the original patch file.

I am using mutt -H < patch-file>

Any suggestion will be helpful?

On Tue, Nov 29, 2016 at 12:55 PM, Souptick Joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Please ignore this v2 patch.
>
> On Tue, Nov 29, 2016 at 12:29 PM, Souptick Joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
>> replaced by pci_pool_zalloc().
>>
>> Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>> ---
>> v2:
>>   - Address comment from sergei
>>     Alignment was not proper
>>
>>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> index e36bebc..96cdf9a 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
>> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>>         if (!mailbox)
>>                 return ERR_PTR(-ENOMEM);
>>
>> -       mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
>> -                                     &mailbox->dma);
>> +       mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
>> +                                      &mailbox->dma);
>>         if (!mailbox->buf) {
>>                 kfree(mailbox);
>>                 return ERR_PTR(-ENOMEM);
>>         }
>>
>> -       memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
>>
>>         return mailbox;
>>  }
>> --
>> 1.9.1
>>

Regards
Souptick

[-- Attachment #2: 0001-ethernet-mellanox-mlx4-Replace-pci_pool_alloc-by-pci.patch --]
[-- Type: text/x-patch, Size: 1290 bytes --]

From 3621eac92d271332ecc95b6f09ce25b7c6846137 Mon Sep 17 00:00:00 2001
From: Souptick Joarder <jrdr.linux@gmail.com>
Date: Tue, 29 Nov 2016 12:15:39 +0530
Subject: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by
 pci_pool_zalloc

In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc().

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
---
v2:
  - Address comment from sergei
    Alignment was not proper

 drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e36bebc..96cdf9a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
 	if (!mailbox)
 		return ERR_PTR(-ENOMEM);
 
-	mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
-				      &mailbox->dma);
+	mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
+				       &mailbox->dma);
 	if (!mailbox->buf) {
 		kfree(mailbox);
 		return ERR_PTR(-ENOMEM);
 	}
 
-	memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
 
 	return mailbox;
 }
-- 
1.9.1


^ permalink raw reply related

* Re: [PATCH] IB/iser: extended CDB support
From: Christoph Hellwig @ 2016-11-29  8:02 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Vladimir Neyelov, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sagi-NQWnxTmZq1alnMjI0IkVqw, Christoph Hellwig
In-Reply-To: <04534a11-df1c-eff7-14c5-65c1050f3393-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

On Tue, Nov 29, 2016 at 09:43:12AM +0200, Max Gurtovoy wrote:
> Sagi/Christopth,
> 
> any thoughts on this onw ?
> 
> On 11/23/2016 10:52 AM, Vladimir Neyelov wrote:
> > Support of vendor specific CDBs in iser.

This command surely is wrong.

Except for that: what's the use case?  The only use case for gigantic
CDBs in iSCSI was the T10 OSD code, which hopefully is on it's way out.

Or do you just need 32 byte CDBs, in whіch case things could probably
be done much simpler.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] IB/iser: extended CDB support
From: Max Gurtovoy @ 2016-11-29  7:43 UTC (permalink / raw)
  To: Vladimir Neyelov, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	sagi-NQWnxTmZq1alnMjI0IkVqw, Christoph Hellwig
In-Reply-To: <1479891139-15604-1-git-send-email-vladimirn-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Sagi/Christopth,

any thoughts on this onw ?

On 11/23/2016 10:52 AM, Vladimir Neyelov wrote:
> Support of vendor specific CDBs in iser. SCSI supports max command size
> SCSI_MAX_VARLEN_CDB_SIZE (260 bytes). ISER currently supports max scsi
> command 16 bytes. This commit changes max scsi command for ISER
> (to align with iscsi/tcp) to SCSI_MAX_VARLEN_CDB_SIZE.
>
> Signed-off-by: Vladimir Neyelov <vladimirn-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.c | 17 +++++++++++++----
>  drivers/infiniband/ulp/iser/iscsi_iser.h |  1 +
>  2 files changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
> index 64b3d11..865ce48 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> @@ -163,7 +163,8 @@ iscsi_iser_pdu_alloc(struct iscsi_task *task, uint8_t opcode)
>  	struct iscsi_iser_task *iser_task = task->dd_data;
>
>  	task->hdr = (struct iscsi_hdr *)&iser_task->desc.iscsi_header;
> -	task->hdr_max = sizeof(iser_task->desc.iscsi_header);
> +	task->hdr_max = sizeof(iser_task->desc.iscsi_header) +
> +			sizeof(iser_task->desc.iscsi_ecdb_header);
>
>  	return 0;
>  }
> @@ -189,6 +190,14 @@ iser_initialize_task_headers(struct iscsi_task *task,
>  	u64 dma_addr;
>  	const bool mgmt_task = !task->sc && !in_interrupt();
>  	int ret = 0;
> +	int headers_len = ISER_HEADERS_LEN;
> +	struct scsi_cmnd *sc = task->sc;
> +
> +	if(sc) {
> +		if(sc->cmd_len > ISCSI_CDB_SIZE)
> +			headers_len += offsetof(struct iscsi_ecdb_ahdr, ecdb) +
> +					(sc->cmd_len - ISCSI_CDB_SIZE);
> +	}
>
>  	if (unlikely(mgmt_task))
>  		mutex_lock(&iser_conn->state_mutex);
> @@ -199,7 +208,7 @@ iser_initialize_task_headers(struct iscsi_task *task,
>  	}
>
>  	dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,
> -				ISER_HEADERS_LEN, DMA_TO_DEVICE);
> +				headers_len, DMA_TO_DEVICE );
>  	if (ib_dma_mapping_error(device->ib_device, dma_addr)) {
>  		ret = -ENOMEM;
>  		goto out;
> @@ -209,7 +218,7 @@ iser_initialize_task_headers(struct iscsi_task *task,
>  	tx_desc->mapped = true;
>  	tx_desc->dma_addr = dma_addr;
>  	tx_desc->tx_sg[0].addr   = tx_desc->dma_addr;
> -	tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
> +	tx_desc->tx_sg[0].length = headers_len;
>  	tx_desc->tx_sg[0].lkey   = device->pd->local_dma_lkey;
>
>  	iser_task->iser_conn = iser_conn;
> @@ -623,7 +632,7 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
>  	shost->max_lun = iscsi_max_lun;
>  	shost->max_id = 0;
>  	shost->max_channel = 0;
> -	shost->max_cmd_len = 16;
> +	shost->max_cmd_len = SCSI_MAX_VARLEN_CDB_SIZE;
>
>  	/*
>  	 * older userspace tools (before 2.0-870) did not pass us
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
> index 0be6a7c..e1ae9b8 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.h
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
> @@ -254,6 +254,7 @@ enum iser_desc_type {
>  struct iser_tx_desc {
>  	struct iser_ctrl             iser_header;
>  	struct iscsi_hdr             iscsi_header;
> +	struct iscsi_ecdb_ahdr       iscsi_ecdb_header;
>  	enum   iser_desc_type        type;
>  	u64		             dma_addr;
>  	struct ib_sge		     tx_sg[2];
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-29  7:25 UTC (permalink / raw)
  To: Sergei Shtylyov, yishaih-VPRAkNaXOzVWk0Htik3J/w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Rameshwar Sahu
In-Reply-To: <20161129065931.GA3245@gnr743-HP-ZBook-15>

Please ignore this v2 patch.

On Tue, Nov 29, 2016 at 12:29 PM, Souptick Joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
> replaced by pci_pool_zalloc().
>
> Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
> v2:
>   - Address comment from sergei
>     Alignment was not proper
>
>  drivers/net/ethernet/mellanox/mlx4/cmd.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> index e36bebc..96cdf9a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> @@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>         if (!mailbox)
>                 return ERR_PTR(-ENOMEM);
>
> -       mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> -                                     &mailbox->dma);
> +       mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> +                                      &mailbox->dma);
>         if (!mailbox->buf) {
>                 kfree(mailbox);
>                 return ERR_PTR(-ENOMEM);
>         }
>
> -       memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
>
>         return mailbox;
>  }
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox