Linux NFS development
 help / color / mirror / Atom feed
From: Tom Tucker <tom@opengridcomputing.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>,
	Tom Talpey <Thomas.Talpey@netapp.com>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: [Fwd: [PATCH, RFC] xprtrdma: Simplify client responder resource handling]
Date: Mon, 15 Sep 2008 11:00:53 -0500	[thread overview]
Message-ID: <48CE86B5.8000101@opengridcomputing.com> (raw)


Hey guys. Do you have any status/comments on this patch?

Perhaps my description is misleading. This is a bug and there are people 
playing with big data mounts that are hitting this. I'll be happy to 
resend it if you'd like.

Thanks,
Tom


-------- Original Message --------
Subject: [PATCH, RFC] xprtrdma: Simplify client responder resource handling
Date: Thu, 21 Aug 2008 20:35:06 -0500
From: Tom Tucker <tom@opengridcomputing.com>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
CC: linux-nfs@vger.kernel.org, Tom Talpey <Thomas.Talpey@netapp.com>


This logic sets the connection parameter that configures the local 
device and informs the remote peer how many concurrent incoming 
RDMA_READ requests are supported. This logic didn't really do what was 
intended for two reasons:

- The max number supported by the device is typically smaller than
any one factor in the calculation used, and

- The field in the connection parameter structure where the value is
stored is a u8 and always overflows for the default settings.

So what really happens is the value requested for responder resources
is the left over 8 bits from the "desired value". If the desired value
happened to be a multiple of 256, the result was zero and it wouldn't
connect at all.

Given the above and the fact that max_requests is almost always larger
than the max responder resources supported by the adapter, this patch
simplifies this logic and simply requests the max supported by the device.

This bug was found by Jim Schutt at Sandia.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

---
  net/sunrpc/xprtrdma/verbs.c |   44 
++++++------------------------------------
  1 files changed, 7 insertions(+), 37 deletions(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index edf520c..a6d86f6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -670,30 +670,11 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct 
rpcrdma_ia *ia,
  	ep->rep_remote_cma.private_data_len = 0;

  	/* Client offers RDMA Read but does not initiate */
-	switch (ia->ri_memreg_strategy) {
-	case RPCRDMA_BOUNCEBUFFERS:
-		ep->rep_remote_cma.responder_resources = 0;
-		break;
-	case RPCRDMA_MTHCAFMR:
-	case RPCRDMA_REGISTER:
-	case RPCRDMA_FASTREG:
-		ep->rep_remote_cma.responder_resources = cdata->max_requests *
-				(RPCRDMA_MAX_DATA_SEGS / 8);
-		break;
-	case RPCRDMA_MEMWINDOWS:
-	case RPCRDMA_MEMWINDOWS_ASYNC:
-#if RPCRDMA_PERSISTENT_REGISTRATION
-	case RPCRDMA_ALLPHYSICAL:
-#endif
-		ep->rep_remote_cma.responder_resources = cdata->max_requests *
-				(RPCRDMA_MAX_DATA_SEGS / 2);
-		break;
-	default:
-		break;
-	}
-	if (ep->rep_remote_cma.responder_resources > devattr.max_qp_rd_atom)
-		ep->rep_remote_cma.responder_resources = devattr.max_qp_rd_atom;
  	ep->rep_remote_cma.initiator_depth = 0;
+	if (ia->ri_memreg_strategy != RPCRDMA_BOUNCEBUFFERS)
+		ep->rep_remote_cma.responder_resources = devattr.max_qp_rd_atom;
+	else
+		ep->rep_remote_cma.responder_resources = 0;

  	ep->rep_remote_cma.retry_count = 7;
  	ep->rep_remote_cma.flow_control = 0;
@@ -822,15 +803,6 @@ if 
(strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
  		rc = ib_modify_qp(ia->ri_id->qp, &attr, IB_QP_PATH_MTU);
  	}
  }
-
-	/* Theoretically a client initiator_depth > 0 is not needed,
-	 * but many peers fail to complete the connection unless they
-	 * == responder_resources! */
-	if (ep->rep_remote_cma.initiator_depth !=
-				ep->rep_remote_cma.responder_resources)
-		ep->rep_remote_cma.initiator_depth =
-			ep->rep_remote_cma.responder_resources;
-
  	ep->rep_connected = 0;

  	rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma);
@@ -859,12 +831,10 @@ if 
(strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
  	if (ep->rep_connected <= 0) {
  		/* Sometimes, the only way to reliably connect to remote
  		 * CMs is to use same nonzero values for ORD and IRD. */
+		if (!ep->rep_remote_cma.responder_resources)
+			ep->rep_remote_cma.responder_resources = 1;
  		ep->rep_remote_cma.initiator_depth =
-					ep->rep_remote_cma.responder_resources;
-		if (ep->rep_remote_cma.initiator_depth == 0)
-			++ep->rep_remote_cma.initiator_depth;
-		if (ep->rep_remote_cma.responder_resources == 0)
-			++ep->rep_remote_cma.responder_resources;
+			ep->rep_remote_cma.responder_resources;
  		if (retry_count++ == 0)
  			goto retry;
  		rc = ep->rep_connected;
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

                 reply	other threads:[~2008-09-15 16:00 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48CE86B5.8000101@opengridcomputing.com \
    --to=tom@opengridcomputing.com \
    --cc=Thomas.Talpey@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox