linux-nfs.vger.kernel.org archive mirror
From: "Steve Wise" <swise@opengridcomputing.com>
To: "'J. Bruce Fields'" <bfields@fieldses.org>,
	"'Tom Tucker'" <tom@opengridcomputing.com>
Cc: <trond.myklebust@primarydata.com>, <linux-nfs@vger.kernel.org>,
	"'Indranil Choudhury'" <indranil@chelsio.com>
Subject: RE: [PATCH] Fix regression in NFSRDMA server
Date: Fri, 28 Mar 2014 16:31:25 -0500
Message-ID: <009b01cf4acd$12832750$378975f0$@opengridcomputing.com>
In-Reply-To: <20140328212633.GF6041@fieldses.org>

+Indranil

Indranil Choudhury is the QA contact.  

Steve
> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Friday, March 28, 2014 4:27 PM
> To: Tom Tucker
> Cc: Steve Wise; trond.myklebust@primarydata.com; linux-nfs@vger.kernel.org
> Subject: Re: [PATCH] Fix regression in NFSRDMA server
> 
> On Fri, Mar 28, 2014 at 10:21:27AM -0500, Tom Tucker wrote:
> > Hi Bruce,
> >
> > On 3/27/14 9:08 PM, J. Bruce Fields wrote:
> > >On Tue, Mar 25, 2014 at 03:14:57PM -0500, Steve Wise wrote:
> > >>From: Tom Tucker <tom@ogc.us>
> > >>
> > >>The server regression was caused by the addition of rq_next_page
> > >>(afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that
> > >>were missed with the update of the rq_respages array.
> > >Apologies.  (But, it could happen again--could we set up some regular
> > >testing?  It doesn't have to be anything fancy, just cthon over
> > >rdma--really, just read and write over rdma--would probably catch a
> > >lot.)
> >
> > I think Chelsio is going to be adding some NFSRDMA regression
> > testing to their system test.
> 
> OK.  Do you know who there is setting that up?  I'd be curious exactly
> what kernels they intend to test and how they plan to report results.
> 
> > >Also: I don't get why all these rq_next_page initializations are
> > >required.  Why isn't the initialization at the top of svc_process()
> > >enough?  Is rdma using it before we get to that point?  The only use of
> > >it I see off hand is in the while loop that you're deleting.
> >
> > I didn't apply tremendous deductive powers here, I just added
> > updates to rq_next_page wherever the transport messed with
> > rq_respages. That said, NFS WRITE is likely the culprit since the
> > write is completed as a deferral and therefore the request doesn't
> > go through svc_process, so if rq_next_page is bogus, the cleanup
> > will free/re-use pages that are actually in use by the transport.
> 
> Ugh, OK, without tracing through the code I guess I can see how that
> would happen.  Remind me why it's using deferrals?
> 
> Applying the patch.
> 
> --b.
> 
> >
> > Tom
> > >--b.
> > >
> > >>Signed-off-by: Tom Tucker <tom@ogc.us>
> > >>Tested-by: Steve Wise <swise@ogc.us>
> > >>---
> > >>
> > >>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |   12 ++++--------
> > >>  net/sunrpc/xprtrdma/svc_rdma_sendto.c   |    1 +
> > >>  2 files changed, 5 insertions(+), 8 deletions(-)
> > >>
> > >>diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > >>index 0ce7552..8d904e4 100644
> > >>--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > >>+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > >>@@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
> > >>  		sge_no++;
> > >>  	}
> > >>  	rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> > >>+	rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >>  	/* We should never run out of SGE because the limit is defined to
> > >>  	 * support the max allowed RPC data length
> > >>@@ -169,6 +170,7 @@ static int map_read_chunks(struct svcxprt_rdma *xprt,
> > >>  		 */
> > >>  		head->arg.pages[page_no] = rqstp->rq_arg.pages[page_no];
> > >>  		rqstp->rq_respages = &rqstp->rq_arg.pages[page_no+1];
> > >>+		rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >>  		byte_count -= sge_bytes;
> > >>  		ch_bytes -= sge_bytes;
> > >>@@ -276,6 +278,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,
> > >>  	/* rq_respages points one past arg pages */
> > >>  	rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> > >>+	rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >>  	/* Create the reply and chunk maps */
> > >>  	offset = 0;
> > >>@@ -520,13 +523,6 @@ next_sge:
> > >>  	for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
> > >>  		rqstp->rq_pages[ch_no] = NULL;
> > >>-	/*
> > >>-	 * Detach res pages. If svc_release sees any it will attempt to
> > >>-	 * put them.
> > >>-	 */
> > >>-	while (rqstp->rq_next_page != rqstp->rq_respages)
> > >>-		*(--rqstp->rq_next_page) = NULL;
> > >>-
> > >>  	return err;
> > >>  }
> > >>@@ -550,7 +546,7 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
> > >>  	/* rq_respages starts after the last arg page */
> > >>  	rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> > >>-	rqstp->rq_next_page = &rqstp->rq_arg.pages[page_no];
> > >>+	rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >>  	/* Rebuild rq_arg head and tail. */
> > >>  	rqstp->rq_arg.head[0] = head->arg.head[0];
> > >>diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > >>index c1d124d..11e90f8 100644
> > >>--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > >>+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > >>@@ -625,6 +625,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
> > >>  		if (page_no+1 >= sge_no)
> > >>  			ctxt->sge[page_no+1].length = 0;
> > >>  	}
> > >>+	rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >>  	BUG_ON(sge_no > rdma->sc_max_sge);
> > >>  	memset(&send_wr, 0, sizeof send_wr);
> > >>  	ctxt->wr_op = IB_WR_SEND;
> > >>
> >
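
An illustrative aside on the invariant the patch establishes: wherever the
transport moves rq_respages, it also resets rq_next_page to rq_respages + 1,
so a release loop of the shape removed from svc_rdma_recvfrom.c (walk
rq_next_page back down to rq_respages) only ever touches response slots.
The sketch below is a toy userspace model under that assumption, not kernel
code. struct toy_rqst, toy_set_respages() and toy_release_res_pages() are
hypothetical names, and plain int pointers stand in for struct page pointers.

```c
#include <stddef.h>

#define TOY_NPAGES 8

/* Toy stand-in for the page bookkeeping in struct svc_rqst (assumption). */
struct toy_rqst {
	int *pages[TOY_NPAGES];	/* models rq_pages */
	int **respages;		/* models rq_respages */
	int **next_page;	/* models rq_next_page */
};

/*
 * Move the start of the response pages and re-establish next_page in the
 * same step, mirroring the "rq_next_page = rq_respages + 1" lines the
 * patch adds after each rq_respages update.
 */
static void toy_set_respages(struct toy_rqst *rq, size_t first_res)
{
	rq->respages = &rq->pages[first_res];
	rq->next_page = rq->respages + 1;
}

/*
 * Walk next_page back down to respages, clearing each slot on the way.
 * Returns the number of non-NULL slots that were cleared.
 */
static int toy_release_res_pages(struct toy_rqst *rq)
{
	int released = 0;

	while (rq->next_page != rq->respages) {
		int **pp = --rq->next_page;

		if (*pp) {
			*pp = NULL;
			released++;
		}
	}
	return released;
}
```

With the response pages starting at index 3, a release pass clears only
slot 3 and leaves the argument slots below it alone. If next_page were left
pointing somewhere unrelated to respages, the same loop would walk over
slots it does not own, which is the failure mode Tom describes above.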



Thread overview: 8+ messages
2014-03-25 20:14 [PATCH] Fix regression in NFSRDMA server Steve Wise
2014-03-28  2:08 ` J. Bruce Fields
2014-03-28 13:38   ` Steve Wise
2014-03-28 15:21   ` Tom Tucker
2014-03-28 21:26     ` J. Bruce Fields
2014-03-28 21:31       ` Steve Wise [this message]
2014-03-29  0:11       ` Tom Tucker
2014-03-29  0:51         ` J. Bruce Fields
