From: "Steve Wise" <swise@opengridcomputing.com>
To: "'J. Bruce Fields'" <bfields@fieldses.org>,
"'Tom Tucker'" <tom@opengridcomputing.com>
Cc: <trond.myklebust@primarydata.com>, <linux-nfs@vger.kernel.org>,
"'Indranil Choudhury'" <indranil@chelsio.com>
Subject: RE: [PATCH] Fix regression in NFSRDMA server
Date: Fri, 28 Mar 2014 16:31:25 -0500 [thread overview]
Message-ID: <009b01cf4acd$12832750$378975f0$@opengridcomputing.com> (raw)
In-Reply-To: <20140328212633.GF6041@fieldses.org>
+Indranil
Indranil Choudhury is the QA contact.
Steve
> -----Original Message-----
> From: J. Bruce Fields [mailto:bfields@fieldses.org]
> Sent: Friday, March 28, 2014 4:27 PM
> To: Tom Tucker
> Cc: Steve Wise; trond.myklebust@primarydata.com; linux-nfs@vger.kernel.org
> Subject: Re: [PATCH] Fix regression in NFSRDMA server
>
> On Fri, Mar 28, 2014 at 10:21:27AM -0500, Tom Tucker wrote:
> > Hi Bruce,
> >
> > On 3/27/14 9:08 PM, J. Bruce Fields wrote:
> > >On Tue, Mar 25, 2014 at 03:14:57PM -0500, Steve Wise wrote:
> > >>From: Tom Tucker <tom@ogc.us>
> > >>
> > >>The server regression was caused by the addition of rq_next_page
> > >>(afc59400d6c65bad66d4ad0b2daf879cbff8e23e). There were a few places that
> > >>were missed with the update of the rq_respages array.
> > >Apologies. (But, it could happen again--could we set up some regular
> > >testing? It doesn't have to be anything fancy, just cthon over
> > >rdma--really, just read and write over rdma--would probably catch a
> > >lot.)
> >
> > I think Chelsio is going to be adding some NFSRDMA regression
> > testing to their system test.
>
> OK. Do you know who there is setting that up? I'd be curious exactly
> what kernels they intend to test and how they plan to report results.
>
> > >Also: I don't get why all these rq_next_page initializations are
> > >required. Why isn't the initialization at the top of svc_process()
> > >enough? Is rdma using it before we get to that point? The only use of
> > >it I see off hand is in the while loop that you're deleting.
> >
> > I didn't apply tremendous deductive powers here, I just added
> > updates to rq_next_page wherever the transport messed with
> > rq_respages. That said, NFS WRITE is likely the culprit since the
> > write is completed as a deferral and therefore the request doesn't
> > go through svc_process, so if rq_next_page is bogus, the cleanup
> > will free/re-use pages that are actually in use by the transport.
>
> Ugh, OK, without tracing through the code I guess I can see how that
> would happen. Remind me why it's using deferrals?
>
> Applying the patch.
>
> --b.
>
> >
> > Tom
> > >--b.
> > >
> > >>Signed-off-by: Tom Tucker <tom@ogc.us>
> > >>Tested-by: Steve Wise <swise@ogc.us>
> > >>---
> > >>
> > >> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 12 ++++--------
> > >> net/sunrpc/xprtrdma/svc_rdma_sendto.c | 1 +
> > >> 2 files changed, 5 insertions(+), 8 deletions(-)
> > >>
> > >>diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > >>index 0ce7552..8d904e4 100644
> > >>--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > >>+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> > >>@@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
> > >> sge_no++;
> > >> }
> > >> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
> > >>+ rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >> /* We should never run out of SGE because the limit is defined to
> > >> * support the max allowed RPC data length
> > >>@@ -169,6 +170,7 @@ static int map_read_chunks(struct svcxprt_rdma *xprt,
> > >> */
> > >> head->arg.pages[page_no] = rqstp->rq_arg.pages[page_no];
> > >> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no+1];
> > >>+ rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >> byte_count -= sge_bytes;
> > >> ch_bytes -= sge_bytes;
> > >>@@ -276,6 +278,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,
> > >> /* rq_respages points one past arg pages */
> > >> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> > >>+ rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >> /* Create the reply and chunk maps */
> > >> offset = 0;
> > >>@@ -520,13 +523,6 @@ next_sge:
> > >> for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
> > >> rqstp->rq_pages[ch_no] = NULL;
> > >>- /*
> > >>- * Detach res pages. If svc_release sees any it will attempt to
> > >>- * put them.
> > >>- */
> > >>- while (rqstp->rq_next_page != rqstp->rq_respages)
> > >>- *(--rqstp->rq_next_page) = NULL;
> > >>-
> > >> return err;
> > >> }
> > >>@@ -550,7 +546,7 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
> > >> /* rq_respages starts after the last arg page */
> > >> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
> > >>- rqstp->rq_next_page = &rqstp->rq_arg.pages[page_no];
> > >>+ rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >> /* Rebuild rq_arg head and tail. */
> > >> rqstp->rq_arg.head[0] = head->arg.head[0];
> > >>diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > >>index c1d124d..11e90f8 100644
> > >>--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > >>+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
> > >>@@ -625,6 +625,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
> > >> if (page_no+1 >= sge_no)
> > >> ctxt->sge[page_no+1].length = 0;
> > >> }
> > >>+ rqstp->rq_next_page = rqstp->rq_respages + 1;
> > >> BUG_ON(sge_no > rdma->sc_max_sge);
> > >> memset(&send_wr, 0, sizeof send_wr);
> > >> ctxt->wr_op = IB_WR_SEND;
> > >>
> > >--
> > >To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > >the body of a message to majordomo@vger.kernel.org
> > >More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
next prev parent reply other threads:[~2014-03-28 21:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-25 20:14 [PATCH] Fix regression in NFSRDMA server Steve Wise
2014-03-28 2:08 ` J. Bruce Fields
2014-03-28 13:38 ` Steve Wise
2014-03-28 15:21 ` Tom Tucker
2014-03-28 21:26 ` J. Bruce Fields
2014-03-28 21:31 ` Steve Wise [this message]
2014-03-29 0:11 ` Tom Tucker
2014-03-29 0:51 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='009b01cf4acd$12832750$378975f0$@opengridcomputing.com' \
--to=swise@opengridcomputing.com \
--cc=bfields@fieldses.org \
--cc=indranil@chelsio.com \
--cc=linux-nfs@vger.kernel.org \
--cc=tom@opengridcomputing.com \
--cc=trond.myklebust@primarydata.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).