From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jeff Layton <jlayton@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>,
Eryu Guan <guaneryu@gmail.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: upstream server crash
Date: Mon, 24 Oct 2016 16:40:22 -0400 [thread overview]
Message-ID: <20161024204022.GB27359@fieldses.org> (raw)
In-Reply-To: <1477336654.21854.9.camel@redhat.com>
On Mon, Oct 24, 2016 at 03:17:34PM -0400, Jeff Layton wrote:
> On Mon, 2016-10-24 at 14:08 -0400, J. Bruce Fields wrote:
> > On Mon, Oct 24, 2016 at 11:24:40AM -0400, Jeff Layton wrote:
> > >
> > > On Mon, 2016-10-24 at 11:19 -0400, Jeff Layton wrote:
> > > >
> > > > On Mon, 2016-10-24 at 09:51 -0400, Chuck Lever wrote:
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Oct 24, 2016, at 9:31 AM, Jeff Layton <jlayton@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, 2016-10-24 at 11:15 +0800, Eryu Guan wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Oct 23, 2016 at 02:21:15PM -0400, J. Bruce Fields wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I'm getting an intermittent crash in the nfs server as of
> > > > > > > > 68778945e46f143ed7974b427a8065f69a4ce944 "SUNRPC: Separate buffer
> > > > > > > > pointers for RPC Call and Reply messages".
> > > > > > > >
> > > > > > > > I haven't tried to understand that commit or why it would be a problem yet, I
> > > > > > > > don't see an obvious connection--I can take a closer look Monday.
> > > > > > > >
> > > > > > > > Could even be that I just landed on this commit by chance, the problem is a
> > > > > > > > little hard to reproduce so I don't completely trust my testing.
> > > > > > >
> > > > > > > I've hit the same crash on 4.9-rc1 kernel, and it's reproduced for me
> > > > > > > reliably by running xfstests generic/013 case, on a loopback mounted
> > > > > > > NFSv4.1 (or NFSv4.2), XFS is the underlying exported fs. More details
> > > > > > > please see
> > > > > > >
> > > > > > > http://marc.info/?l=linux-nfs&m=147714320129362&w=2
> > > > > > >
> > > > > >
> > > > > > Looks like you landed at the same commit as Bruce, so that's probably
> > > > > > legit. That commit is very small though. The only real change that
> > > > > > doesn't affect the new field is this:
> > > > > >
> > > > > >
> > > > > > @@ -1766,7 +1766,7 @@ rpc_xdr_encode(struct rpc_task *task)
> > > > > > req->rq_buffer,
> > > > > > req->rq_callsize);
> > > > > > xdr_buf_init(&req->rq_rcv_buf,
> > > > > > - (char *)req->rq_buffer + req->rq_callsize,
> > > > > > + req->rq_rbuffer,
> > > > > > req->rq_rcvsize);
> > > > > >
> > > > > >
> > > > > > So I'm guessing this is breaking the callback channel somehow?
> > > > >
> > > > > Could be the TCP backchannel code is using rq_buffer in a different
> > > > > way than RDMA backchannel or the forward channel code.
> > > > >
> > > >
> > > > Well, it basically allocates a page per rpc_rqst and then maps that.
> > > >
> > > > One thing I notice is that this patch ensures that rq_rbuffer gets set
> > > > up in rpc_malloc and xprt_rdma_allocate, but it looks like
> > > > xprt_alloc_bc_req didn't get the same treatment.
> > > >
> > > > I suspect that that may be the problem...
> > > >
> > > In fact, maybe we just need this here? (untested and probably
> > > whitespace damaged):
> >
> > No change in results for me.
> >
> > --b.
> > >
> > >
> > > diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
> > > index ac701c28f44f..c561aa8ce05b 100644
> > > --- a/net/sunrpc/backchannel_rqst.c
> > > +++ b/net/sunrpc/backchannel_rqst.c
> > > @@ -100,6 +100,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
> > > goto out_free;
> > > }
> > > req->rq_rcv_buf.len = PAGE_SIZE;
> > > + req->rq_rbuffer = req->rq_rcv_buf.head[0].iov_base;
> > >
> > > /* Preallocate one XDR send buffer */
> > > if (xprt_alloc_xdr_buf(&req->rq_snd_buf, gfp_flags) < 0) {
>
> Ahh ok, I think I see.
>
> We probably also need to set rq_rbuffer in bc_malloc and and
> xprt_rdma_bc_allocate.
>
> My guess is that we're ending up in rpc_xdr_encode with a NULL
> rq_rbuffer pointer, so the right fix would seem to be to ensure that it
> is properly set whenever rq_buffer is set.
>
> So I think this may be what we want, actually. I'll plan to test it out
> but may not get to it before tomorrow.
It passes here.
--b.
>
> --
> Jeff Layton <jlayton@redhat.com>
> From ef2a391bc4d8f6b729aacee7cde8d9baf86767c3 Mon Sep 17 00:00:00 2001
> From: Jeff Layton <jlayton@redhat.com>
> Date: Mon, 24 Oct 2016 15:13:40 -0400
> Subject: [PATCH] sunrpc: fix some missing rq_rbuffer assignments
>
> I think we basically need to set rq_rbuffer whenever rq_buffer is set.
>
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
> net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 1 +
> net/sunrpc/xprtsock.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> index 2d8545c34095..fc4535ead7c2 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
> @@ -182,6 +182,7 @@ xprt_rdma_bc_allocate(struct rpc_task *task)
> return -ENOMEM;
>
> rqst->rq_buffer = page_address(page);
> + rqst->rq_rbuffer = (char *)rqst->rq_buffer + rqst->rq_callsize;
> return 0;
> }
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 0137af1c0916..e01c825bc683 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2563,6 +2563,7 @@ static int bc_malloc(struct rpc_task *task)
> buf->len = PAGE_SIZE;
>
> rqst->rq_buffer = buf->data;
> + rqst->rq_rbuffer = (char *)rqst->rq_buffer + rqst->rq_callsize;
> return 0;
> }
>
> --
> 2.7.4
>
next prev parent reply other threads:[~2016-10-24 20:40 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-23 18:21 upstream server crash J. Bruce Fields
2016-10-23 20:04 ` Chuck Lever
2016-10-23 20:14 ` J. Bruce Fields
2016-10-24 3:15 ` Eryu Guan
2016-10-24 13:31 ` Jeff Layton
2016-10-24 13:51 ` Chuck Lever
2016-10-24 15:19 ` Jeff Layton
2016-10-24 15:24 ` Jeff Layton
2016-10-24 15:55 ` Chuck Lever
2016-10-24 18:08 ` J. Bruce Fields
2016-10-24 19:17 ` Jeff Layton
2016-10-24 20:40 ` J. Bruce Fields [this message]
2016-10-24 21:38 ` Chuck Lever
2016-10-25 0:57 ` Jeff Layton
2016-10-25 1:00 ` Chuck Lever
2016-10-25 1:46 ` Jeff Layton
2016-10-25 2:02 ` Chuck Lever
2016-10-28 1:20 ` Chuck Lever
2016-10-28 20:50 ` J. Bruce Fields
2016-10-28 21:45 ` Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161024204022.GB27359@fieldses.org \
--to=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=guaneryu@gmail.com \
--cc=jlayton@redhat.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).