public inbox for linux-rdma@vger.kernel.org
From: Tom Tucker <tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: Trond Myklebust
	<trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>,
	Layton Jeff <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Steve Wise
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	Dr Fields James Bruce
	<bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org>,
	Yan Burman <yanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: NFS over RDMA crashing
Date: Wed, 12 Mar 2014 09:22:03 -0500
Message-ID: <53206D8B.9060406@opengridcomputing.com>
In-Reply-To: <731A7629-7DBB-4FC3-8F21-70380705ED4E-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>

Hi Trond,

I think this patch is still 'off-by-one'. We'll take a look at this today.

Thanks,
Tom

On 3/12/14 9:05 AM, Trond Myklebust wrote:
> On Mar 12, 2014, at 9:33, Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
>> On Sat, 08 Mar 2014 14:13:44 -0600
>> Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>>
>>> On 3/8/2014 1:20 PM, Steve Wise wrote:
>>>>> I removed your change and started debugging the original crash that
>>>>> happens on top-o-tree.  It seems rq_next_page is screwed up.  It
>>>>> should always be >= rq_respages, yes?  I added a BUG_ON() to assert
>>>>> this in rdma_read_xdr(), and we hit the BUG_ON().  Look:
>>>>>
>>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>>>>> rq_next_page = 0xffff8800b84e6228
>>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>>>>> rq_respages = 0xffff8800b84e62a8
>>>>>
>>>>> Any ideas Bruce/Tom?
>>>>>
>>>> Guys, the patch below seems to fix the problem.  Dunno if it is
>>>> correct though.  What do you think?
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> index 0ce7552..6d62411 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>>>>                sge_no++;
>>>>        }
>>>>        rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>>>> +       rqstp->rq_next_page = rqstp->rq_respages;
>>>>
>>>>        /* We should never run out of SGE because the limit is defined to
>>>>         * support the max allowed RPC data length
>>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct svcxprt_rdma *xprt,
>>>>
>>>>        /* rq_respages points one past arg pages */
>>>>        rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>>>> +       rqstp->rq_next_page = rqstp->rq_respages;
>>>>
>>>>        /* Create the reply and chunk maps */
>>>>        offset = 0;
>>>>
>>>>
>>> While this patch avoids the crashing, it apparently isn't correct...I'm
>>> getting IO errors reading files over the mount. :)
>>>
>> I hit the same oops and tested your patch and it seems to have fixed
>> that particular panic, but I still see a bunch of other mem corruption
>> oopses even with it. I'll look more closely at that when I get some
>> time.
>>
>> FWIW, I can easily reproduce that by simply doing something like:
>>
>>    $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
>>
>> I'm not sure why you're not seeing any panics with your patch in place.
>> Perhaps it's due to hw differences between our test rigs.
>>
>> The EIO problem that you're seeing is likely the same client bug that
>> Chuck recently fixed in this patch:
>>
>>    [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
>>
>> AIUI, Trond is merging that set for 3.15, so I'd make sure your client
>> has those patches when testing.
>>
> Nothing is in my queue yet.
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

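For readers following the quoted debugging, here is a minimal userspace sketch of the invariant under discussion (rq_next_page should never fall behind rq_respages) and of what the two added assignments in the quoted diff do to it. Everything prefixed model_ is hypothetical and exists only for illustration: the struct is a cut-down stand-in for struct svc_rqst, MODEL_MAXPAGES is not the kernel's limit, and 8-byte pointers are assumed when converting the addresses from the crash output into array slots. It does not claim the quoted patch is the right fix; the thread itself reports IO errors and a suspected off-by-one with it.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAXPAGES 32	/* hypothetical size, not the kernel's value */

struct page;			/* opaque here, as for most kernel users */

/* Hypothetical cut-down stand-in for struct svc_rqst. */
struct model_rqst {
	struct page *rq_pages[MODEL_MAXPAGES];
	struct page **rq_respages;	/* first page available for the reply */
	struct page **rq_next_page;	/* next free reply page slot */
};

/* The condition asserted with BUG_ON() in the quoted report. */
static int model_invariant_ok(const struct model_rqst *r)
{
	return r->rq_next_page >= r->rq_respages;
}

/* Mirrors the pair of assignments in the quoted diff. */
static void model_set_respages(struct model_rqst *r, int sge_no)
{
	r->rq_respages = &r->rq_pages[sge_no];
	r->rq_next_page = r->rq_respages;
}

/*
 * Distance in array slots between two addresses taken from a crash
 * dump, assuming 8-byte pointers (x86_64).
 */
static uint64_t model_slot_gap(uint64_t respages, uint64_t next_page)
{
	return (respages - next_page) / 8;
}
```

With the two values from the crash output, model_slot_gap(0xffff8800b84e62a8, 0xffff8800b84e6228) gives 0x80 / 8 = 16, i.e. rq_next_page was 16 slots behind rq_respages when the BUG_ON() fired.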

