From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: linux-nfs-owner@vger.kernel.org
Received: from smtp.opengridcomputing.com ([72.48.136.20]:35711 "EHLO
	smtp.opengridcomputing.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752841AbaCLOWF (ORCPT ); Wed, 12 Mar 2014 10:22:05 -0400
Message-ID: <53206D8B.9060406@opengridcomputing.com>
Date: Wed, 12 Mar 2014 09:22:03 -0500
From: Tom Tucker
MIME-Version: 1.0
To: Trond Myklebust, Layton Jeff
CC: Steve Wise, Dr Fields James Bruce, Yan Burman,
	linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org, Or Gerlitz
Subject: Re: NFS over RDMA crashing
References: <51127B3F.2090200@mellanox.com>
	<20130206222435.GL16417@fieldses.org>
	<20130207164134.GK3222@fieldses.org>
	<003601cf3a26$94523ee0$bcf6bca0$@opengridcomputing.com>
	<005d01cf3a45$94ced0d0$be6c7270$@opengridcomputing.com>
	<531B47B3.1070503@opengridcomputing.com>
	<531B6D90.2090208@opengridcomputing.com>
	<531B79F8.2020008@opengridcomputing.com>
	<20140312093300.7a434cbb@tlielax.poochiereds.net>
	<731A7629-7DBB-4FC3-8F21-70380705ED4E@primarydata.com>
In-Reply-To: <731A7629-7DBB-4FC3-8F21-70380705ED4E@primarydata.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

Hi Trond,

I think this patch is still 'off-by-one'. We'll take a look at this today.

Thanks,
Tom

On 3/12/14 9:05 AM, Trond Myklebust wrote:
> On Mar 12, 2014, at 9:33, Jeff Layton wrote:
>
>> On Sat, 08 Mar 2014 14:13:44 -0600
>> Steve Wise wrote:
>>
>>> On 3/8/2014 1:20 PM, Steve Wise wrote:
>>>>> I removed your change and started debugging the original crash that
>>>>> happens on top-o-tree. Seems like rq_next_page is screwed up. It
>>>>> should always be >= rq_respages, yes? I added a BUG_ON() to assert
>>>>> this in rdma_read_xdr() and we hit the BUG_ON().
>>>>> Look
>>>>>
>>>>> crash> svc_rqst.rq_next_page 0xffff8800b84e6000
>>>>> rq_next_page = 0xffff8800b84e6228
>>>>> crash> svc_rqst.rq_respages 0xffff8800b84e6000
>>>>> rq_respages = 0xffff8800b84e62a8
>>>>>
>>>>> Any ideas Bruce/Tom?
>>>>>
>>>> Guys, the patch below seems to fix the problem. Dunno if it is
>>>> correct though. What do you think?
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> index 0ce7552..6d62411 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>>> @@ -90,6 +90,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>>>> sge_no++;
>>>> }
>>>> rqstp->rq_respages = &rqstp->rq_pages[sge_no];
>>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>>
>>>> /* We should never run out of SGE because the limit is defined to
>>>> * support the max allowed RPC data length
>>>> @@ -276,6 +277,7 @@ static int fast_reg_read_chunks(struct
>>>> svcxprt_rdma *xprt,
>>>>
>>>> /* rq_respages points one past arg pages */
>>>> rqstp->rq_respages = &rqstp->rq_arg.pages[page_no];
>>>> + rqstp->rq_next_page = rqstp->rq_respages;
>>>>
>>>> /* Create the reply and chunk maps */
>>>> offset = 0;
>>>>
>>>>
>>> While this patch avoids the crashing, it apparently isn't correct...I'm
>>> getting IO errors reading files over the mount. :)
>>>
>> I hit the same oops and tested your patch and it seems to have fixed
>> that particular panic, but I still see a bunch of other mem corruption
>> oopses even with it. I'll look more closely at that when I get some
>> time.
>>
>> FWIW, I can easily reproduce that by simply doing something like:
>>
>> $ dd if=/dev/urandom of=/file/on/nfsordma/mount bs=4k count=1
>>
>> I'm not sure why you're not seeing any panics with your patch in place.
>> Perhaps it's due to hw differences between our test rigs.
>>
>> The EIO problem that you're seeing is likely the same client bug that
>> Chuck recently fixed in this patch:
>>
>> [PATCH 2/8] SUNRPC: Fix large reads on NFS/RDMA
>>
>> AIUI, Trond is merging that set for 3.15, so I'd make sure your client
>> has those patches when testing.
>>
> Nothing is in my queue yet.
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
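
For reference, the crash output quoted in this thread can be checked with a little arithmetic: rq_respages (0xffff8800b84e62a8) minus rq_next_page (0xffff8800b84e6228) is 0x80 bytes, which is 16 pointer-sized slots if pointers are 8 bytes, so rq_next_page sits 16 entries below rq_respages. Below is a minimal userspace C sketch of that check. The struct rqst_model type and the helper names are hypothetical stand-ins and not kernel code; only the two field names and the addresses come from the thread, and the 8-byte pointer size is an assumption about the test machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two svc_rqst fields discussed in the thread.
 * The real struct svc_rqst has many more members; addresses are stored
 * as plain integers so the values from the crash session can be used. */
struct rqst_model {
	uint64_t rq_respages;   /* address of the first response-page slot */
	uint64_t rq_next_page;  /* address of the next page slot to use */
};

/* Invariant proposed in the thread: rq_next_page >= rq_respages. */
static bool next_page_ok(const struct rqst_model *r)
{
	return r->rq_next_page >= r->rq_respages;
}

/* Number of pointer-sized slots by which rq_next_page trails rq_respages.
 * Returns 0 when the invariant holds. ptr_size is the size of one slot
 * in bytes (assumed to be 8 on a 64-bit kernel). */
static size_t slots_behind(const struct rqst_model *r, size_t ptr_size)
{
	assert(ptr_size != 0);
	if (next_page_ok(r))
		return 0;
	return (size_t)((r->rq_respages - r->rq_next_page) / ptr_size);
}

/* Models the assignment added by the quoted patch:
 * rqstp->rq_next_page = rqstp->rq_respages; */
static void reset_next_page(struct rqst_model *r)
{
	r->rq_next_page = r->rq_respages;
}
```

With the two addresses from the crash session, next_page_ok() is false and slots_behind(&r, 8) gives 16; after reset_next_page(), which mirrors the assignment the quoted patch adds, the check passes. As the replies in the thread note, that assignment alone did not turn out to be a complete fix.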