linux-nfs.vger.kernel.org archive mirror
From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Simon Kirby <sim@hostway.ca>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS client/sunrpc getting stuck on 2.6.36
Date: Fri, 19 Nov 2010 16:24:48 -0500
Message-ID: <1290201888.3135.61.camel@heimdal.trondhjem.org>
In-Reply-To: <20101119202004.GA3270@hostway.ca>

On Fri, 2010-11-19 at 12:20 -0800, Simon Kirby wrote:
> On Thu, Nov 11, 2010 at 01:22:47PM +0800, Trond Myklebust wrote:
> 
> > On Wed, 2010-11-10 at 18:35 -0800, Simon Kirby wrote:
> > > Still seeing all sorts of boxes fall over with 2.6.35 and 2.6.36 NFS.
> > > Unfortunately, it doesn't happen all the time...only certain load
> > > patterns seem to start it off.  Once it starts, I can't find a way to
> > > make it recover without rebooting.
> > >...
> > > NFS: permission(0:4c/5284877), mask=0x1, res=0
> > > NFS: revalidating (0:4c/3247737045)
> > > 
> > > 900ms matches the probably-silly nfs mount settings we're currently using:
> > > 
> > > rw,hard,intr,tcp,timeo=9,retrans=3,rsize=8192,wsize=8192
> > > 
> > > Full kernel log here: http://0x.ca/sim/ref/2.6.36_stuck_nfs/
> > 
> > timeo=9 is a completely insane retransmit value for a tcp connection.
> > 
> > Please use the default timeo=600, and all will work correctly.
> 
> Ok, so, we were running with timeo=300 instead on a number of servers,
> and we were still seeing the problem on 2.6.36.  I've uploaded a new
> kernel log (lsh1051) here:
> 
> 	http://0x.ca/sim/ref/2.6.36_stuck_nfs/
> 
> The log starts out with the hung task warnings occurring after
> otherwise-normal operation.  Once I noticed, I set rpc/nfs_debug to 1,
> and then later set it to 255.
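
For reference, the NFS `timeo` mount option is expressed in tenths of a second, which is why timeo=9 corresponds to the 900ms seen above and the default timeo=600 corresponds to 60 seconds. A minimal sketch of that conversion follows; the helper names `parse_mount_options` and `timeo_seconds` are hypothetical and exist only for illustration.

```python
def parse_mount_options(opts: str) -> dict:
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "hard") map to True.
    """
    parsed = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def timeo_seconds(opts: str) -> float:
    """Return the timeo value of a mount option string in seconds.

    timeo is given in deciseconds (tenths of a second).
    """
    parsed = parse_mount_options(opts)
    return int(parsed["timeo"]) / 10


if __name__ == "__main__":
    # Option string quoted earlier in the thread.
    original = "rw,hard,intr,tcp,timeo=9,retrans=3,rsize=8192,wsize=8192"
    for label, opts in [("original", original),
                        ("retried", "timeo=300"),
                        ("default", "timeo=600")]:
        print(f"{label}: {timeo_seconds(opts)} s")
```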

Were the NFS servers hung at this point? If so, then that probably
suffices to explain the hung task warnings (which would be false
positives) as being due to the page cache waiting to lock pages on which
I/O is being performed.

> Since several servers were stuck at the same time and we were losing
> quorum, I decided to try something more drastic and booted into
> 2.6.37-rc2-git3.  This kernel hasn't got stuck yet!  However, it's
> spitting out some new errors which may be worth looking into:
> 
> [ 1574.088812] NFS: server 10.10.52.222 error: fileid changed
> [ 1574.088814] fsid 0:18: expected fileid 0x4c081940, got 0x4c081950
> [11340.409447] NFS: server 10.10.52.228 error: fileid changed
> [11340.409450] fsid 0:45: expected fileid 0x696ff82, got 0x16a98bd7
> [20832.579912] NFS: server 10.10.52.225 error: fileid changed
> [20832.579914] fsid 0:2a: expected fileid 0x8c67ebab, got 0x8c6811e5
> [32775.957351] NFS: server 10.10.52.230 error: fileid changed
> [32775.957354] fsid 0:52: expected fileid 0x919041fd, got 0x93f1962d
> 
> These are also in the same kernel log.  The error code isn't new, so
> something else seems to have changed to cause it.

These indicate server bugs: your failover event appears to have changed
the inode numbers of a number of files. This should not happen in a
normal NFS environment, so the client prints the above warnings.
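
To see which servers and filesystems are affected, the warning pairs quoted above can be extracted from a kernel log and compared. The following is a rough sketch only, assuming the two-line message format shown in the quoted log; the function name `parse_fileid_changes` and the regular expressions are illustrative and not part of any kernel tooling.

```python
import re

# Patterns assume the two-line format quoted in this message.
SERVER_RE = re.compile(r"NFS: server (\S+) error: fileid changed")
DETAIL_RE = re.compile(
    r"fsid (\S+): expected fileid (0x[0-9a-fA-F]+), got (0x[0-9a-fA-F]+)"
)

# Two of the warning pairs quoted above, used as sample input.
SAMPLE_LOG = """\
[ 1574.088812] NFS: server 10.10.52.222 error: fileid changed
[ 1574.088814] fsid 0:18: expected fileid 0x4c081940, got 0x4c081950
[11340.409447] NFS: server 10.10.52.228 error: fileid changed
[11340.409450] fsid 0:45: expected fileid 0x696ff82, got 0x16a98bd7
"""


def parse_fileid_changes(log_text: str) -> list:
    """Pair each 'fileid changed' line with the detail line that follows it."""
    events = []
    server = None
    for line in log_text.splitlines():
        match = SERVER_RE.search(line)
        if match:
            server = match.group(1)
            continue
        match = DETAIL_RE.search(line)
        if match and server is not None:
            events.append({
                "server": server,
                "fsid": match.group(1),
                "expected": int(match.group(2), 16),
                "got": int(match.group(3), 16),
            })
            server = None  # wait for the next server line
    return events


if __name__ == "__main__":
    for event in parse_fileid_changes(SAMPLE_LOG):
        print(event["server"], event["fsid"],
              hex(event["expected"]), "->", hex(event["got"]))
```

Grouping the results by server would show whether the changed fileids cluster on the servers involved in the failover.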

Trond

