Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)

From: Larry Keegan <lk@pfw.demon.co.uk>
To: Jeff Layton <jlayton@redhat.com>
Cc: <linux-nfs@vger.kernel.org>
Subject: Re: nfs client: Now you see it, now you don't (aka spurious ESTALE errors)
Date: Fri, 26 Jul 2013 16:10:46 +0000
Message-ID: <20130726161046.00c19730@cs3.al.itld>
In-Reply-To: <20130726091225.5f299ff6@corrin.poochiereds.net>

On Fri, 26 Jul 2013 09:12:25 -0400
Jeff Layton <jlayton@redhat.com> wrote:
> On Fri, 26 Jul 2013 12:41:01 +0000
> Larry Keegan <lk@pfw.demon.co.uk> wrote:
> 
> > On Thu, 25 Jul 2013 14:18:28 -0400
> > Jeff Layton <jlayton@redhat.com> wrote:
> > > On Thu, 25 Jul 2013 17:05:26 +0000
> > > Larry Keegan <lk@pfw.demon.co.uk> wrote:
> > > 
> > > > On Thu, 25 Jul 2013 10:11:43 -0400
> > > > Jeff Layton <jlayton@redhat.com> wrote:
> > > > > On Thu, 25 Jul 2013 13:45:15 +0000
> > > > > Larry Keegan <lk@pfw.demon.co.uk> wrote:
> > > > > 
> > > > > > Dear Chaps,
> > > > > > 
> > > > > > I am experiencing some inexplicable NFS behaviour which I
> > > > > > would like to run past you.
> > > > > What might be helpful is to do some network captures when the
> > > > > problem occurs. What we want to know is whether the ESTALE
> > > > > errors are coming from the server, or if the client is
> > > > > generating them. That'll narrow down where we need to look
> > > > > for problems.
> > > Ok, we had a number of changes to how ESTALE errors are handled
> > > over the last few releases. When you mentioned 3.10, I had
> > > assumed that you might be hitting a regression in one of those,
> > > but those went in well after the 3.4 series.
> > > 
> > > Captures are probably your best bet. My suspicion is that the
> > > server is returning these ESTALE errors occasionally, but it
> > > would be best to have you confirm that. They may also help make
> > > sense of why it's occurring...
> > I now have a good and a bad packet capture. I can run them through
> > tshark -V but if I do this, they're really long, so I'm wondering
> > how best to post them. I've posted the summaries below.
> > 
> > The first thing that strikes me is the bad trace is much longer.
> > This strikes me as reasonable because as well as the ESTALE problem
> > I've noticed that the whole system seems sluggish. claws-mail is
> > particularly so because it keeps saving my typing into a drafts
> > mailbox, and because claws doesn't really understand traditional
> > mboxes, it spends an inordinate amount of time locking and unlocking
> > the boxes for each message in them. Claws also spews tracebacks
> > frequently and it crashes from time to time, something it never did
> > before the ESTALE problem occurred.
> I'm afraid I can't tell much from the above output. I don't see any
> ESTALE errors there, but you can get similar issues if (for instance)
> certain attributes of a file change.

Such as might occur due to mail delivery?

> You mentioned that this is a DRBD
> cluster, are you "floating" IP addresses between cluster nodes here?
> If so, do your problems occur around the times that that's happening?
> 
> Also, what sort of filesystem is being exported here?
> 
The way my NFS servers are configured is as follows:

I have two identical boxes. They run lvm. There are two lvs on each
box called outer-nfs0 and outer-nfs1. These are kept in sync with drbd.
The content of these volumes is encrypted with dmcrypt. The plaintext
of each volume is a pv. I have two inner volume groups, named nfs0 and
nfs1. These each contain one of those pvs. They are sliced into a dozen
or so lvs. The lvs each contain ext4 filesystems. Each filesystem
contains one or more home directories. Although each filesystem is
exported in its entirety, autofs only mounts subdirectories (for
example /home/larry on fs-nfs0:/export/nfs0/home00/larry). Exports are
arranged by editing the exports file and running 'exportfs -r' so
userspace is always in sync with the kernel.

Each nfs volume group is associated with its own IP address which is
switched along with the volume group. So, when one of my boxes can see
volume group nfs0 it will mount the volumes inside it and export all the
filesystems on that volume group via the group's IP address. Thus, one
fileserver can export nothing, a dozen filesystems or two dozen
filesystems. The automounter map only ever refers to the switchable IP
addresses.

This arrangement keeps the complexity of the dmcrypt stuff low and is
moderately nippy. As for the switchover, I've merely arranged pacemaker
to 'ip addr del' and 'ip addr add' the switchable IP addresses, blast
out a few ARPs and Bob's your uncle. Occasionally I get a machine
which hangs for a couple of minutes, but mostly it's just a few
seconds. Until recently I haven't seen ESTALE errors.
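
A minimal sketch of the switchover steps just described, written in
Python purely for illustration. The address 192.0.2.10/24, the interface
eth0 and the use of iputils arping for the gratuitous ARPs are all
assumptions; none of them is taken from the actual pacemaker
configuration. By default the commands are only printed, not run.

```python
import subprocess


def switchover_commands(address, prefix_len, interface, release=False):
    """Return the command lines for releasing or taking over a floating IP.

    All arguments are hypothetical placeholders for illustration.
    """
    cidr = f"{address}/{prefix_len}"
    if release:
        # Node giving up the volume group: drop the switchable address.
        return [["ip", "addr", "del", cidr, "dev", interface]]
    return [
        # Node taking over the volume group: add the switchable address.
        ["ip", "addr", "add", cidr, "dev", interface],
        # Assumption: iputils arping sends the unsolicited ARPs so that
        # clients refresh their ARP caches.
        ["arping", "-U", "-c", "3", "-I", interface, address],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Dry run on the node taking over the (hypothetical) address.
run(switchover_commands("192.0.2.10", 24, "eth0"))
```

Running with dry_run=False needs root and changes live addresses, so
the dry run is the safer way to check the sequence first.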

The way I see it, as far as our discussion goes, it looks like I have a
single NFS server with three IP addresses, and the server happens to
copy its data to another server just in case. I haven't switched over
since I last upgraded.

Having said that, I can see where you're coming from. My particular
configuration is unnecessarily complicated for testing this problem.
I shall configure some other boxes more straightforwardly and hammer
them. Are there any good nfs stress-tests you can suggest?
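
In the absence of a proper stress-test, something as crude as the
following might do for hammering a test mount. It is only a sketch, not
an established NFS test tool: it repeatedly stats and reads every file
under a directory and tallies the errno of each failure, so ESTALE can
be counted separately. The function name hammer and the rounds
parameter are made up for illustration.

```python
import errno
import os
from collections import Counter


def hammer(directory, rounds=100):
    """Stat and read every regular file under `directory`, `rounds` times.

    Returns (operations attempted, Counter of errno values seen).
    """
    errors = Counter()
    ops = 0
    for _ in range(rounds):
        # os.walk silently skips directories it cannot list.
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                ops += 1
                try:
                    os.stat(path)
                    with open(path, "rb") as fh:
                        fh.read(4096)
                except OSError as exc:
                    errors[exc.errno] += 1
    return ops, errors


def report(ops, errors):
    """Format a one-line summary followed by a per-errno breakdown."""
    lines = [f"{ops} operations, {errors.get(errno.ESTALE, 0)} ESTALE"]
    for code, count in sorted(errors.items()):
        lines.append(f"  {errno.errorcode.get(code, code)}: {count}")
    return "\n".join(lines)
```

One would call, say, print(report(*hammer("/home/larry", rounds=1000)))
from several clients at once while mail is being delivered on the
server. A single reader like this will not reproduce every access
pattern (it does no locking, for one), so a clean run proves little.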

Yours,

Larry.
