Re: server does not abort grace period

From: "J. Bruce Fields" <bfields@fieldses.org>
To: Ferenc Wagner <wferi@niif.hu>
Cc: linux-nfs@vger.kernel.org
Subject: Re: server does not abort grace period
Date: Mon, 21 Feb 2011 20:11:44 -0500
Message-ID: <20110222011144.GA18424@fieldses.org>
In-Reply-To: <87mxlpw4cv.fsf@tac.ki.iif.hu>

On Mon, Feb 21, 2011 at 08:54:24PM +0100, Ferenc Wagner wrote:
> Ferenc Wagner <wferi@niif.hu> writes:
> 
> > We're running 2.6.32 (Debian squeeze) NFS4 server and clients.  The
> > server boots and runs purely from SAN, so we can start it on different
> > computers.  In case of such "hardware failovers" I'd expect the clients
> > to quickly reclaim their locks (if any) and thus the server to abort
> > its 90-second grace period early.  However, this does not happen,
> > ruining our HA like, totally.
> >
> > So, the questions: is the functionality of aborting the grace period
> > early missing from version 2.6.32 of the Linux kernel?  If yes, is it
> > present in any kernel version?  If it should work, could someone offer
> > some advice on debugging it?  If it isn't supported, what's the
> > best practice of providing highly available NFSv4 today?
> 
> Hi,
> 
> Could somebody please share any related wisdom?  Pretty please?
> In short, how to fight grace period in a HA NFS4 setup?
> Decreasing it (of course after cutting the lock lease time) seems a
> rather big hammer, I'd like to avoid using it if reasonably possible.

The NFSv4.0 protocol doesn't provide any way for clients to tell the
server that they have finished recovering; as long as *any* clients held
state on the previous server instance, the new server is stuck waiting
out the whole grace period.  Some things we could do:

	- We could at least recognize the case where *no* clients held
	  state before, and end the grace period early in that case.
	- In the NFSv4.1 case there is a "reclaim complete" rpc that
	  clients are required to send.  Currently we don't take
	  advantage of that to end the grace period early, but we
	  should.  That's no help for 4.0 clients.
	- We could record a count of all locks/opens held in stable
	  storage and use that to decide when a client is done
	  recovering.  That would be complicated and risk slowing down
	  normal opens and locks a lot.
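
To make the first two ideas concrete, here is a toy model of the
bookkeeping involved.  It is only a sketch, not kernel code: the class
GraceTracker, its method names, and the 90-second default are all
hypothetical, and it assumes the server already knows which clients
held state before the restart.

```python
import time


class GraceTracker:
    """Toy model of a server deciding when its grace period may end.

    Hypothetical illustration only; none of these names exist in nfsd.
    """

    def __init__(self, previous_clients, grace_seconds=90, clock=time.monotonic):
        # Clients recorded (e.g. in stable storage) as holding state
        # on the previous server instance.
        self.pending = set(previous_clients)
        self.clock = clock
        self.deadline = clock() + grace_seconds

    def reclaim_complete(self, client_id):
        # An NFSv4.1 client reports that it has finished reclaiming.
        # NFSv4.0 clients have no such call, so they stay in `pending`.
        self.pending.discard(client_id)

    def in_grace(self):
        # No client left to wait for: the grace period can end early.
        if not self.pending:
            return False
        # Otherwise wait out the full grace period.
        return self.clock() < self.deadline
```

A single 4.0 client that held state is enough to keep in_grace() true
until the deadline, which is the case described above.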

In short, it's hard.

I don't think decreasing the lease time would be so terrible.  Perhaps
the default should even be a little less.
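
For reference, a minimal sketch of lowering the lease time from a
script.  It assumes the lease time is exposed at
/proc/fs/nfsd/nfsv4leasetime and that the value has to be written
before the nfsd threads are started; check both against your kernel.
The helper name set_lease_time is made up for this example.

```python
def set_lease_time(seconds, path="/proc/fs/nfsd/nfsv4leasetime"):
    """Write a new NFSv4 lease time, in seconds, to the nfsd control file.

    The default path is an assumption about where the running kernel
    exposes this setting; pass a different path if yours differs.
    """
    if seconds <= 0:
        raise ValueError("lease time must be a positive number of seconds")
    with open(path, "w") as f:
        f.write(f"{int(seconds)}\n")
```

Calling set_lease_time(30) as root would request a 30-second lease.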

--b.

