From: Jeff Layton <jlayton@redhat.com>
To: Wendy Cheng <s.wendy.cheng@gmail.com>
Cc: linux-nfs@vger.kernel.org, lhh@redhat.com, nfsv4@linux-nfs.org,
    nhorman@redhat.com
Subject: Re: rapid clustered nfs server failover and hung clients -- how best to close the sockets?
Date: Mon, 9 Jun 2008 13:24:25 -0400
Message-ID: <20080609132425.5144557b@tleilax.poochiereds.net>
In-Reply-To: <484D6510.2010109@gmail.com>
On Mon, 09 Jun 2008 13:14:56 -0400
Wendy Cheng <s.wendy.cheng@gmail.com> wrote:
> Jeff Layton wrote:
> > The problem we've run into is that occasionally they fail over to the
> > alternate machine and then back very rapidly.
>
> It is a well known issue in the NFS-TCP failover arena (or more
> specifically, for floating IP applications) that failover from server A
> to server B, then immediately failing back from server B to A would
> *not* work well. IIRC last round of discussing with Red Hat GPS and
> support folks, we concluded that most of the applications/users *can*
> tolerate this restriction.
>
> Maybe another more basic question: "other than QA efforts, are there
> real NFSv2/v3 applications depending on this "feature" ? Or there may
> need tons of efforts for something that will not have much usages when
> it is finally delivered ?
>
Certainly a valid question...
While rapid failover like this is unusual, it's easily possible for a
sysadmin to do it. Maybe they moved the wrong service, or their downtime
was for something very brief but the service had to be off of the host to
make the change. In that case, a quick failover and back could easily
be something that happens in a real environment.
As to whether it's worth a ton of effort, that's a tough call. People want
HA services to guard against outages. Anything that jeopardizes that is
probably worth fixing. This could be solved with documentation, but a note
like:
"Be sure to wait for X minutes between failovers"
...wouldn't inspire a lot of confidence. We'd have to have
some sort of mechanism to enforce this, and that would be less than
ideal.
IMO, the ideal thing would be to make sure that the "old" server is
ready to pick up the service again as soon as possible after the service
leaves it.
--
Jeff Layton <jlayton@redhat.com>