Re: Failover in NFS - Jesse Pollard

From: Jesse Pollard <pollard@admin.navo.hpc.mil>
To: Bill Davidsen <davidsen@tmr.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Failover in NFS
Date: Thu, 21 Nov 2002 16:52:52 -0600
Message-ID: <200211211652.53065.pollard@admin.navo.hpc.mil>
In-Reply-To: <Pine.LNX.3.96.1021121154055.10456C-100000@gatekeeper.tmr.com>

On Thursday 21 November 2002 02:58 pm, Bill Davidsen wrote:
> On Mon, 18 Nov 2002, Jesse Pollard wrote:
> > It would actually be better to use two floating IP numbers. That way
> > during normal operation, both servers would be functioning simultaneously
> > (based on the shared storage on two nodes).
> >
> > Then during failover, the floating IP of the failed node is activated on
> > the remaining node (total of 3 IP numbers now, one real, two floating).
> > The NFS recovery cycle should then cause the clients to remount the
> > filesystem from the backup server.
> >
> > When the failed node is recovered, the active server should then disable
> > the floating IP associated with the recovered server, causing only the
> > mounts using that IP number to fall back to the proper node, balancing
> > the load again.
>
> That works for stateless connections, but for stateful connections like
> POP, NNTP, SMTP, etc, you will lose all the connections currently
> active.

Yes, that is the point. NFS v3/v4 CAN use TCP connections, and the only way
I know to force them back to the recovered server IS to kill the connection.

> A proper solution is to have the recovered server accept ESTABLISHED and
> --syn packets, then DNAT the rest to the fallback server, while the
> fallback server takes any new (--syn) packets and does DNAT to the
> recovered server.

Ahhh, no. That doesn't work. The current connections have to be terminated,
since what you are describing sounds like a fallback to a fallback.

If you want something like this you have to perform load balancing at
a router (with NAT/DNAT), where the load balancing implementation is
independent of the host. Then each host in the cluster (since there may
be more than two) has to inform the router of its current load (say once
every 5, 10, or 15 seconds). If you are in a high availability configuration,
I would expect that there would need to be at least two load balancing
routers (a primary and a backup). Then if a router fails, the upstream network
router would select an alternate path which would end up at the backup load
balancer.
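
To make the reporting idea concrete, here is a rough Python sketch of the
bookkeeping such a router might do. It is only an illustration: the class
name, the host names, and the 15 second staleness cutoff are my assumptions,
not any existing load balancer's interface.

```python
import time


class LoadTable:
    """Router-side table of the most recent load report from each host."""

    def __init__(self, max_age=15.0):
        # Assumption: a host that has not reported within max_age seconds
        # is treated as down and gets no new connections.
        self.max_age = max_age
        self.reports = {}  # host -> (load, time the report arrived)

    def report(self, host, load, now=None):
        """Record a load report (hosts would call this every 5-15 seconds)."""
        now = time.monotonic() if now is None else now
        self.reports[host] = (load, now)

    def pick(self, now=None):
        """Return the least-loaded live host for a NEW connection, or None."""
        now = time.monotonic() if now is None else now
        live = [(load, host)
                for host, (load, seen) in self.reports.items()
                if now - seen <= self.max_age]
        return min(live)[1] if live else None
```

A host that stops reporting simply ages out of the table, so it stops
receiving new connections without any explicit failure message.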

TCP context would be preserved in that situation. Traffic loads could even
be balanced between the two routers. This works because the only "state"
information involved is source/destination routes for packets, not TCP.

If a host node fails (not a router), then NEW connections can be redirected.
Unfortunately, the context of existing connections to the failed host is lost.

> I'm not sure iptables can do this right, you probably need a program to
> get the DNAT part just correct. There may be one of the experimental
> patches which adds that capability, since people do load balancing with
> Linux. It might take source routing, and certainly will be harder than
> just turning off the alias ;-)

I don't think the host itself CAN do it, since you then get into the case
of a destination also being a router. It also means the load really doesn't
get balanced since the host must still carry the load of forwarding traffic
to the real server.

This can get really nasty in a cluster if it becomes necessary to reboot
various nodes. Suddenly the nodes start forwarding traffic around and
not doing the work.

If you are describing the use of a Linux router, however, you are back
in the second discussion with the load balancing router, and I think
(based on other discussion, not personal knowledge) iptables might
do it, with a little user land assist (the load balancing computations
could dynamically change the iptables entries for destinations).
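
As a hedged illustration of that user land assist, the Python sketch below
only BUILDS an iptables command line that points a DNAT rule at the least
loaded host; it does not run anything. The function names, the addresses,
and the assumption that the rule to replace already sits at position 1 of
the nat PREROUTING chain are all mine.

```python
def dnat_rule(vip, port, target, rule_num=1):
    """Build argv that replaces nat/PREROUTING rule `rule_num` with a
    DNAT of tcp traffic for vip:port to `target`.

    Assumption: a rule already exists at that position (-R replaces it).
    """
    return ["iptables", "-t", "nat", "-R", "PREROUTING", str(rule_num),
            "-p", "tcp", "-d", vip, "--dport", str(port),
            "-j", "DNAT", "--to-destination", target]


def retarget(loads, vip, port):
    """loads: host address -> reported load. Returns the argv that would
    send new connections for vip:port to the least-loaded host."""
    best = min(loads, key=loads.get)
    return dnat_rule(vip, port, best)


if __name__ == "__main__":
    # Hypothetical addresses; 2049 is the usual NFS port.
    print(" ".join(retarget({"10.0.0.11": 0.7, "10.0.0.12": 0.2},
                            "10.0.0.100", 2049)))
```

Whether rewriting rules this way behaves well under load is something I
have not tested; it is only meant to show the shape of the idea.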

This all started as a question about how to get NFS to switch servers
without a load balancing router. This, I think, is not that complicated
other than the tricky IP enable/disable between the two servers; and it
does ASSUME (yes I know - "ass out of you and me" :-) a stateless
communication protocol.
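
The IP enable/disable decision itself can be sketched in a few lines of
Python. This is only the bookkeeping (which node should hold which floating
IP), not the actual interface configuration; the node names and addresses
are made up for illustration.

```python
def assign_floating_ips(home, alive):
    """Decide which node should currently hold each floating IP.

    home:  floating IP -> node that normally serves it
    alive: set of nodes that are currently up
    Returns floating IP -> node that should have it enabled now.
    """
    survivors = sorted(alive)
    result = {}
    for ip, node in sorted(home.items()):
        if node in alive:
            # Normal operation, or fall back after recovery.
            result[ip] = node
        elif survivors:
            # Failover: a surviving node takes over the failed node's IP.
            result[ip] = survivors[0]
    return result


if __name__ == "__main__":
    home = {"192.168.1.101": "A", "192.168.1.102": "B"}
    print(assign_floating_ips(home, {"A", "B"}))  # both nodes serving
    print(assign_floating_ips(home, {"A"}))       # B failed, A holds both
```

When B comes back, the same function hands 192.168.1.102 back to B, which
is the point where A has to disable it so only those mounts move.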

-- 
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: pollard@navo.hpc.mil

Any opinions expressed are solely my own.
