From: Manish Kathuria <manish@tuxspace.com>
To: lartc@vger.kernel.org
Subject: Re: [LARTC] Problems in Dead Gateway Detection / Failover -
Date: Mon, 30 Jan 2006 03:50:27 +0000
Message-ID: <43DD8A33.9020305@tuxspace.com>
In-Reply-To: <43D8CEAE.3010006@tuxspace.com>

gypsy wrote:
> Manish Kathuria wrote:
> --= snip =--
> 
>>  However, if there is a problem in the ISP connectivity at any of the
>>subsequent hops, there is no dead gateway detection and failover also
>>does not take place. I have tested this on various linux kernels from
>>2.4 as well as 2.6 series.
>>
>>Somehow I have never faced a similar problem before and things have been
>>working perfectly. In real life situation here, the first hop gateway is
>>rarely going to be down so dead gateway detection and failover is going
>>to be required whenever there is some connectivity problem at any of the
>>later hops. So that's where dead gateway detection needs to work.
>>
>>What could be the reason ? How can this be resolved ? I would appreciate
>>any pointers or suggestions.
>>
>>Thanks,
>>
>>Manish Kathuria
> 
> 
> Manish,
> 
> Same here (a long time ago.  I no longer have multiple ISPs).
> 
> I don't have any answers for you, but here are a few pointers:

Thanks for your mail. I will try out your suggestions.

> 
> Use arping in a script, pinging the farthest hop that arping can reach
> that is of interest.  Whenever arping returns a bad status, run 'ip
> route flush cache'.  Put a nice long sleep in the script and run it all
> the time.
>
> Perhaps in that same script, 'ping -n1 -I' each WAN interface in turn to
> some destination that must always be up but reachable only by/on that
> interface.  Run 'ip route flush cache' whenever that ping fails.

The only question is whether, by doing this, the kernel will actually
mark the gateway with the bad status as down. If that works without any
other intervention, then it's really superb.
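
A minimal sketch of that suggestion, written in Python rather than shell so
the logic can be exercised without touching the network. The interface names
and target addresses are placeholders, not a real setup. With iputils ping
the packet count is given with -c (the -n flag only disables name lookups),
so the sketch uses -c 1. It only flushes the route cache; whether that is
enough for the kernel to stop using the dead path is exactly the open
question above.

```python
import subprocess


def ping_argv(iface, target, timeout_s=2):
    """Build an iputils ping command that sends one echo request via iface."""
    # -n: numeric output, -c 1: one packet, -W: reply timeout in seconds,
    # -I: send from the given interface
    return ["ping", "-n", "-c", "1", "-W", str(timeout_s), "-I", iface, target]


def probe_and_flush(links, run=subprocess.call):
    """Ping each (iface, target) pair; flush the route cache if any probe fails.

    `run` takes an argv list and returns an exit status; it is a parameter
    only so the function can be tried with a stub instead of real commands.
    Returns the list of interfaces whose probe failed.
    """
    failed = [iface for iface, target in links
              if run(ping_argv(iface, target)) != 0]
    if failed:
        run(["ip", "route", "flush", "cache"])
    return failed
```

Called from a loop with a generous sleep between rounds, for example
probe_and_flush([("eth1", "192.0.2.1"), ("eth2", "198.51.100.1")]), this
needs root privileges because of the ip command.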

> 
> You are just trying to detect the up or down status of the link, so
> don't flood the connection with arping and ping packets.  Using sleep,
> space those pings apart to something sensible.

I was thinking of writing a daemon that pings a remote host through
each of the WAN interfaces every 5 seconds. If one of them returns a bad
status 8-10 times in a row, the default route will be changed to the
other ISP's gateway, and when the status changes back, the route will be
restored to the load-balanced multipath state.
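
The counting logic for such a daemon could look roughly like this. It is
only a sketch: the Link class, the interface names, the gateway addresses
and the threshold of 8 are illustrative, and a single successful probe is
assumed to be enough to bring a link back. The function only builds the
ip route command; running it (as root) and the 5-second probe loop are
left out.

```python
from dataclasses import dataclass


@dataclass
class Link:
    iface: str          # WAN interface name (placeholder)
    gateway: str        # first-hop gateway on that interface (placeholder)
    weight: int = 1     # multipath weight
    failures: int = 0   # consecutive failed probes so far
    up: bool = True

    def record(self, ok, threshold=8):
        """Update state from one probe result; return True if up/down changed."""
        was_up = self.up
        if ok:
            self.failures = 0
            self.up = True
        else:
            self.failures += 1
            if self.failures >= threshold:
                self.up = False
        return self.up != was_up


def default_route_argv(links):
    """Build an `ip route replace default ...` command for the links still up."""
    live = [link for link in links if link.up]
    if not live:
        return None  # nothing usable; leave the routing table alone
    argv = ["ip", "route", "replace", "default"]
    if len(live) == 1:
        # Only one ISP left: plain single-gateway default route.
        return argv + ["via", live[0].gateway, "dev", live[0].iface]
    # Two or more ISPs up: load-balanced multipath default route.
    for link in live:
        argv += ["nexthop", "via", link.gateway, "dev", link.iface,
                 "weight", str(link.weight)]
    return argv
```

The daemon would call record() after every probe and, whenever it returns
True, run the command from default_route_argv() for the current set of links.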

I will have to try both and see which method fits better here and is
more elegant. If your suggestion works, it's perhaps the best way out.

> 
> Although Julian has never confirmed (or denied) this, it was my
> experience that only the **__FIRST__** nexthop affected the up or down
> status of the connection.  If that succeeded, nothing would flag the
> connection as dead.  If you know C, perhaps you can examine Julian's
> kernel patch to see if there is any useful information there.  In my
> opinion, Julian should document exactly how DGD works.  Perhaps he has
> and I just can't find it on his web site, but (when I cared), I was not
> able to find anything useful there.

There are excellent documents at http://www.ssi.bg/~ja/dgd-usage.txt and
http://www.ssi.bg/~ja/nano.txt which explain it very well.
Quoting from the dgd-usage.txt document here ...


---Begin Quote---

* the alternative routes check the neighbour state not only for gateways
but  for hosts, i.e. for any kind of neighbours. Note that in some cases
the  neighbour  can remain  in reachable  state  while its  nexthops are
failed.   For example, it is even possible the gateway to be a proxy ARP
server  and the gateway IP to remain  always in reachable state. In such
case we can not notice the real state of the gateway's IP.

* the alternative routes can be a list from unipath or multipath routes,
using  NOARP  and  ARP devices.  As  result,  the first  alive  or first
suspected  (but not dead)  route is selected by  inspecting the state of
the gateways in each path or the neighbours through the used device from
the path.

* as  result we take care of the state of each path in a multipath route
and  we  try to  use  only the  alive  paths considering  their relative
weights

---End Quote---

In the situation I am dealing with, the first-hop gateway is always
reachable. Only the subsequent hops can go down, and when that happens,
dead gateway detection doesn't work: the outgoing traffic keeps going
out through the dead ISP's WAN interface. What confuses me is that DGD
does work for one of the ISPs, which is connected identically.

Could running routed / gated help resolve this problem?

> 
> Have you tried to engage Julian in a conversation to resolve this?  He
> posts here occasionally but I do not know if he answers questions about
> DGD off this list.

I have not done it so far.

> --
> gypsy
> 

Thanks once again for your suggestions.

--
Manish Kathuria
