Linux Netfilter discussions
* load balance [OT?] [Solution]
From: Martin @ 2008-06-26 15:05 UTC
  To: netfilter

Hello netfilter list!

About a month ago I posted a question about load balancing. It is finally
working, and running in a production environment.

I want to thank the whole list, especially AcrosCom and Mark Perry, with
special thanks to Julian Anastasov, who took the time to reply to my
mails.

This is what I did.

I started by testing Julian's "combinations" patches, adding them to the
kernel one at a time and testing load balancing after each one.

Vanilla kernel 2.6.23
 |
 |____ +arprules = Load balance ok after reboot
  |
  |_____+routes = Load balance ok after reboot
   |
   |_ +jumbo patch hidden & forward_shared= Load balance ok after reboot
    |
    |__ +rp_filter_mask= *

* This is where the behaviour changes. After a reboot, traffic goes out
only through the first "nexthop"; the second is used only if the first
gateway is down or saturated.
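
A "nexthop" here is one leg of a multipath default route. The route Martin
actually used is not shown in the thread, so the following is only a sketch
of how such a route is commonly expressed with iproute2, written in Python so
the command can be built and inspected before anything is run. The gateway
addresses, interface names and weights are hypothetical placeholders.

```python
from typing import List, Sequence, Tuple

# Hypothetical (gateway, interface, weight) triples; not from the thread.
NEXTHOPS = [("192.0.2.1", "eth1", 1), ("198.51.100.1", "eth2", 1)]


def multipath_route_cmd(nexthops: Sequence[Tuple[str, str, int]]) -> List[str]:
    """Build an `ip route` command for a default route with several nexthops."""
    if len(nexthops) < 2:
        raise ValueError("load balancing needs at least two nexthops")
    cmd = ["ip", "route", "replace", "default", "scope", "global"]
    for gateway, dev, weight in nexthops:
        cmd += ["nexthop", "via", gateway, "dev", dev, "weight", str(weight)]
    return cmd


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(" ".join(multipath_route_cmd(NEXTHOPS)))
```

Equal weights ask for an even split between the two gateways; whether traffic
is really spread that way depends on the kernel and patches in use, which is
exactly what the tests in this message were about.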


I then added the "send-to-self" patch, but nothing changed for load
balancing: still only the first nexthop was used.


At this point I contacted Julian Anastasov for help. He told me the key
was to ping both gateways first, keep pinging them every few minutes to
check that they are still alive, and only then try load balancing.


That was it: pinging the gateways before running the load balance script,
and pinging them again every 10 minutes, does the trick.
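
The keepalive described above could look roughly like the sketch below. It
is an illustration only, not Martin's script: the gateway addresses and
interface names are hypothetical, the flags are those of iputils ping, and
the scheduling (for example a cron job every 10 minutes) is left to the
reader.

```python
import subprocess
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical (gateway, interface) pairs; not taken from the thread.
GATEWAYS = [("192.0.2.1", "eth1"), ("198.51.100.1", "eth2")]


def ping_cmd(gateway: str, dev: str, timeout_s: int = 2) -> List[str]:
    """One ICMP echo request sent out of a specific interface (iputils ping)."""
    return ["ping", "-c", "1", "-W", str(timeout_s), "-I", dev, gateway]


def gateway_alive(gateway: str, dev: str,
                  runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if the ping command exits with status 0."""
    result = runner(ping_cmd(gateway, dev),
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def keepalive_round(gateways: Sequence[Tuple[str, str]],
                    check: Callable[[str, str], bool] = gateway_alive) -> Dict[str, bool]:
    """Ping every gateway once and report which ones answered."""
    return {gw: check(gw, dev) for gw, dev in gateways}


if __name__ == "__main__":
    # Dry run: only print the commands a scheduler would execute.
    for gw, dev in GATEWAYS:
        print(" ".join(ping_cmd(gw, dev)))
```

Calling `keepalive_round(GATEWAYS)` once before the load balance script and
then periodically matches the order of operations described in this message.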


Thanks again to everyone, and ask me if there is anything I have omitted.

See you all on the list!


Cheers,


Martin



* Re: load balance [OT?] [Solution]
From: Grant Taylor @ 2008-06-29  2:00 UTC
  To: Mail List - Netfilter

On 6/26/2008 10:05 AM, Martin wrote:
> That was it: pinging the gateways before running the load balance
> script, and pinging them again every 10 minutes, does the trick.

How well will pinging work if the kernel's DGD detects the gateway to be 
dead?  I'd be tempted to use an ARP ping to be sure that the kernel's 
routing table is not re-routing the packets out a different way.



Grant. . . .



* Re: load balance [OT?] [Solution]
From: Martin @ 2008-06-30 13:32 UTC
  To: Grant Taylor; +Cc: Mail List - Netfilter

On Sat, 2008-06-28 at 21:00 -0500, Grant Taylor wrote:
> On 6/26/2008 10:05 AM, Martin wrote:
> > That was it: pinging the gateways before running the load balance
> > script, and pinging them again every 10 minutes, does the trick.
> 
> How well will pinging work if the kernel's DGD detects the gateway to be 
> dead?  I'd be tempted to use an ARP ping to be sure that the kernel's 
> routing table is not re-routing the packets out a different way.
> 
> 
> 
> Grant. . . .

That's right, the point of the ping is to keep the gateways in the ARP
table, so arpinging them would do the same job as a normal ping in this
case. Adding the MAC and IP statically with arp should probably work too
(I haven't tested it).
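
The static-entry idea was not tested in the thread, so the following is only
a sketch of what it might look like using the iproute2 `ip neigh` command
instead of the older `arp` tool. The address, MAC and interface name are
hypothetical placeholders. Note that a permanent entry keeps the gateway in
the neighbour table but says nothing about whether the gateway is alive.

```python
import re
from typing import List

# Six colon-separated hex octets, e.g. 00:00:5e:00:53:01.
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def static_neigh_cmd(ip: str, mac: str, dev: str) -> List[str]:
    """Build an `ip neigh` command that pins a gateway's MAC address."""
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ["ip", "neigh", "replace", ip, "lladdr", mac,
            "dev", dev, "nud", "permanent"]


if __name__ == "__main__":
    # Hypothetical gateway; printed only, not executed.
    print(" ".join(static_neigh_cmd("192.0.2.1", "00:00:5e:00:53:01", "eth1")))
```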

Maybe I'm missing something, but what do you mean by "kernel's DGD"?

Cheers.


Martin



* Re: load balance [OT?] [Solution]
From: Grant Taylor @ 2008-06-30 14:26 UTC
  To: Mail List - Netfilter

On 06/30/08 08:32, Martin wrote:
> That's right, the point of the ping is to keep the gateways in the ARP
> table, so arpinging them would do the same job as a normal ping in this
> case. Adding the MAC and IP statically with arp should probably work
> too (I haven't tested it).

(See below.)

> Maybe I'm missing something, but what do you mean by "kernel's
> DGD"?

DGD, or Dead Gateway Detection, is a mechanism built into the kernel
that allows the kernel to detect if a gateway / router to a given (set 
of) destination(s) is no longer functioning and subsequently fall back 
to a different gateway.

I have had the following test network in place and experienced failures 
using standard pings in a very weird way.


        +-----+ A.1                       A.2 +-----+
  C.254 |    0+-----(Switch)-----(Switch)-----+0    | D.254
 -------+1 A  |                               |  B 1+-------
        |    2+-------------------------------+2    |
        +-----+ B.5                       B.6 +-----+

(Think of this configuration as two separate buildings (C & D) with two 
different connections tying them together, one slow wireless (B) and one 
new Metro Ether (A).  As far as the A and B systems are concerned these 
are just simple ethernet connections.)

I had the above scenario set up between two buildings with the A system
pinging both of the B system's interfaces (A.2 and B.6).  If I
disconnected the cable between the two switches, both systems A and B
would still have link to their respective switches; however, the channel
between the two systems would be non-functional.

After about 45 - 90 seconds (depending on how things were configured) the
kernels on either system would realize that the link between systems A 
and B using the A (Metro Ether) network was down and fall back to 
routing all traffic out over the B (wireless) network.  So when system A 
pinged the A.2 interface on the B system the traffic would go out across 
the wireless network, loop through the B system and hit the A.2 
interface on the B system.

Whereas if I used arping to do the testing, the kernel's routing table
(and thus DGD) was bypassed, giving an accurate test of the link state
even if the kernel had routed around the link failure between the switches.
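
The difference between the two probes can be sketched as follows. This is
an illustration under assumptions, not the test setup used at the time: the
flags are those of iputils arping, the address and interface are hypothetical,
and the three state names are invented for the example.

```python
from typing import List


def arping_cmd(target: str, dev: str, count: int = 3) -> List[str]:
    """ARP requests sent directly on `dev`, without consulting the routing table."""
    return ["arping", "-c", str(count), "-I", dev, target]


def link_state(arp_ok: bool, icmp_ok: bool) -> str:
    """Combine both probe results into a rough link verdict.

    "up":       the peer answered ARP on the local segment.
    "rerouted": no ARP answer, but ICMP still got through, so the
                reply most likely arrived over another path.
    "down":     neither probe got an answer.
    """
    if arp_ok:
        return "up"
    if icmp_ok:
        return "rerouted"
    return "down"


if __name__ == "__main__":
    # Hypothetical peer address on the direct link; printed only.
    print(" ".join(arping_cmd("192.0.2.2", "eth0")))
    print(link_state(arp_ok=False, icmp_ok=True))
```

The "rerouted" case is the situation described above: a plain ping keeps
succeeding after the cable between the switches is pulled, because the
kernel has fallen back to the other network.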

(The above configuration used a stock kernel with two equal routes 
(metric of 0) entered in reverse priority to get things to work.)



Grant. . . .


* Re: load balance [OT?] [Solution]
From: Martin @ 2008-06-30 15:34 UTC
  To: Grant Taylor; +Cc: Mail List - Netfilter

On Mon, 2008-06-30 at 09:26 -0500, Grant Taylor wrote:
> On 06/30/08 08:32, Martin wrote:
> > That's right, the point of the ping is to keep the gateways in the ARP
> > table, so arpinging them would do the same job as a normal ping in this
> > case. Adding the MAC and IP statically with arp should probably work
> > too (I haven't tested it).
> 
> (See below.)
> 
> > Maybe I'm missing something, but what do you mean by "kernel's
> > DGD"?
> 
> DGD, or Dead Gateway Detection, is a mechanism built into the kernel
> that allows the kernel to detect if a gateway / router to a given (set 
> of) destination(s) is no longer functioning and subsequently fall back 
> to a different gateway.
> 
> I have had the following test network in place and experienced failures 
> using standard pings in a very weird way.
> 
> 
>         +-----+ A.1                       A.2 +-----+
>   C.254 |    0+-----(Switch)-----(Switch)-----+0    | D.254
>  -------+1 A  |                               |  B 1+-------
>         |    2+-------------------------------+2    |
>         +-----+ B.5                       B.6 +-----+
> 
> (Think of this configuration as two separate buildings (C & D) with two 
> different connections tying them together, one slow wireless (B) and one 
> new Metro Ether (A).  As far as the A and B systems are concerned these 
> are just simple ethernet connections.)
> 
> I had the above scenario set up between two buildings with the A system
> pinging both of the B system's interfaces (A.2 and B.6).  If I
> disconnected the cable between the two switches, both systems A and B
> would still have link to their respective switches; however, the channel
> between the two systems would be non-functional.
> 
> After about 45 - 90 seconds (depending on how things were configured) the
> kernels on either system would realize that the link between systems A 
> and B using the A (Metro Ether) network was down and fall back to 
> routing all traffic out over the B (wireless) network.  So when system A 
> pinged the A.2 interface on the B system the traffic would go out across 
> the wireless network, loop through the B system and hit the A.2 
> interface on the B system.
> 
> Whereas if I used arping to do the testing, the kernel's routing table
> (and thus DGD) was bypassed, giving an accurate test of the link state
> even if the kernel had routed around the link failure between the switches.
> 
> (The above configuration used a stock kernel with two equal routes 
> (metric of 0) entered in reverse priority to get things to work.)

Well, I don't know if I understood it correctly, but I'll try to answer.


Ping works pretty well for me. I ping each gateway, and if they respond
and are not saturated, load balancing works great.
If one gateway doesn't respond or is saturated, traffic switches to the
other interface.


About the loop: if one of the gateways goes down but still responds
because traffic is passed internally between both NICs in the servers,
there are a few things you can try.

1) Try blocking traffic between the cards with something like "iptables
-A FORWARD -i eth1 -o eth2 -j DROP" (not tested, but it may work).

2) Do it with ip route. Give each card its own route and its own IP
range, put them in separate tables, and play with some prohibit rules.
You can get an idea from this document:
http://ssi.bg/~ja/nano.txt

3) Do the connected servers need to reach anything beyond themselves? If
not, you can turn off ip_forward.
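
Options 1 and 3 above could be expressed roughly as in the sketch below.
None of this was tested in the thread; the interface names come from the
example in option 1, the FORWARD chain and DROP target are assumptions about
what was intended, and the commands are only built and printed, not run.
Option 2 is left out because it depends entirely on the local addressing.

```python
from typing import List


def block_forward_cmds(dev_a: str, dev_b: str) -> List[List[str]]:
    """iptables rules that drop traffic forwarded between two interfaces."""
    return [
        ["iptables", "-A", "FORWARD", "-i", dev_a, "-o", dev_b, "-j", "DROP"],
        ["iptables", "-A", "FORWARD", "-i", dev_b, "-o", dev_a, "-j", "DROP"],
    ]


def disable_forwarding_cmd() -> List[str]:
    """Turn off IPv4 forwarding for the whole host."""
    return ["sysctl", "-w", "net.ipv4.ip_forward=0"]


if __name__ == "__main__":
    for cmd in block_forward_cmds("eth1", "eth2") + [disable_forwarding_cmd()]:
        print(" ".join(cmd))
```

Disabling forwarding entirely is the bluntest of the three options: it stops
the loop, but also stops the box from routing for anyone else.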


Please keep us updated about your tests and whether you manage to solve
it. Maybe a separate thread would be better for keeping track of this.


Cheers


Martin



* Re: load balance [OT?] [Solution]
From: Grant Taylor @ 2008-06-30 15:58 UTC
  To: Mail List - Netfilter

On 06/30/08 10:34, Martin wrote:
> Well, I don't know if I understood it correctly, but I'll try to answer.

*nod*

> Ping works pretty well for me. I ping each gateway, and if they respond
> and are not saturated, load balancing works great. If one gateway
> doesn't respond or is saturated, traffic switches to the other
> interface.

Ok.

> About the loop: if one of the gateways goes down but still responds
> because traffic is passed internally between both NICs in the servers,
> there are a few things you can try.

Ok, you are deciding to let the loop happen and just detect that it has,
and / or prevent it from happening.  I guess that is another way to
solve the problem.  IMHO, not having the loop happen is better, but
whatever works for you.

> Please keep us updated about your tests and whether you manage to solve
> it. Maybe a separate thread would be better for keeping track of this.

Um, I'm not testing, that was based on what I have done in the past.



Grant. . . .


* Re: load balance [OT?] [Solution]
From: Martin @ 2008-06-30 16:26 UTC
  To: Grant Taylor; +Cc: Mail List - Netfilter

On Mon, 2008-06-30 at 10:58 -0500, Grant Taylor wrote:
> On 06/30/08 10:34, Martin wrote:
> > Well, I don't know if I understood it correctly, but I'll try to answer.
> 
> *nod*
> 
> > Ping works pretty well for me. I ping each gateway, and if they respond
> > and are not saturated, load balancing works great. If one gateway
> > doesn't respond or is saturated, traffic switches to the other
> > interface.
> 
> Ok.
> 
> > About the loop: if one of the gateways goes down but still responds
> > because traffic is passed internally between both NICs in the servers,
> > there are a few things you can try.
> 
> Ok, you are deciding to let the loop happen and just detect that it has,
> and / or prevent it from happening.  I guess that is another way to
> solve the problem.  IMHO, not having the loop happen is better, but
> whatever works for you.
> 
> > Please keep us updated about your tests and whether you manage to solve
> > it. Maybe a separate thread would be better for keeping track of this.
> 
> Um, I'm not testing, that was based on what I have done in the past.


In my case the loop won't happen: both gateways are internal to the
internet carriers.


Sorry for the misplaced response; for some reason I thought you were
asking me.

And you're right, arping can be more accurate in these cases.


Thanks for the tip, and really sorry for misunderstanding your mail.

Cheers.

Martin



* Re: load balance [OT?] [Solution]
From: Grant Taylor @ 2008-06-30 17:51 UTC
  To: Mail List - Netfilter

On 06/30/08 11:26, Martin wrote:
> In my case the loop won't happen: both gateways are internal to the
> internet carriers.

Ok.

> Sorry for the misplaced response; for some reason I thought you were
> asking me.

Ah.  No problem.  I just wanted to make sure you weren't waiting on 
something from me.

> And you're right, arping can be more accurate in these cases.

Arping makes sure you are testing the local segment, but pings can
test further, as long as you take into account the possibility of looping.

> Thanks for the tip, and really sorry for misunderstanding your
> mail.

*nod*



Grant. . . .
