From: Peter Rabbitson <rabbit@rabbit.us>
To: lartc@vger.kernel.org
Subject: Re: [LARTC] Multihome load balancing - kernel vs netfilter
Date: Mon, 14 May 2007 07:15:56 +0000
Message-ID: <46480CAC.6050002@rabbit.us>
In-Reply-To: <4647FA30.5040401@rabbit.us>

Salim S I wrote:
>> -----Original Message-----
>> From: lartc-bounces@mailman.ds9a.nl
>> [mailto:lartc-bounces@mailman.ds9a.nl] On Behalf Of Peter Rabbitson
>> Sent: Monday, May 14, 2007 1:57 PM
>> To: lartc@mailman.ds9a.nl
>> Subject: [LARTC] Multihome load balancing - kernel vs netfilter
>>
>> Hi,
>> I have searched the archives on the topic, and it seems that the list
>> gurus favor load balancing to be done in the kernel as opposed to other
>> means. I have been using a home-grown approach, which splits traffic
>> based on `-m statistic --mode random --probability X`, then CONNMARKs
>> the individual connections and the kernel happily routes them. I
>> understand that for > 2 links it will become impractical to calculate a
>> correct X. But if we only have 2 gateways to the internet - are there
>> any advantages in letting the kernel multipath scheduler do the
>> balancing (with all the downsides of route caching), as opposed to the
>> pure random approach described above?
>
> I have thought about this approach, but, I think, this approach does not
> handle failover/dead-gateway-detection well. Because you need to alter
> all your netfilter routing rules if you find a link down. And then
> reconfigure again when the link comes up. I am interested to know how
> you handle that.
>
Certainly. What I am doing is NATing a large company network, which is
load balanced and gets failover protection. I also have a number of
services running on the router which must be neither balanced nor failed
over, as they are expected to respond on a specific IP only. All
remaining traffic on the server itself is not balanced, but fails over
when the designated primary link goes down.

I start with a simple pinger app that pings several well-known remote
sites once a minute over each link, using a large ICMP packet (1k of
payload). The RTTs are averaged and used to calculate the current
"quality" of the link (the large packet makes congestion a visible
factor). If an interface gets no responses at all (not a single one of
the pinged hosts has answered), that link is considered dead.
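Below is a minimal shell sketch of such a pinger. It is not the original app. It assumes iputils `ping` (`-I` selects the link, `-s 1000` gives the ~1k payload, `-W` is the reply timeout in seconds) and assumes that probes sent with `-I` actually leave over the named link. The function names, the 2 s timeout and the example hosts are hypothetical. The sketch prints the average RTT in milliseconds, or `dead` when no host answered.

```shell
#!/bin/sh
# Hypothetical pinger sketch (not the original app). Assumes iputils ping:
#   -I <iface> send via this interface, -s 1000 ~1k ICMP payload,
#   -c 1 one probe per host,          -W 2 wait up to 2 s for a reply.

# Extract the RTT in ms from iputils reply lines such as "... time=23.4 ms".
rtt_from_lines() {
    sed -n 's/.*time=\([0-9.]*\) ms.*/\1/p'
}

# Average RTT values (one per line); print "dead" if there are none.
average_rtt() {
    awk 'NF { sum += $1; n++ } END { if (n == 0) print "dead"; else printf "%.1f\n", sum / n }'
}

# probe_link IFACE HOST... : quality of one link,
# e.g. "probe_link eth1 192.0.2.10 198.51.100.20" (hypothetical values).
probe_link() {
    iface=$1
    shift
    for host in "$@"; do
        ping -c 1 -s 1000 -W 2 -I "$iface" "$host" 2>/dev/null | rtt_from_lines
    done | average_rtt
}
```

Averaging only the replies that arrived keeps the result meaningful when some hosts are down. A link is declared dead only when nothing came back at all, as described above.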

In iproute2 I have two separate routing tables, each using one of the
links as its default gateway and selected by a rule matching the
corresponding firewall mark. The main default route points to a single
gateway (not a multipath route), either hardcoded or taken from the
first run of the pinger (the pinger can run without a default gateway
set; the explanation follows).
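The two tables and the single default route might look roughly like the following. Several choices here are assumptions: the table numbers 11 and 12 (picked to match the marks), the variables I1/I2 and GW1/GW2, and ISP1 as the main default. Setting DRY_RUN=1 prints the commands instead of running them, since the real thing needs root.

```shell
#!/bin/sh
# Sketch of the per-link routing tables; I1/I2, GW1/GW2 and table ids 11/12
# are hypothetical. DRY_RUN=1 prints the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Run once at boot: "ip rule add" is not idempotent and would add duplicates.
setup_link_tables() {
    # Table 11: default via ISP1, used by packets carrying fwmark 11.
    run ip route replace default via "$GW1" dev "$I1" table 11
    run ip rule add fwmark 11 table 11
    # Table 12: default via ISP2, used by packets carrying fwmark 12.
    run ip route replace default via "$GW2" dev "$I2" table 12
    run ip rule add fwmark 12 table 12
    # Main table: one plain (non-multipath) default route, here via ISP1.
    run ip route replace default via "$GW1" dev "$I1"
}
```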

In the mangle table I have two user-defined chains:
iptables -t mangle -N ISP1
iptables -t mangle -A ISP1 -j CONNMARK --set-mark 11
iptables -t mangle -A ISP1 -j MARK --set-mark 11
iptables -t mangle -A ISP1 -j ACCEPT
iptables -t mangle -N ISP2
iptables -t mangle -A ISP2 -j CONNMARK --set-mark 12
iptables -t mangle -A ISP2 -j MARK --set-mark 12
iptables -t mangle -A ISP2 -j ACCEPT

The rules that reference those chains are:
For all locally originating traffic:
iptables -t mangle -A OUTPUT -o $I1 -j ISP1
iptables -t mangle -A OUTPUT -o $I2 -j ISP2
For all incoming traffic from the internet:
iptables -t mangle -A PREROUTING -i $I1 -m state --state NEW -j ISP1
iptables -t mangle -A PREROUTING -i $I2 -m state --state NEW -j ISP2
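For all other traffic (the NATed LAN), new connections only; packets of
established connections must fall through to the restore rule below:
iptables -t mangle -A PREROUTING -m state --state NEW -m statistic \
  --mode random --probability $X -j ISP1
iptables -t mangle -A PREROUTING -m state --state NEW -j ISP2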
At the end of the PREROUTING chain I have:
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
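The two-way split above generalises to more links, which addresses the original concern that a correct X is impractical to compute for more than two links. Netfilter takes the first matching rule, so rule i must fire with probability w_i / (w_i + ... + w_n), and the last link takes the remainder unconditionally. The sketch below only prints the rules. The chain names ISP1..ISPn and the weights passed in are assumptions.

```shell
#!/bin/sh
# balance_rules W1 W2 ... Wn : print mangle PREROUTING rules that send NEW
# connections to hypothetical chains ISP1..ISPn in proportion to the weights.
# First-match semantics: rule i fires with probability Wi / (Wi + ... + Wn).
balance_rules() {
    awk -v weights="$*" 'BEGIN {
        n = split(weights, w, " ")
        rest = 0
        for (i = 1; i <= n; i++) rest += w[i]
        for (i = 1; i < n; i++) {
            p = (rest > 0) ? w[i] / rest : 0
            printf "iptables -t mangle -A PREROUTING -m state --state NEW -m statistic --mode random --probability %.3f -j ISP%d\n", p, i
            rest -= w[i]
        }
        # The last link takes whatever is left, so it needs no statistic match.
        printf "iptables -t mangle -A PREROUTING -m state --state NEW -j ISP%d\n", n
    }'
}
```

For example, `balance_rules 2 1 1` yields probabilities 0.500 and 0.500 for ISP1 and ISP2, followed by an unconditional ISP3 rule. That works out to a 2:1:1 split. With two equal weights it reduces to X = 0.5.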

The NATing is trivially solved by:
iptables -t nat -N SOURCE_NAT
iptables -t nat -A POSTROUTING -s 10.0.58.0/24 -j SOURCE_NAT
iptables -t nat -A POSTROUTING -s 192.168.58.0/24 -j SOURCE_NAT
iptables -t nat -A POSTROUTING -s 192.168.8.0/24 -j SOURCE_NAT
iptables -t nat -A SOURCE_NAT -o $I1 -j SNAT --to-source $I1_IP
iptables -t nat -A SOURCE_NAT -o $I2 -j SNAT --to-source $I2_IP

What does this achieve:
* Local applications that have explicitly requested a specific IP to
bind to will be routed over the corresponding interface and will stay
that way. Only applications binding to 0.0.0.0 will be routed by
consulting the default route.
* Responses to connections from the internet are guaranteed to leave
through the same interface they came in on.
* All new connections not coming from the external interfaces are load
balanced according to $X, and are again guaranteed to stay on their
link for the life of the connection, but another connection to the same
host is not guaranteed to go over the same link. This is important in a
company environment: since most employees use the same online
resources, balancing per destination (which is effectively what the
kernel's cached multipath routes do) would push most of the traffic
over one link.

On every run of the pinger I do the following:
* If both gateways are alive I replace the -m statistic rule, adjusting
the value of $X
* If one is detected dead, I adjust the probability accordingly (or
alternatively remove the statistic match altogether), and change the
default gateway if it is the one that failed.

So really the whole exercise revolves around changing a single rule (or
two rules, if you want to control the probability in a more fine-grained
way).
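One pinger iteration could be sketched as follows. The post does not say how X is derived from the measurements, so the 1/RTT weighting below is an assumption. Also assumed are:
- STAT_RULE, the rule's position in mangle PREROUTING (3 if the rules are exactly those listed earlier);
- ISP1 as the primary link;
- the I1/I2/GW1/GW2 variables.

DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical pinger iteration: Q1/Q2 are average RTTs in ms or "dead".
# DRY_RUN=1 prints the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Print X = probability that a new connection goes to ISP1.
probability_for_isp1() {
    q1=$1 q2=$2
    if [ "$q1" = dead ] && [ "$q2" = dead ]; then
        echo "both links down" >&2
        return 1
    elif [ "$q1" = dead ]; then
        echo 0
    elif [ "$q2" = dead ]; then
        echo 1
    else
        # Weights 1/q1 and 1/q2 give X = (1/q1) / (1/q1 + 1/q2) = q2 / (q1 + q2).
        awk -v a="$q1" -v b="$q2" 'BEGIN { printf "%.3f\n", b / (a + b) }'
    fi
}

# update_balance_rule Q1 Q2 : rewrite the statistic rule in place, then
# point the main default route at ISP1 unless ISP1 is dead.
update_balance_rule() {
    x=$(probability_for_isp1 "$1" "$2") || return 1
    run iptables -t mangle -R PREROUTING "${STAT_RULE:-3}" \
        -m state --state NEW -m statistic --mode random --probability "$x" -j ISP1
    if [ "$1" = dead ]; then
        run ip route replace default via "$GW2" dev "$I2"
    else
        run ip route replace default via "$GW1" dev "$I1"
    fi
}
```

Replacing the rule with `-R` keeps its position, so no window exists where the balancing rule is missing. Probability 0 or 1 is equivalent to dropping the statistic match for a dead link.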

Last but not least, this setup allows me to program exception tables for
certain IP blocks. For instance, Yahoo has a braindead two-tier
authentication system for commercial solutions: it remembers the IP you
first logged in from, and that IP must match the one used to log in to a
more secure area (with another password). Or users on the LAN might want
to use one of the ISPs' SMTP servers, which keep a close eye on who is
talking to them. So I have a $PREFERRED which is adjusted to either ISP1
or ISP2, depending on the current state of
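affairs, and rules like the following, inserted with -I so that they sit
ahead of the balancing rules (which would otherwise claim every NEW
connection first):
iptables -t mangle -I PREROUTING -d 66.218.64.0/19 -m state \
  --state NEW -j $PREFERRED
iptables -t mangle -I PREROUTING -d 68.142.192.0/18 -m state \
  --state NEW -j $PREFERRED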
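Because $PREFERRED changes over time, the exception rules have to be rewritten whenever it flips. One way to sketch that is to keep the rules in a dedicated chain and rebuild it on each change. This is an assumption, not necessarily how the setup above does it. The chain is hypothetically named EXCEPTIONS and is hooked in at the top of mangle PREROUTING. A jump to ISP1/ISP2 ends in ACCEPT, which stops PREROUTING traversal in the mangle table just as it does for the other rules. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical exception chain for destinations that must use one fixed link.
# EXCEPTION_NETS holds the blocks quoted in the post.
EXCEPTION_NETS="66.218.64.0/19 68.142.192.0/18"

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Run once: create the chain and hook it in ahead of the balancing rules.
setup_exceptions() {
    run iptables -t mangle -N EXCEPTIONS
    run iptables -t mangle -I PREROUTING 1 -j EXCEPTIONS
}

# Run whenever the preferred link changes (argument: ISP1 or ISP2).
rebuild_exceptions() {
    preferred=$1
    run iptables -t mangle -F EXCEPTIONS
    for net in $EXCEPTION_NETS; do
        run iptables -t mangle -A EXCEPTIONS -d "$net" -m state --state NEW -j "$preferred"
    done
}
```

Packets that match no exception simply return from EXCEPTIONS and continue through PREROUTING as before.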

This pretty much sums it up. The only downside I can think of is that
loss of service can be observed between two runs of the pinger. Let me
know if I missed something, be it critical or minor.

Thanks
Peter
_______________________________________________
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc