Re: [LARTC] Multihome load balancing - kernel vs netfilter

From: Peter Rabbitson <rabbit@rabbit.us>
To: lartc@vger.kernel.org
Subject: Re: [LARTC] Multihome load balancing - kernel vs netfilter
Date: Mon, 14 May 2007 11:24:31 +0000
Message-ID: <464846EF.3080109@rabbit.us>
In-Reply-To: <4647FA30.5040401@rabbit.us>

Answer inlined:

Salim S I wrote:
>     iptables -t mangle -A PREROUTING -j ISP2
> 
> Doesn't it need to check for state NEW? Or packets will not reach the
> restore-mark rule.

Of course, and the real script does check. I typed this line by hand
because the copy-paste cut it off, and I left out the obvious check.
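
For the archive, a sketch of how the tail of PREROUTING reads with the
check in place. ISP1/ISP2 are the chains from my earlier message; the
dry-run `IPT` wrapper and the default value of `X` are illustration
only, not part of the real script.

```shell
#!/bin/sh
# Dry-run by default: prints the commands instead of executing them.
# Set IPT=iptables (as root) to apply them for real.
IPT="${IPT:-echo iptables}"
X="${X:-0.5}"   # assumed share of new connections sent to ISP1

balance_rules() {
    # New connections: a random share $X goes to ISP1 ...
    $IPT -t mangle -A PREROUTING -m state --state NEW \
        -m statistic --mode random --probability "$X" -j ISP1
    # ... and every remaining NEW connection falls through to ISP2.
    $IPT -t mangle -A PREROUTING -m state --state NEW -j ISP2
    # Packets of established connections match neither rule above, so
    # they reach this one and pick up the mark saved by CONNMARK.
    $IPT -t mangle -A PREROUTING -j CONNMARK --restore-mark
}

balance_rules
```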

> You may have to manually populate the routing tables when an interface
> comes up, after being down for some time. (Kernel would have removed the
> routing entries for this interface after it found the interface down.
> This happens only if its nexthop is down)

This is what I can't really understand (and it applies to DGD as well):
how often in real life does someone yank a cable out so that an
interface goes down? In over 7 years of dealing with various ISPs I have
never seen a link go so dead that the kernel downs the interface and
removes all associated routing information. What I have seen, on the
other hand, is the link dying at the 2nd or 3rd hop, which (if I
understand correctly) DGD simply cannot detect. Correct me if my
assumption is wrong.

> I tend to favor this approach, because it is more flexible in selecting
> the interface. You can use different weights/probability depending on
> different factors. I have seen a variation of this method, used with
> 'recent' (-m recent) match, instead of CONNMARK.

I see. But recent would have a "caching effect" and, from what I
understand, is heavier on the kernel, unlike CONNMARK, which hooks into
conntrack, which in turn has to track connections either way.

> The only downside in using this method, as far as I can see, is the need
> to reconfigure rules and routing tables, in case of a failure/coming-up.
> But lately, I have found that even with multipath method, there IS a
> need for reconfiguration.

Got you. This pretty much answers my original question. Thank you for
your time.

> -----Original Message-----
> From: lartc-bounces@mailman.ds9a.nl
> [mailto:lartc-bounces@mailman.ds9a.nl] On Behalf Of Peter Rabbitson
> Sent: Monday, May 14, 2007 3:16 PM
> To: lartc@mailman.ds9a.nl
> Subject: Re: [LARTC] Multihome load balancing - kernel vs netfilter
> 
> Salim S I wrote:
>>> -----Original Message-----
>>> From: lartc-bounces@mailman.ds9a.nl
>>> [mailto:lartc-bounces@mailman.ds9a.nl] On Behalf Of Peter Rabbitson
>>> Sent: Monday, May 14, 2007 1:57 PM
>>> To: lartc@mailman.ds9a.nl
>>> Subject: [LARTC] Multihome load balancing - kernel vs netfilter
>>>
>>> Hi,
>>> I have searched the archives on the topic, and it seems that the list
>>> gurus favor load balancing to be done in the kernel as opposed to other
>>> means. I have been using a home-grown approach, which splits traffic
>>> based on `-m statistic --mode random --probability X`, then CONNMARKs
>>> the individual connections and the kernel happily routes them. I
>>> understand that for > 2 links it will become impractical to calculate a
>>> correct X. But if we only have 2 gateways to the internet - are there
>>> any advantages in letting the kernel multipath scheduler do the
>>> balancing (with all the downsides of route caching), as opposed to the
>>> pure random approach described above?
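
A note on the "> 2 links" arithmetic, as a hedged sketch rather than
anything from the script discussed here: if the statistic rules are
evaluated in order, each rule only sees what the earlier ones left
over, so rule i needs its weight divided by the sum of its own and all
later weights, and the last rule always gets 1. The `cascade` helper
below is hypothetical and assumes positive weights.

```shell
#!/bin/sh
# Print one --probability value per rule, in rule order, given one
# positive weight per link.
cascade() {
    echo "$@" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) total += $i
        for (i = 1; i <= NF; i++) {
            # Share of the traffic that earlier rules did not take.
            printf "%.4f\n", $i / total
            total -= $i
        }
    }'
}

cascade 1 1 1
```

For three equally weighted links this prints 0.3333, 0.5000 and 1.0000.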
>> I have thought about this approach, but, I think, this approach does not
>> handle failover/dead-gateway-detection well. Because you need to alter
>> all your netfilter routing rules if you find a link down. And then
>> reconfigure again when the link comes up. I am interested to know how
>> you handle that.
>>
> 
> Certainly. What I am doing is NATing a large company network, which gets
> load balanced and receives fail over protection. I also have a number of
> services running on the router which must not be balanced nor failed
> over, as they are expected to respond on a specific IP only. All
> remaining traffic on the server itself is not balanced but fails over
> when the designated primary link goes down.
> 
> I start with a simple pinger app that pings several well-known remote
> sites once a minute using a large ICMP packet (1k of payload). The RTTs
> are averaged and used to calculate the current "quality" of the link
> (the large packet makes congestion a visible factor). If the result for
> one of the interfaces is 0 (meaning not a single one of the pinged
> hosts has responded), the link is dead.
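
A minimal sketch of such a pinger, not the actual app: it assumes the
Linux iputils `ping` (for `-I`, `-s` and `-W`), and the probe targets
are placeholder documentation addresses. The function names are
hypothetical.

```shell
#!/bin/sh
# Placeholder probe targets; substitute real, well-known hosts.
HOSTS="${HOSTS:-192.0.2.1 198.51.100.1 203.0.113.1}"

# Print the RTT in ms of one ping with 1024 bytes of payload sent out
# via interface $1 to host $2, or nothing if no reply arrives in 2 s.
probe() {
    ping -I "$1" -c 1 -s 1024 -W 2 "$2" 2>/dev/null |
        sed -n 's/.*time=\([0-9.]*\) *ms.*/\1/p'
}

# Average the RTTs read from stdin (one per line). Prints 0 when there
# are none, which is the "link is dead" case.
average() {
    awk '{ sum += $1; n++ }
         END { if (n) printf "%.1f\n", sum / n; else print 0 }'
}

# Quality figure for one interface: mean RTT over all probe targets.
link_quality() {
    for host in $HOSTS; do
        probe "$1" "$host"
    done | average
}
```

Calling `link_quality eth1` (interface name assumed) once a minute from
cron or a sleep loop would give the per-link figure described above.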
> 
> In iproute I have two separate tables, each using one of the links as
> default gw, matching a certain mark. The default route is set to a
> single gateway (not a multipath), either by hardcoding, or by using the
> first input of the pinger (it can run without a default gw set,
> explanation follows)
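
One way the two tables could be set up, as a sketch only. The marks 11
and 12 are the ones set in the chains below; reusing them as table
numbers, the interface names and the gateway addresses are assumptions
for illustration.

```shell
#!/bin/sh
# Dry-run by default: prints the commands. Set IP=ip (as root) to apply.
IP="${IP:-echo ip}"
# Hypothetical interface names and gateway addresses.
I1="${I1:-eth1}"; GW1="${GW1:-192.0.2.1}"
I2="${I2:-eth2}"; GW2="${GW2:-198.51.100.1}"

policy_routes() {
    # One table per ISP, each holding only a default route.
    $IP route add default via "$GW1" dev "$I1" table 11
    $IP route add default via "$GW2" dev "$I2" table 12
    # Packets carrying mark 11 or 12 are looked up in the matching table.
    $IP rule add fwmark 11 table 11
    $IP rule add fwmark 12 table 12
    # Unmarked traffic uses the single default gateway in the main table.
    $IP route replace default via "$GW1" dev "$I1"
}

policy_routes
```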
> 
> In iptables I have two user defined chains:
>     iptables -t mangle -A ISP1 -j CONNMARK --set-mark 11
>     iptables -t mangle -A ISP1 -j MARK --set-mark 11
>     iptables -t mangle -A ISP1 -j ACCEPT
> 
>     iptables -t mangle -A ISP2 -j CONNMARK --set-mark 12
>     iptables -t mangle -A ISP2 -j MARK --set-mark 12
>     iptables -t mangle -A ISP2 -j ACCEPT
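
The listing above leaves out creating the chains, which has to happen
before anything can be appended to them. A sketch that builds both
chains from one helper; `make_chain` and the dry-run `IPT` wrapper are
hypothetical names, not part of the original script.

```shell
#!/bin/sh
# Dry-run by default: prints the commands. Set IPT=iptables to apply.
IPT="${IPT:-echo iptables}"

# usage: make_chain NAME MARK
make_chain() {
    # -N fails if the chain already exists; flush it in that case so
    # the script can be re-run.
    $IPT -t mangle -N "$1" 2>/dev/null || $IPT -t mangle -F "$1"
    # Save the mark on the connection and set it on this packet too.
    $IPT -t mangle -A "$1" -j CONNMARK --set-mark "$2"
    $IPT -t mangle -A "$1" -j MARK --set-mark "$2"
    $IPT -t mangle -A "$1" -j ACCEPT
}

make_chain ISP1 11
make_chain ISP2 12
```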
> 
> The rules that reference those chains are:
> 
> For all locally originating traffic:
>     iptables -t mangle -A OUTPUT -o $I1 -j ISP1
>     iptables -t mangle -A OUTPUT -o $I2 -j ISP2
> 
> For all incoming traffic from the internet:
>     iptables -t mangle -A PREROUTING -i $I1 -m state --state NEW -j ISP1
>     iptables -t mangle -A PREROUTING -i $I2 -m state --state NEW -j ISP2
> 
> For all other traffic (nat)
>     iptables -t mangle -A PREROUTING -m state --state NEW -m statistic \
>         --mode random --probability $X -j ISP1
>     iptables -t mangle -A PREROUTING -j ISP2
> 
> At the end of the PREROUTING chain I have
>     iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
> 
> The NATing is trivially solved by:
>     iptables -t nat -A POSTROUTING -s 10.0.58.0/24 -j SOURCE_NAT
>     iptables -t nat -A POSTROUTING -s 192.168.58.0/24 -j SOURCE_NAT
>     iptables -t nat -A POSTROUTING -s 192.168.8.0/24 -j SOURCE_NAT
> 
>     iptables -t nat -A SOURCE_NAT -o $I1 -j SNAT --to $I1_IP
>     iptables -t nat -A SOURCE_NAT -o $I2 -j SNAT --to $I2_IP
> 
> 
> What does this achieve:
> * Local applications that have explicitly requested a specific IP to
> bind to will be routed over the corresponding interface and will stay
> that way. Only applications binding to 0.0.0.0 will be routed by
> consulting the default route.
> * Responses to connections from the internet are guaranteed to leave
> from the same interface they came in.
> * All new connections not coming from the external interfaces are load
> balanced by the weight of $X, and are again guaranteed to stay there for
> the life of the connection, but another connection to the same host is
> not guaranteed to go over the same link. This is important in a company
> environment, since most employees use the same online resources.
> 
> On every run of the pinger I do the following:
> * If both gateways are alive I replace the -m statistic rule, adjusting
> the value of $X
> * If one is detected dead, I adjust the probability accordingly (or
> alternatively remove the statistic match altogether), and change the
> default gateway if it is the one that failed.
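
A sketch of what one pinger run could do with the two quality figures.
How $X is actually derived is not spelled out above, so the formula
here (weight each link by the other link's RTT) is an assumption, as
are the `RULE_NUM` position of the statistic rule and the helper names.

```shell
#!/bin/sh
# Dry-run by default: prints the command. Set IPT=iptables to apply.
IPT="${IPT:-echo iptables}"
RULE_NUM="${RULE_NUM:-3}"   # assumed position of the statistic rule

# Share of new connections for ISP1, given the mean RTT of ISP1 ($1)
# and ISP2 ($2). An RTT of 0 means that link is dead.
probability() {
    awk -v q1="$1" -v q2="$2" 'BEGIN {
        if (q1 == 0 && q2 == 0) print "0.50"   # nothing known, split evenly
        else if (q1 == 0)       print "0.00"   # ISP1 dead: all to ISP2
        else if (q2 == 0)       print "1.00"   # ISP2 dead: all to ISP1
        else printf "%.2f\n", q2 / (q1 + q2)   # faster link gets more
    }'
}

# Replace the statistic rule in place with the new probability.
update_rule() {
    $IPT -t mangle -R PREROUTING "$RULE_NUM" -m state --state NEW \
        -m statistic --mode random \
        --probability "$(probability "$1" "$2")" -j ISP1
}

update_rule 10 30
```

With RTTs of 10 ms and 30 ms this sends 75% of new connections to ISP1.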
> 
> So really the whole exercise revolves around changing a single rule (or
> two rules, if you want to control the probability in a more fine-grained
> way).
> 
> Last but not least this setup allowed me to program exception tables for
> certain IP blocks. For instance Yahoo has a braindead two tier
> authentication system for commercial solutions. It remembers the IP
> which you used to login with first, and it must match the IP used to
> login to a more secure area (using another password). Or users from
> within the lan might want to use one of the ISPs SMTP servers, which
> keeps a close eye on who is talking to it. So I have a $PREFERRED which
> is adjusted to either ISP1 or ISP2, depending on the current state of
> affairs, and rules like:
>     iptables -t mangle -A PREROUTING -d 66.218.64.0/19 -m state \
>         --state NEW -j $PREFERRED
>     iptables -t mangle -A PREROUTING -d 68.142.192.0/18 -m state \
>         --state NEW -j $PREFERRED
> 
> This pretty much sums it up. The only downside I can think of is that
> loss of service can be observed between two runs of the pinger. Let me
> know if I missed something, be it critical or minor.
> 
> Thanks
> 
> Peter
> _______________________________________________
> LARTC mailing list
> LARTC@mailman.ds9a.nl
> http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc