netfilter-devel.vger.kernel.org archive mirror
From: Pablo Neira Ayuso <pablo@netfilter.org>
To: Patrick McHardy <kaber@trash.net>
Cc: netfilter-devel@vger.kernel.org
Subject: Re: [PATCH] netfilter: xtables: add cluster match
Date: Mon, 16 Feb 2009 15:01:48 +0100
Message-ID: <499971CC.6040903@netfilter.org>
In-Reply-To: <49994643.8010001@trash.net>

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> This patch adds the iptables cluster match. This match can be used
>> to deploy gateway and back-end load-sharing clusters.
> 
> I'm mixing comments to the cluster match and the ARP mangle target.
> 
>> Assuming that all the nodes see all packets (see below for an
>> example on how to do that if your switch does not allow this), the
>> cluster match decides if this node has to handle a packet given:
>>
>>     jhash(source IP) % total_nodes == node_id
>>
>> For related connections, the master conntrack is used. The following
>> is an example of its use to deploy a gateway cluster composed of two
>> nodes (where this is the node 1):
>>
>> iptables -I PREROUTING -t mangle -i eth1 -m cluster \
>>     --cluster-total-nodes 2 --cluster-local-node 1 \
>>     --cluster-proc-name eth1 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth1 \
>>     -m mark ! --mark 0xffff -j DROP
>> iptables -A PREROUTING -t mangle -i eth2 -m cluster \
>>     --cluster-total-nodes 2 --cluster-local-node 1 \
>>     --cluster-proc-name eth2 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth2 \
>>     -m mark ! --mark 0xffff -j DROP
>>
>> And the following commands to make all nodes see the same packets:
>>
>> ip maddr add 01:00:5e:00:01:01 dev eth1
>> ip maddr add 01:00:5e:00:01:02 dev eth2
>> arptables -I OUTPUT -o eth1 --h-length 6 \
>>     -j mangle --mangle-mac-s 01:00:5e:00:01:01
>> arptables -I INPUT -i eth1 --h-length 6 \
>>     --destination-mac 01:00:5e:00:01:01 \
>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
> 
> Mhh, is the saving of one or two characters really worth these
> deviations from the kind-of established naming scheme? Its hard
> to remember all these minor differences in my opinion.

Hm, do you mean the target name "mangle" or the option name
"--mangle-mac-d"? This is what we currently have in kernel mainline and
in arptables userspace, so it's not my fault :). I can send you a patch
that makes the naming consistent, in both kernel and userspace, without
breaking backward compatibility.

>> arptables -I OUTPUT -o eth2 --h-length 6 \
>>     -j mangle --mangle-mac-s 01:00:5e:00:01:02
>> arptables -I INPUT -i eth2 --h-length 6 \
>>     --destination-mac 01:00:5e:00:01:02 \
>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>>
>> In the case of TCP connections, pickup facility has to be disabled
>> to avoid marking TCP ACK packets coming in the reply direction as
>> valid.
>>
>> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
> 
> I'm not sure I understand this. You *don't* want to mark them
> as valid, and you need to disable pickup for this?

If TCP pickup is enabled, a single TCP ACK packet coming in the reply
direction moves the connection to the TCP ESTABLISHED state. Since that
is a valid state transition, the cluster match considers the packet
part of a connection that this node is handling. The cluster match does
not mark packets that trigger invalid state transitions, so with pickup
disabled such an ACK is left unmarked.

> Unrelated to this patch, but maybe the target would also be
> better named "NAT" instead of the much more generic term "mangle".
> Why is it using lower case letters btw?

No idea who did this, but I can send you a patch that fixes the naming
without breaking backward compatibility.

>> The match also provides a /proc entry under:
>>
>> /proc/sys/net/netfilter/cluster/$PROC_NAME
>>
>> where PROC_NAME is set via --cluster-proc-name. This is useful to
>> include possible cluster reconfigurations via fail-over scripts.
>> Assuming that this is the node 1, if node 2 is down, you can add
>> node 2 to your node-mask as follows:
>>
>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
> 
> Does this provide anything you can't do by replacing the rule
> itself?

Yes. The nodes in the cluster are identified by an ID, and the rule
lets you specify only one ID. Say you have two cluster nodes, one with
ID 1 and the other with ID 2. If the node with ID 1 goes down, you can
echo +1 to the node with ID 2 so that it handles the packets that hash
to ID 1 as well as those that hash to ID 2. Of course, you need
conntrackd to allow node 2 to recover the filtering.
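
The takeover described above can be sketched as a bitmask update. This
is only an illustration, not the code from the patch: it assumes node
IDs are 1-based and map to bit (ID - 1) of the node mask, and it uses
zlib.crc32 as a stand-in for jhash. All function names are hypothetical.

```python
import zlib


def node_bit(node_id: int) -> int:
    """Bit for a 1-based node ID (assumed layout: ID n -> bit n - 1)."""
    return 1 << (node_id - 1)


def stand_in_hash(src_ip: str) -> int:
    """Stand-in for jhash; any deterministic hash works for the sketch."""
    return zlib.crc32(src_ip.encode("ascii"))


def handles_packet(src_ip: str, total_nodes: int, node_mask: int) -> bool:
    """True if the node owning this hash bucket is set in node_mask."""
    owner = stand_in_hash(src_ip) % total_nodes  # 0-based bucket
    return bool(node_mask & (1 << owner))


def add_node(node_mask: int, node_id: int) -> int:
    """Models 'echo +N' into the /proc entry: take over node N's bucket."""
    return node_mask | node_bit(node_id)


# Node 2 starts with only its own bit, then takes over for node 1.
mask_node2 = node_bit(2)
mask_node2 = add_node(mask_node2, 1)
```

With two nodes, node 2's mask then covers both buckets, so every source
address is handled locally until the mask is changed again.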

Now, I see a possible optimization: if a node's node mask has a bit set
for every node in the cluster, the hashing can be skipped. But that's
something we can add later, I think.
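
A minimal sketch of that shortcut, under the same assumptions as the
bitmask model (1-based node IDs mapped to bit ID - 1; hash_fn is a
hypothetical placeholder for jhash, passed in so the skipped call is
visible). It is not taken from the patch.

```python
from typing import Callable


def mask_covers_all(node_mask: int, total_nodes: int) -> bool:
    """True if node_mask has a bit set for every node in the cluster."""
    full = (1 << total_nodes) - 1
    return (node_mask & full) == full


def handles_packet(src_ip: str, total_nodes: int, node_mask: int,
                   hash_fn: Callable[[str], int]) -> bool:
    """Decide whether this node handles the packet, hashing only if needed."""
    if mask_covers_all(node_mask, total_nodes):
        # This node owns every bucket, so the hash result cannot matter.
        return True
    owner = hash_fn(src_ip) % total_nodes  # 0-based bucket
    return bool(node_mask & (1 << owner))
```

The check costs one mask comparison per packet, which only pays off
while a node is covering for all of its peers.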

-- 
"Honest people are social misfits" -- Les Luthiers
