From: Pablo Neira Ayuso
Subject: Re: [PATCH] netfilter: xtables: add cluster match
Date: Mon, 16 Feb 2009 15:01:48 +0100
Message-ID: <499971CC.6040903@netfilter.org>
References: <20090214192936.11718.44732.stgit@Decadence> <49994643.8010001@trash.net>
Cc: netfilter-devel@vger.kernel.org
To: Patrick McHardy
In-Reply-To: <49994643.8010001@trash.net>

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> This patch adds the iptables cluster match. This match can be used
>> to deploy gateway and back-end load-sharing clusters.
>
> I'm mixing comments to the cluster match and the ARP mangle target.
>
>> Assuming that all the nodes see all packets (see below for an
>> example of how to do that if your switch does not allow this), the
>> cluster match decides if this node has to handle a packet given:
>>
>> jhash(source IP) % total_nodes == node_id
>>
>> For related connections, the master conntrack is used. The following
>> is an example of its use to deploy a gateway cluster composed of two
>> nodes (where this is node 1):
>>
>> iptables -I PREROUTING -t mangle -i eth1 -m cluster \
>> --cluster-total-nodes 2 --cluster-local-node 1 \
>> --cluster-proc-name eth1 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth1 \
>> -m mark ! --mark 0xffff -j DROP
>> iptables -A PREROUTING -t mangle -i eth2 -m cluster \
>> --cluster-total-nodes 2 --cluster-local-node 1 \
>> --cluster-proc-name eth2 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth2 \
>> -m mark ! --mark 0xffff -j DROP
>>
>> And the following commands to make all nodes see the same packets:
>>
>> ip maddr add 01:00:5e:00:01:01 dev eth1
>> ip maddr add 01:00:5e:00:01:02 dev eth2
>> arptables -I OUTPUT -o eth1 --h-length 6 \
>> -j mangle --mangle-mac-s 01:00:5e:00:01:01
>> arptables -I INPUT -i eth1 --h-length 6 \
>> --destination-mac 01:00:5e:00:01:01 \
>> -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>
> Mhh, is the saving of one or two characters really worth these
> deviations from the kind-of established naming scheme? It's hard
> to remember all these minor differences in my opinion.

Hm, do you mean the target name "mangle" or the option name
"--mangle-mac-d"? This is what we currently have in the mainline kernel
and in arptables userspace; it's not my fault :). I can send you a patch
that makes the naming consistent without breaking backward compatibility
in either the kernel or user space.

>> arptables -I OUTPUT -o eth2 --h-length 6 \
>> -j mangle --mangle-mac-s 01:00:5e:00:01:02
>> arptables -I INPUT -i eth2 --h-length 6 \
>> --destination-mac 01:00:5e:00:01:02 \
>> -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>>
>> In the case of TCP connections, the pickup facility has to be disabled
>> to avoid marking TCP ACK packets coming in the reply direction as
>> valid.
>>
>> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
>
> I'm not sure I understand this. You *don't* want to mark them
> as valid, and you need to disable pickup for this?

If TCP pickup is enabled, a single TCP ACK packet coming in the reply
direction puts the connection into the TCP ESTABLISHED state. Since that
is a valid state transition, the cluster match considers the packet part
of a connection that this node is handling. The cluster match does not
mark packets that trigger invalid state transitions, so with pickup
disabled such an ACK is treated as invalid, stays unmarked and is
dropped by the DROP rules.

> Unrelated to this patch, but maybe the target would also be
> better named "NAT" instead of the much more generic term "mangle".
> Why is it using lower case letters btw?
No idea who did this, but I can send you a patch to fix the naming
without breaking backward compatibility.

>> The match also provides a /proc entry under:
>>
>> /proc/sys/net/netfilter/cluster/$PROC_NAME
>>
>> where PROC_NAME is set via --cluster-proc-name. This is useful for
>> cluster reconfigurations performed by fail-over scripts.
>> Assuming that this is node 1, if node 2 is down, you can add
>> node 2 to this node's node mask as follows:
>>
>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
>
> Does this provide anything you can't do by replacing the rule
> itself?

Yes. The nodes in the cluster are identified by an ID, and the rule only
allows you to specify one ID. Say you have two cluster nodes, one with
ID 1 and the other with ID 2. If the node with ID 1 goes down, you can
echo +1 on the node with ID 2 so that it handles the packets for both
ID 1 and ID 2. Of course, you need conntrackd so that node ID 2 can
recover the filtering state.

Now I see a possible optimization: if a node's mask has the bits for all
nodes set (with regard to the total number of nodes), hashing can be
skipped. But that's something we can add later, I think.

--
"Los honestos son inadaptados sociales" ("Honest people are social misfits") -- Les Luthiers
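The node-selection rule quoted in this thread, `jhash(source IP) % total_nodes == node_id`, together with "for related connections, the master conntrack is used", can be sketched in C. This is a minimal user-space sketch under stated assumptions, not the patch's code: `struct conn`, `stand_in_hash()` and `cluster_match()` are hypothetical names, 32-bit FNV-1a stands in for the kernel's `jhash()`, and node IDs are assumed to be 1-based with node N owning bit N - 1 of a node mask (the thread mentions a node mask for the /proc interface).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, stripped-down connection record: only what this sketch
 * needs. A related connection (e.g. FTP data) points at its master. */
struct conn {
	uint32_t orig_src_ip;      /* IPv4 source, original direction */
	const struct conn *master; /* NULL unless this is a related conn */
};

/* Stand-in hash: 32-bit FNV-1a over the four address bytes.
 * The patch uses the kernel's jhash(); this is NOT jhash. */
static uint32_t stand_in_hash(uint32_t ip)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < 4; i++) {
		h ^= (ip >> (8 * i)) & 0xffu;
		h *= 16777619u;
	}
	return h;
}

/*
 * Should this node handle packets of connection ct?
 * Assumption: node IDs are 1-based and node N owns bit N - 1 of
 * node_mask, so "hash % total_nodes" selects a 0-based bit.
 */
static bool cluster_match(const struct conn *ct, uint32_t total_nodes,
			  uint32_t node_mask)
{
	uint32_t bit;

	assert(total_nodes >= 1 && total_nodes <= 32);

	/* Related connections follow their master connection. */
	if (ct->master != NULL)
		ct = ct->master;

	bit = stand_in_hash(ct->orig_src_ip) % total_nodes;
	return (node_mask >> bit) & 1u;
}
```

Under this assumption, `--cluster-local-node 1` in the two-node example corresponds to `node_mask = 0x1`. Every node computes the same bit for a given connection from header data alone, so as long as the nodes' masks partition the bits, exactly one node accepts each connection without any per-packet coordination.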
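The failover procedure discussed in this thread (node 2 running `echo +1 > /proc/sys/net/netfilter/cluster/$PROC_NAME` after node 1 goes down) amounts to setting one more bit in node 2's node mask. The following is a minimal sketch of such a write handler, assuming node N maps to bit N - 1 and that a `-N` form removes a node (the thread only shows `+N`); `apply_node_mask_cmd()` is a hypothetical name, not the patch's function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical handler for a write to the cluster /proc entry.
 * "+N" adds node N to this node's mask; "-N" removes it (an assumption:
 * the thread only shows "+N"). Node N is assumed to own bit N - 1.
 * On bad input, returns false and leaves *mask unchanged.
 */
static bool apply_node_mask_cmd(const char *cmd, uint32_t total_nodes,
				uint32_t *mask)
{
	char *end;
	long node;

	assert(total_nodes >= 1 && total_nodes <= 32);

	if ((cmd[0] != '+' && cmd[0] != '-') ||
	    !isdigit((unsigned char)cmd[1]))
		return false;

	node = strtol(cmd + 1, &end, 10);

	/* Allow one trailing newline, since echo(1) appends it. */
	if (*end == '\n')
		end++;
	if (*end != '\0' || node < 1 || node > (long)total_nodes)
		return false;

	if (cmd[0] == '+')
		*mask |= UINT32_C(1) << (node - 1);
	else
		*mask &= ~(UINT32_C(1) << (node - 1));
	return true;
}
```

With node 2 holding mask 0x2, writing "+1\n" yields 0x3, so node 2 accepts both hash buckets. As the thread notes, the connection state itself still has to come from conntrackd.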
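The optimization mentioned at the end of the reply (skip hashing when a node's mask already has a bit set for every node) reduces to a single mask comparison. This sketch assumes node N maps to bit N - 1 and at most 32 nodes; `mask_covers_all_nodes()` is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fast path: if this node's mask covers every node ID
 * (bits 0 .. total_nodes - 1), every hash bucket matches, so the hash
 * does not need to be computed at all. */
static bool mask_covers_all_nodes(uint32_t node_mask, uint32_t total_nodes)
{
	uint32_t all;

	assert(total_nodes >= 1 && total_nodes <= 32);

	/* Build the "all nodes" mask without shifting a 32-bit value by 32,
	 * which is undefined behaviour in C. */
	all = (total_nodes == 32) ? UINT32_MAX : (UINT32_C(1) << total_nodes) - 1u;
	return (node_mask & all) == all;
}
```

A match function could call this before hashing and return true immediately; in a two-node cluster, a node that has taken over its peer's traffic holds mask 0x3 and would take this path.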
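The setup quoted in this thread for making all nodes see the same packets (`ip maddr add 01:00:5e:00:01:01 dev eth1` plus arptables rules that advertise that address) relies on it being an Ethernet group address: a switch that does not filter multicast typically floods such frames to every port, so every node receives them. The sketch below only shows the address checks involved; `parse_mac()`, `is_multicast_mac()` and `is_ipv4_multicast_mac()` are hypothetical helpers and configure nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "aa:bb:cc:dd:ee:ff" into six bytes; returns false on bad input
 * (including placeholders such as "00:zz:yy:xx:5a:27"). */
static bool parse_mac(const char *s, uint8_t mac[6])
{
	unsigned int b[6];
	char extra;

	assert(s != NULL && mac != NULL);

	/* Exactly six hex fields and nothing after them. */
	if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
		return false;
	for (int i = 0; i < 6; i++)
		mac[i] = (uint8_t)b[i];
	return true;
}

/* The I/G bit (least significant bit of the first octet) marks a
 * group (multicast) address. */
static bool is_multicast_mac(const uint8_t mac[6])
{
	return (mac[0] & 0x01u) != 0;
}

/* 01:00:5e:00:00:00 - 01:00:5e:7f:ff:ff is the block used to map
 * IPv4 multicast groups onto Ethernet addresses. */
static bool is_ipv4_multicast_mac(const uint8_t mac[6])
{
	return mac[0] == 0x01 && mac[1] == 0x00 && mac[2] == 0x5e &&
	       (mac[3] & 0x80u) == 0;
}
```

Both addresses used in the example, 01:00:5e:00:01:01 and 01:00:5e:00:01:02, pass both checks, while an ordinary unicast address such as 00:11:22:33:44:55 fails them.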