From: Patrick McHardy <kaber@trash.net>
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, Patrick McHardy <kaber@trash.net>,
netfilter-devel@vger.kernel.org
Subject: netfilter 35/41: xtables: add cluster match
Date: Tue, 24 Mar 2009 15:03:53 +0100 (MET) [thread overview]
Message-ID: <20090324140349.31401.91299.sendpatchset@x2.localnet> (raw)
In-Reply-To: <20090324140302.31401.37732.sendpatchset@x2.localnet>
commit 0269ea4937343536ec7e85649932bc8c9686ea78
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Mon Mar 16 17:10:36 2009 +0100
netfilter: xtables: add cluster match
This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (although I have only tested
this with two nodes, so I cannot tell what is the real scalability
limit of this solution in terms of cluster nodes).
Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:
(jhash(source IP) % total_nodes) & node_mask
For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):
iptables -I PREROUTING -t mangle -i eth1 -m cluster \
--cluster-total-nodes 2 --cluster-local-node 1 \
--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
--cluster-total-nodes 2 --cluster-local-node 1 \
--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
-m mark ! --mark 0xffff -j DROP
And the following commands to make all nodes see the same packets:
ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
--destination-mac 01:00:5e:00:01:01 \
-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
--destination-mac 01:00:5e:00:01:02 \
-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
BTW, some final notes:
* This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
* This match supersedes the CLUSTERIP target.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index 947b47d..af9d2fb 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -21,6 +21,7 @@ header-y += xt_connbytes.h
header-y += xt_connlimit.h
header-y += xt_connmark.h
header-y += xt_conntrack.h
+header-y += xt_cluster.h
header-y += xt_dccp.h
header-y += xt_dscp.h
header-y += xt_esp.h
diff --git a/include/linux/netfilter/xt_cluster.h b/include/linux/netfilter/xt_cluster.h
new file mode 100644
index 0000000..5e0a0d0
--- /dev/null
+++ b/include/linux/netfilter/xt_cluster.h
@@ -0,0 +1,15 @@
+#ifndef _XT_CLUSTER_MATCH_H
+#define _XT_CLUSTER_MATCH_H
+
+enum xt_cluster_flags {
+ XT_CLUSTER_F_INV = (1 << 0)
+};
+
+struct xt_cluster_match_info {
+ u_int32_t total_nodes;
+ u_int32_t node_mask;
+ u_int32_t hash_seed;
+ u_int32_t flags;
+};
+
+#endif /* _XT_CLUSTER_MATCH_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index cdbaaff..2562d05 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -527,6 +527,22 @@ config NETFILTER_XT_TARGET_TCPOPTSTRIP
This option adds a "TCPOPTSTRIP" target, which allows you to strip
TCP options from TCP packets.
+config NETFILTER_XT_MATCH_CLUSTER
+ tristate '"cluster" match support'
+ depends on NF_CONNTRACK
+ depends on NETFILTER_ADVANCED
+ ---help---
+ This option allows you to build work-load-sharing clusters of
+ network servers/stateful firewalls without having a dedicated
+ load-balancing router/server/switch. Basically, this match returns
+ true when the packet must be handled by this cluster node. Thus,
+ all nodes see all packets and this match decides which node handles
+ what packets. The work-load sharing algorithm is based on source
+ address hashing.
+
+ If you say Y or M here, try `iptables -m cluster --help` for
+ more information.
+
config NETFILTER_XT_MATCH_COMMENT
tristate '"comment" match support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 7a9b839..6282060 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
# matches
+obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
new file mode 100644
index 0000000..ad5bd89
--- /dev/null
+++ b/net/netfilter/xt_cluster.c
@@ -0,0 +1,164 @@
+/*
+ * (C) 2008-2009 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/xt_cluster.h>
+
+static inline u_int32_t nf_ct_orig_ipv4_src(const struct nf_conn *ct)
+{
+ return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+}
+
+static inline const void *nf_ct_orig_ipv6_src(const struct nf_conn *ct)
+{
+ return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip6;
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv4(u_int32_t ip, const struct xt_cluster_match_info *info)
+{
+ return jhash_1word(ip, info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv6(const void *ip, const struct xt_cluster_match_info *info)
+{
+ return jhash2(ip, NF_CT_TUPLE_L3SIZE / sizeof(__u32), info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash(const struct nf_conn *ct,
+ const struct xt_cluster_match_info *info)
+{
+ u_int32_t hash = 0;
+
+ switch(nf_ct_l3num(ct)) {
+ case AF_INET:
+ hash = xt_cluster_hash_ipv4(nf_ct_orig_ipv4_src(ct), info);
+ break;
+ case AF_INET6:
+ hash = xt_cluster_hash_ipv6(nf_ct_orig_ipv6_src(ct), info);
+ break;
+ default:
+ WARN_ON(1);
+ break;
+ }
+ return (((u64)hash * info->total_nodes) >> 32);
+}
+
+static inline bool
+xt_cluster_is_multicast_addr(const struct sk_buff *skb, u_int8_t family)
+{
+ bool is_multicast = false;
+
+ switch(family) {
+ case NFPROTO_IPV4:
+ is_multicast = ipv4_is_multicast(ip_hdr(skb)->daddr);
+ break;
+ case NFPROTO_IPV6:
+ is_multicast = ipv6_addr_type(&ipv6_hdr(skb)->daddr) &
+ IPV6_ADDR_MULTICAST;
+ break;
+ default:
+ WARN_ON(1);
+ break;
+ }
+ return is_multicast;
+}
+
+static bool
+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+ struct sk_buff *pskb = (struct sk_buff *)skb;
+ const struct xt_cluster_match_info *info = par->matchinfo;
+ const struct nf_conn *ct;
+ enum ip_conntrack_info ctinfo;
+ unsigned long hash;
+
+ /* This match assumes that all nodes see the same packets. This can be
+ * achieved if the switch that connects the cluster nodes support some
+ * sort of 'port mirroring'. However, if your switch does not support
+ * this, your cluster nodes can reply ARP request using a multicast MAC
+ * address. Thus, your switch will flood the same packets to the
+ * cluster nodes with the same multicast MAC address. Using a multicast
+ * link address is a RFC 1812 (section 3.3.2) violation, but this works
+ * fine in practise.
+ *
+ * Unfortunately, if you use the multicast MAC address, the link layer
+ * sets skbuff's pkt_type to PACKET_MULTICAST, which is not accepted
+ * by TCP and others for packets coming to this node. For that reason,
+ * this match mangles skbuff's pkt_type if it detects a packet
+ * addressed to a unicast address but using PACKET_MULTICAST. Yes, I
+ * know, matches should not alter packets, but we are doing this here
+ * because we would need to add a PKTTYPE target for this sole purpose.
+ */
+ if (!xt_cluster_is_multicast_addr(skb, par->family) &&
+ skb->pkt_type == PACKET_MULTICAST) {
+ pskb->pkt_type = PACKET_HOST;
+ }
+
+ ct = nf_ct_get(skb, &ctinfo);
+ if (ct == NULL)
+ return false;
+
+ if (ct == &nf_conntrack_untracked)
+ return false;
+
+ if (ct->master)
+ hash = xt_cluster_hash(ct->master, info);
+ else
+ hash = xt_cluster_hash(ct, info);
+
+ return !!((1 << hash) & info->node_mask) ^
+ !!(info->flags & XT_CLUSTER_F_INV);
+}
+
+static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
+{
+ struct xt_cluster_match_info *info = par->matchinfo;
+
+ if (info->node_mask >= (1 << info->total_nodes)) {
+ printk(KERN_ERR "xt_cluster: this node mask cannot be "
+ "higher than the total number of nodes\n");
+ return false;
+ }
+ return true;
+}
+
+static struct xt_match xt_cluster_match __read_mostly = {
+ .name = "cluster",
+ .family = NFPROTO_UNSPEC,
+ .match = xt_cluster_mt,
+ .checkentry = xt_cluster_mt_checkentry,
+ .matchsize = sizeof(struct xt_cluster_match_info),
+ .me = THIS_MODULE,
+};
+
+static int __init xt_cluster_mt_init(void)
+{
+ return xt_register_match(&xt_cluster_match);
+}
+
+static void __exit xt_cluster_mt_fini(void)
+{
+ xt_unregister_match(&xt_cluster_match);
+}
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: hash-based cluster match");
+MODULE_ALIAS("ipt_cluster");
+MODULE_ALIAS("ip6t_cluster");
+module_init(xt_cluster_mt_init);
+module_exit(xt_cluster_mt_fini);
next prev parent reply other threads:[~2009-03-24 14:03 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-24 14:03 netfilter 00/41: Netfilter update for 2.6.30 Patrick McHardy
2009-03-24 14:03 ` netfilter 01/41: change generic l4 protocol number Patrick McHardy
2009-03-24 14:03 ` netfilter 02/41: remove unneeded goto Patrick McHardy
2009-03-24 14:03 ` netfilter 03/41: x_tables: change elements in x_tables Patrick McHardy
2009-03-24 14:03 ` netfilter 04/41: x_tables: remove unneeded initializations Patrick McHardy
2009-03-24 14:03 ` netfilter 05/41: ebtables: " Patrick McHardy
2009-03-24 14:03 ` netfilter 06/41: log invalid new icmpv6 packet with nf_log_packet() Patrick McHardy
2009-03-24 14:03 ` netfilter 07/41: arp_tables: unfold two critical loops in arp_packet_match() Patrick McHardy
2009-03-24 20:29 ` David Miller
2009-03-24 21:06 ` Eric Dumazet
2009-03-24 21:16 ` David Miller
2009-03-24 21:17 ` Jan Engelhardt
2009-03-24 21:18 ` David Miller
2009-03-24 21:23 ` Jan Engelhardt
2009-03-24 21:25 ` David Miller
2009-03-24 21:39 ` Eric Dumazet
2009-03-24 21:52 ` Jan Engelhardt
2009-03-25 11:27 ` [PATCH] netfilter: factorize ifname_compare() Eric Dumazet
2009-03-25 16:32 ` Patrick McHardy
2009-03-25 10:33 ` netfilter 07/41: arp_tables: unfold two critical loops in arp_packet_match() Andi Kleen
2009-03-24 14:03 ` netfilter 08/41: Combine ipt_TTL and ip6t_HL source Patrick McHardy
2009-03-24 14:03 ` netfilter 09/41: Combine ipt_ttl and ip6t_hl source Patrick McHardy
2009-03-24 14:03 ` netfilter 10/41: xt_physdev fixes Patrick McHardy
2009-03-24 14:03 ` netfilter 11/41: xtables: add backward-compat options Patrick McHardy
2009-03-24 14:03 ` netfilter 12/41: xt_physdev: unfold two loops in physdev_mt() Patrick McHardy
2009-03-24 14:03 ` netfilter 13/41: ip6_tables: unfold two loops in ip6_packet_match() Patrick McHardy
2009-03-24 14:03 ` netfilter 14/41: iptables: lock free counters Patrick McHardy
2009-03-24 14:03 ` netfilter 15/41: nf_conntrack: table max size should hold at least table size Patrick McHardy
2009-03-24 14:03 ` netfilter 16/41: fix hardcoded size assumptions Patrick McHardy
2009-03-24 14:03 ` netfilter 17/41: x_tables: add LED trigger target Patrick McHardy
2009-03-24 14:03 ` netfilter 18/41: ip_tables: unfold two critical loops in ip_packet_match() Patrick McHardy
2009-03-24 14:03 ` netfilter 19/41: nf_conntrack: account packets drop by tcp_packet() Patrick McHardy
2009-03-24 14:03 ` netfilter 20/41: install missing headers Patrick McHardy
2009-03-24 14:03 ` netfilter 21/41: xt_hashlimit fix Patrick McHardy
2009-03-24 14:03 ` netfilter 22/41: use a linked list of loggers Patrick McHardy
2009-03-24 14:03 ` netfilter 23/41: print the list of register loggers Patrick McHardy
2009-03-24 14:03 ` netfilter 24/41: remove IPvX specific parts from nf_conntrack_l4proto.h Patrick McHardy
2009-03-24 14:03 ` netfilter 25/41: Kconfig spelling fixes (trivial) Patrick McHardy
2009-03-24 14:20 ` Jan Engelhardt
2009-03-24 20:35 ` David Miller
2009-03-24 14:03 ` netfilter 26/41: conntrack: increase drop stats if sequence adjustment fails Patrick McHardy
2009-03-24 14:03 ` netfilter 27/41: ctnetlink: cleanup master conntrack assignation Patrick McHardy
2009-03-24 14:03 ` netfilter 28/41: ctnetlink: cleanup conntrack update preliminary checkings Patrick McHardy
2009-03-24 14:03 ` netfilter 29/41: ctnetlink: move event reporting for new entries outside the lock Patrick McHardy
2009-03-24 14:03 ` netfilter 30/41: auto-load ip6_queue module when socket opened Patrick McHardy
2009-03-24 14:03 ` netfilter 31/41: auto-load ip_queue " Patrick McHardy
2009-03-24 14:03 ` netfilter 32/41: xtables: avoid pointer to self Patrick McHardy
2009-03-24 14:03 ` net 33/41: sysctl_net - use net_eq to compare nets Patrick McHardy
2009-03-24 14:03 ` net 34/41: netfilter conntrack - add per-net functionality for DCCP protocol Patrick McHardy
2009-03-24 14:03 ` Patrick McHardy [this message]
2009-03-24 14:03 ` netfilter 36/41: ctnetlink: remove remaining module refcounting Patrick McHardy
2009-03-24 14:03 ` netfilter 37/41: remove nf_ct_l4proto_find_get/nf_ct_l4proto_put Patrick McHardy
2009-03-24 14:03 ` netfilter 38/41: ctnetlink: fix rcu context imbalance Patrick McHardy
2009-03-24 14:03 ` netfilter 39/41: sysctl support of logger choice Patrick McHardy
2009-03-24 14:04 ` nefilter 40/41: nfnetlink: add nfnetlink_set_err and use it in ctnetlink Patrick McHardy
2009-03-24 14:04 ` netfilter 41/41: nf_conntrack: Reduce conntrack count in nf_conntrack_free() Patrick McHardy
2009-03-24 20:26 ` netfilter 00/41: Netfilter update for 2.6.30 David Miller
2009-03-25 16:29 ` Patrick McHardy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090324140349.31401.91299.sendpatchset@x2.localnet \
--to=kaber@trash.net \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).