netfilter-devel.vger.kernel.org archive mirror
* [PATCH] netfilter: xtables: add cluster match
@ 2009-02-14 19:29 Pablo Neira Ayuso
  2009-02-14 20:28 ` Jan Engelhardt
  2009-02-16 10:56 ` Patrick McHardy
  0 siblings, 2 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-14 19:29 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters.

Assuming that all the nodes see all packets (see below for an
example of how to achieve this if your switch does not allow it), the
cluster match decides whether this node has to handle a packet given:

	jhash(source IP) % total_nodes == node_id

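For illustration only (not part of the patch): a minimal userspace
sketch of that decision. Note that the kernel code below maps the
32-bit jhash onto [0, total_nodes) with a multiply-and-shift instead
of a literal modulo, and then tests the resulting bucket against the
node mask; the helper name cluster_bucket is only used here.

#include <stdio.h>
#include <stdint.h>

static unsigned int cluster_bucket(uint32_t hash, unsigned int total_nodes)
{
	/* same multiply-and-shift used by xt_cluster_hash() below */
	return ((uint64_t)hash * total_nodes) >> 32;
}

int main(void)
{
	uint32_t hash = 0x9e3779b9;	/* stand-in for jhash(source IP, seed) */
	unsigned long node_mask = 1UL << (1 - 1);	/* this box is node 1 */
	unsigned int bucket = cluster_bucket(hash, 2);

	printf("bucket=%u handled here=%s\n", bucket,
	       (node_mask & (1UL << bucket)) ? "yes" : "no");
	return 0;
}
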
For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, the connection pickup facility has to
be disabled to avoid marking TCP ACK packets coming in the reply
direction as valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

The match also provides a /proc entry under:

/proc/sys/net/netfilter/cluster/$PROC_NAME

where PROC_NAME is set via --cluster-proc-name. This is useful for
cluster reconfigurations via fail-over scripts. Assuming that this is
node 1, if node 2 goes down, you can add node 2 to your node mask as
follows:

echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME

BTW, some final notes:

 * This match mangles the skbuff pkt_type if it detects
PACKET_MULTICAST for a non-multicast address. This could instead be
done by a PKTTYPE target added for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 include/linux/netfilter/xt_cluster.h |   21 ++
 net/netfilter/Kconfig                |   15 +
 net/netfilter/Makefile               |    1 
 net/netfilter/xt_cluster.c           |  368 ++++++++++++++++++++++++++++++++++
 4 files changed, 405 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_cluster.h
 create mode 100644 net/netfilter/xt_cluster.c

diff --git a/include/linux/netfilter/xt_cluster.h b/include/linux/netfilter/xt_cluster.h
new file mode 100644
index 0000000..a06401d
--- /dev/null
+++ b/include/linux/netfilter/xt_cluster.h
@@ -0,0 +1,21 @@
+#ifndef _XT_CLUSTER_MATCH_H
+#define _XT_CLUSTER_MATCH_H
+
+struct proc_dir_entry;
+
+enum xt_cluster_flags {
+	XT_CLUSTER_F_INV = 0,
+};
+
+struct xt_cluster_match_info {
+	u_int16_t		total_nodes;
+	u_int16_t		node_id;
+	u_int32_t		hash_seed;
+	char			proc_name[16];
+	u_int32_t		flags;
+
+	/* Used internally by the kernel */
+	void			*data __attribute__((aligned(8)));
+};
+
+#endif /* _XT_CLUSTER_MATCH_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c2bac9c..52240cc 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -488,6 +488,21 @@ config NETFILTER_XT_TARGET_TCPOPTSTRIP
 	  This option adds a "TCPOPTSTRIP" target, which allows you to strip
 	  TCP options from TCP packets.
 
+config NETFILTER_XT_MATCH_CLUSTER
+	tristate '"cluster" match support'
+	depends on NETFILTER_ADVANCED
+	---help---
+	  This option allows you to build work-load-sharing clusters of
+	  network servers/stateful firewalls without having a dedicated
+	  load-balancing router/server/switch. Basically, this match returns
+	  true when the packet must be handled by this cluster node. Thus,
+	  all nodes see all packets and this match decides which node handles
+	  what packets. The work-load sharing algorithm is based on source
+	  address hashing.
+
+	  If you say Y here, try `iptables -m cluster --help` for
+	  more information.
+
 config NETFILTER_XT_MATCH_COMMENT
 	tristate  '"comment" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index da3d909..960399a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
+obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
new file mode 100644
index 0000000..7f96db2
--- /dev/null
+++ b/net/netfilter/xt_cluster.c
@@ -0,0 +1,368 @@
+/*
+ * (C) 2008-2009 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/bitops.h>
+#include <linux/proc_fs.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/xt_cluster.h>
+
+struct xt_cluster_internal {
+	unsigned long		node_mask;
+	struct proc_dir_entry	*proc;
+	atomic_t		use;
+};
+
+static inline u_int32_t nf_ct_orig_ipv4_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+}
+
+static inline const void *nf_ct_orig_ipv6_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip6;
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv4(u_int32_t ip, const struct xt_cluster_match_info *info)
+{
+	return jhash_1word(ip, info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv6(const void *ip, const struct xt_cluster_match_info *info)
+{
+	return jhash2(ip, NF_CT_TUPLE_L3SIZE / sizeof(__u32), info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash(const struct nf_conn *ct,
+		const struct xt_cluster_match_info *info)
+{
+	u_int32_t hash = 0;
+
+	switch(nf_ct_l3num(ct)) {
+	case AF_INET:
+		hash = xt_cluster_hash_ipv4(nf_ct_orig_ipv4_src(ct), info);
+		break;
+	case AF_INET6:
+		hash = xt_cluster_hash_ipv6(nf_ct_orig_ipv6_src(ct), info);
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return (((u64)hash * info->total_nodes) >> 32);
+}
+
+static inline bool
+xt_cluster_is_multicast_addr(const struct sk_buff *skb, int family)
+{
+	bool is_multicast = false;
+
+	switch(family) {
+	case NFPROTO_IPV4:
+		is_multicast = ipv4_is_multicast(ip_hdr(skb)->daddr);
+		break;
+	case NFPROTO_IPV6:
+		is_multicast = ipv6_addr_type(&ipv6_hdr(skb)->daddr) &
+						IPV6_ADDR_MULTICAST;
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return is_multicast;
+}
+
+static bool
+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	struct sk_buff *pskb = (struct sk_buff *)skb;
+	const struct xt_cluster_match_info *info = par->matchinfo;
+	const struct xt_cluster_internal *internal = info->data;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned long hash;
+	bool inv = !!(info->flags & XT_CLUSTER_F_INV);
+
+	/* This match assumes that all nodes see the same packets. This can be
+	 * achieved if the switch that connects the cluster nodes supports some
+	 * sort of 'port mirroring'. However, if your switch does not support
+	 * this, your cluster nodes can reply to ARP requests using a multicast
+	 * MAC address. Thus, your switch will flood the same packets to the
+	 * cluster nodes with the same multicast MAC address. Using a multicast
+	 * link address is an RFC 1812 (section 3.3.2) violation, but this works
+	 * fine in practice.
+	 *
+	 * Unfortunately, if you use the multicast MAC address, the link layer
+	 * sets skbuff's pkt_type to PACKET_MULTICAST, which is not accepted
+	 * by TCP and others for packets coming to this node. For that reason,
+	 * this match mangles skbuff's pkt_type if it detects a packet
+	 * addressed to a unicast address but using PACKET_MULTICAST. Yes, I
+	 * know, matches should not alter packets, but we are doing this here
+	 * because we would need to add a PKTTYPE target for this sole purpose.
+	 */
+	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
+	    skb->pkt_type == PACKET_MULTICAST) {
+	    	pskb->pkt_type = PACKET_HOST;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return false;
+
+	if (ct == &nf_conntrack_untracked)
+		return false;
+
+	if (ct->master)
+		hash = xt_cluster_hash(ct->master, info);
+	else
+		hash = xt_cluster_hash(ct, info);
+
+	return test_bit(hash, &internal->node_mask) ^ inv;
+}
+
+#ifdef CONFIG_PROC_FS
+static void *xt_cluster_seq_start(struct seq_file *s, loff_t *pos)
+{
+	if (*pos == 0) {
+		struct xt_cluster_internal *data = s->private;
+
+		return &data->node_mask;
+	} else {
+		*pos = 0;
+		return NULL;
+	}
+}
+
+static void *xt_cluster_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return NULL;
+}
+
+static void xt_cluster_seq_stop(struct seq_file *s, void *v) {}
+
+static int xt_cluster_seq_show(struct seq_file *s, void *v)
+{
+	unsigned long *mask = v;
+	seq_printf(s, "0x%.8lx\n", *mask);
+	return 0;
+}
+
+static const struct seq_operations xt_cluster_seq_ops = {
+	.start	= xt_cluster_seq_start,
+	.next	= xt_cluster_seq_next,
+	.stop	= xt_cluster_seq_stop,
+	.show	= xt_cluster_seq_show
+};
+
+#define XT_CLUSTER_PROC_WRITELEN	10
+
+static ssize_t
+xt_cluster_write_proc(struct file *file, const char __user *input,
+		      size_t size, loff_t *ofs)
+{
+	const struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
+	struct xt_cluster_internal *info = pde->data;
+	char buffer[XT_CLUSTER_PROC_WRITELEN+1];
+	unsigned int new_node_id;
+
+	if (copy_from_user(buffer, input, XT_CLUSTER_PROC_WRITELEN))
+		return -EFAULT;
+
+	switch(*buffer) {
+	case '+':
+		new_node_id = simple_strtoul(buffer+1, NULL, 10);
+		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
+			return -EIO;
+		printk(KERN_NOTICE "cluster: adding node %u\n", new_node_id);
+		set_bit(new_node_id-1, &info->node_mask);
+		break;
+	case '-':
+		new_node_id = simple_strtoul(buffer+1, NULL, 10);
+		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
+			return -EIO;
+		printk(KERN_NOTICE "cluster: deleting node %u\n", new_node_id);
+		clear_bit(new_node_id-1, &info->node_mask);
+		break;
+	default:
+		return -EIO;
+	}
+
+	return size;
+}
+
+static int xt_cluster_open_proc(struct inode *inode, struct file *file)
+{
+	int ret;
+	
+	ret = seq_open(file, &xt_cluster_seq_ops);
+	if (!ret) {
+		struct seq_file *seq = file->private_data;
+		const struct proc_dir_entry *pde = PDE(inode);
+		struct xt_cluster_match_info *info = pde->data;
+
+		seq->private = info;
+	}
+	return ret;
+};
+
+static struct proc_dir_entry *proc_cluster;
+static const struct file_operations xt_cluster_proc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= xt_cluster_open_proc,
+	.release	= seq_release,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.write		= xt_cluster_write_proc,
+};
+
+static bool
+xt_cluster_proc_entry_exist(struct proc_dir_entry *dir, const char *name)
+{
+	struct proc_dir_entry *tmp;
+
+	for (tmp = dir->subdir; tmp; tmp = tmp->next) {
+		if (strcmp(tmp->name, name) == 0)
+			return true;
+	}
+	return false;
+}
+
+static bool xt_cluster_proc_init(struct xt_cluster_match_info *info)
+{
+	struct xt_cluster_internal *internal = info->data;
+
+	BUG_ON(info->data == NULL);
+
+	if (xt_cluster_proc_entry_exist(proc_cluster, info->proc_name)) {
+		printk(KERN_ERR "xt_cluster: proc entry `%s' "
+				"already exists\n", info->proc_name);
+		return false;
+	}
+	internal->proc = proc_create_data(info->proc_name,
+					  S_IWUSR|S_IRUSR,
+					  proc_cluster,
+					  &xt_cluster_proc_fops, 
+					  info->data);
+	if (!internal->proc) {
+		printk(KERN_ERR "xt_cluster: cannot create proc entry `%s'\n",
+				info->proc_name);
+		return false;
+	}
+	return true;
+}
+#endif /* CONFIG_PROC_FS */
+
+static bool xt_cluster_internal_init(struct xt_cluster_match_info *info)
+{
+	struct xt_cluster_internal *data;
+
+	data = kzalloc(sizeof(struct xt_cluster_internal), GFP_KERNEL);
+	if (!data) {
+		printk(KERN_ERR "xt_cluster: OOM\n");
+		return false;
+	}
+	info->data = data;
+
+#ifdef CONFIG_PROC_FS
+	if (!xt_cluster_proc_init(info)) {
+		kfree(data);
+		return false;
+	}
+#endif
+	atomic_set(&data->use, 1);
+	data->node_mask = (1 << (info->node_id - 1));
+
+	return true;
+}
+
+static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+	struct xt_cluster_internal *data = info->data;
+
+	if (info->node_id > info->total_nodes) {
+		printk(KERN_ERR "xt_cluster: the id of this node cannot be "
+				"higher than the total number of nodes\n");
+		return false;
+	}
+
+	if (!info->data) {
+		if (!xt_cluster_internal_init(info))
+			return false;
+	} else
+		atomic_inc(&data->use);
+
+	return true;
+}
+
+static void xt_cluster_mt_destroy(const struct xt_mtdtor_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+	struct xt_cluster_internal *data = info->data;
+
+	if (atomic_dec_and_test(&data->use)) {
+#ifdef CONFIG_PROC_FS
+		remove_proc_entry(info->proc_name, proc_cluster);
+#endif
+		kfree(info->data);
+	}
+}
+
+static struct xt_match xt_cluster_match __read_mostly = {
+	.name		= "cluster",
+	.family		= NFPROTO_UNSPEC,
+	.match		= xt_cluster_mt,
+	.checkentry	= xt_cluster_mt_checkentry,
+	.destroy	= xt_cluster_mt_destroy,
+	.matchsize	= sizeof(struct xt_cluster_match_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init xt_cluster_mt_init(void)
+{
+	int ret;
+
+#ifdef CONFIG_PROC_FS
+	proc_cluster = proc_mkdir("cluster", proc_net_netfilter);
+	if (!proc_cluster)
+		return -ENOMEM;
+#endif
+	ret = xt_register_match(&xt_cluster_match);
+	if (ret < 0) {
+#ifdef CONFIG_PROC_FS
+		remove_proc_entry("cluster", proc_net_netfilter);
+#endif
+		return ret;
+	}
+	return 0;
+}
+
+static void __exit xt_cluster_mt_fini(void)
+{
+#ifdef CONFIG_PROC_FS
+	remove_proc_entry("cluster", proc_net_netfilter);
+#endif
+	xt_unregister_match(&xt_cluster_match);
+}
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: hash-based cluster match");
+MODULE_ALIAS("ipt_cluster");
+MODULE_ALIAS("ip6t_cluster");
+module_init(xt_cluster_mt_init);
+module_exit(xt_cluster_mt_fini);



* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-14 19:29 Pablo Neira Ayuso
@ 2009-02-14 20:28 ` Jan Engelhardt
  2009-02-14 20:42   ` Pablo Neira Ayuso
  2009-02-16 10:56 ` Patrick McHardy
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Engelhardt @ 2009-02-14 20:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber


On Saturday 2009-02-14 20:29, Pablo Neira Ayuso wrote:

>This patch adds the iptables cluster match. This match can be used
>to deploy gateway and back-end load-sharing clusters.

All of this nice text (below) should go into libxt_cluster.man :)

>Assuming that all the nodes see all packets (see below for an
>example on how to do that if your switch does not allow this), the
>cluster match decides if this node has to handle a packet given:
>
>	jhash(source IP) % total_nodes == node_id
>
>For related connections, the master conntrack is used. The following
>is an example of its use to deploy a gateway cluster composed of two
>nodes (where this is the node 1):
>
>iptables -I PREROUTING -t mangle -i eth1 -m cluster \
>	--cluster-total-nodes 2 --cluster-local-node 1 \
[...]
>
>echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
>
>BTW, some final notes:
>
> * This match mangles the skbuff pkt_type in case that it detects
>PACKET_MULTICAST for a non-multicast address. This may be done in
>a PKTTYPE target for this sole purpose.
> * This match supersedes the CLUSTERIP target.

The supersedes statement should also go into Kconfig.

>@@ -0,0 +1,21 @@
>+#ifndef _XT_CLUSTER_MATCH_H
>+#define _XT_CLUSTER_MATCH_H
>+
>+struct proc_dir_entry;
>+
>+enum xt_cluster_flags {
>+	XT_CLUSTER_F_INV = 0,
>+};

Hm, that should be XT_CLUSTER_F_INV = 1 << 0.

>+config NETFILTER_XT_MATCH_CLUSTER
>+	tristate '"cluster" match support'
>+	depends on NETFILTER_ADVANCED

xt_cluster depends on NF_CONNTRACK too.

>+	---help---
>+	  This option allows you to build work-load-sharing clusters of
>+	  network servers/stateful firewalls without having a dedicated
>+	  load-balancing router/server/switch. Basically, this match returns
>+	  true when the packet must be handled by this cluster node. Thus,
>+	  all nodes see all packets and this match decides which node handles
>+	  what packets. The work-load sharing algorithm is based on source
>+	  address hashing.
>+
>+	  If you say Y here, try `iptables -m cluster --help` for
>+	  more information.

Somehow this gives the impression that Y is the only logical choice,
when indeed, M would work too.

>+struct xt_cluster_internal {
>+	unsigned long		node_mask;

I think I raised concern about this, but can't remember.
Should not this be of type nodemask_t instead? Or ... is this
"node" perhaps not describing a NUMA node, but a cluster node?
Some comments would be appreciated, along the lines of

/**
 * @node_mask:	cluster node mask
 */

>+	struct proc_dir_entry	*proc;
>+	atomic_t		use;
>+};

>+static inline bool
>+xt_cluster_is_multicast_addr(const struct sk_buff *skb, int family)

Cosmetics warrant a uint8_t family.

>+static bool
>+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
>+{
>+	struct sk_buff *pskb = (struct sk_buff *)skb;
>+	const struct xt_cluster_match_info *info = par->matchinfo;
>+	const struct xt_cluster_internal *internal = info->data;
>+	const struct nf_conn *ct;
>+	enum ip_conntrack_info ctinfo;
>+	unsigned long hash;
>+	bool inv = !!(info->flags & XT_CLUSTER_F_INV);
>+
>+	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
>+	    skb->pkt_type == PACKET_MULTICAST) {
>+	    	pskb->pkt_type = PACKET_HOST;
>+	}

{} are redundant here

>+	if (ct->master)
>+		hash = xt_cluster_hash(ct->master, info);
>+	else
>+		hash = xt_cluster_hash(ct, info);
>+
>+	return test_bit(hash, &internal->node_mask) ^ inv;

Since inv is used just once, I would "inline" it, aka
	return test_bit(hash, &internal->node_mask) ^
	       !!(info->flags & XT_CLUSTER_F_INV);

>+static int xt_cluster_seq_show(struct seq_file *s, void *v)
>+{
>+	unsigned long *mask = v;
>+	seq_printf(s, "0x%.8lx\n", *mask);

'.' does not make sense with non-string, non-floating-point numbers
(though it is a stdc feature, it seems). I'd say "0x%08lx", for clarity.

>+	case '-':
>+		new_node_id = simple_strtoul(buffer+1, NULL, 10);
>+		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
>+			return -EIO;
>+		printk(KERN_NOTICE "cluster: deleting node %u\n", new_node_id);
>+		clear_bit(new_node_id-1, &info->node_mask);
>+		break;
>+	default:
>+		return -EIO;

EINVAL, I'd say.

>+static bool
>+xt_cluster_proc_entry_exist(struct proc_dir_entry *dir, const char *name)
>+{
>+	struct proc_dir_entry *tmp;
>+
>+	for (tmp = dir->subdir; tmp; tmp = tmp->next) {
>+		if (strcmp(tmp->name, name) == 0)
>+			return true;
>+	}

-{}


Looks good so far.


* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-14 20:28 ` Jan Engelhardt
@ 2009-02-14 20:42   ` Pablo Neira Ayuso
  2009-02-14 22:31     ` Jan Engelhardt
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-14 20:42 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, kaber

Jan Engelhardt wrote:
> On Saturday 2009-02-14 20:29, Pablo Neira Ayuso wrote:
> 
>> This patch adds the iptables cluster match. This match can be used
>> to deploy gateway and back-end load-sharing clusters.
> 
> All of this nice text (below) should go into libxt_cluster.man :)

Indeed, that will go into the manpage. I have kept the iptables part
until you finish with Jamal's libxtables renaming, to avoid any
clashes. Anyway, I don't think that adding this information to the
manpage is a reason to trim the patch description; actually, I think
this patch deserves this long description, and I have seen longer
descriptions in the kernel changelog :)

>> Assuming that all the nodes see all packets (see below for an
>> example on how to do that if your switch does not allow this), the
>> cluster match decides if this node has to handle a packet given:
>>
>> 	jhash(source IP) % total_nodes == node_id
>>
>> For related connections, the master conntrack is used. The following
>> is an example of its use to deploy a gateway cluster composed of two
>> nodes (where this is the node 1):
>>
>> iptables -I PREROUTING -t mangle -i eth1 -m cluster \
>> 	--cluster-total-nodes 2 --cluster-local-node 1 \
> [...]
>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
>>
>> BTW, some final notes:
>>
>> * This match mangles the skbuff pkt_type in case that it detects
>> PACKET_MULTICAST for a non-multicast address. This may be done in
>> a PKTTYPE target for this sole purpose.
>> * This match supersedes the CLUSTERIP target.
> 
> The supersedes statement should also go into Kconfig.

Yes, but I was also waiting for Patrick to tell how to proceed with
this. I think that we also have to add CLUSTERIP to the
schedule-for-removal list.

>> @@ -0,0 +1,21 @@
>> +#ifndef _XT_CLUSTER_MATCH_H
>> +#define _XT_CLUSTER_MATCH_H
>> +
>> +struct proc_dir_entry;
>> +
>> +enum xt_cluster_flags {
>> +	XT_CLUSTER_F_INV = 0,
>> +};
> 
> Hm, that should be XT_CLUSTER_F_INV = 1 << 0.

Indeed.

>> +config NETFILTER_XT_MATCH_CLUSTER
>> +	tristate '"cluster" match support'
>> +	depends on NETFILTER_ADVANCED
> 
> xt_cluster depends on NF_CONNTRACK too.

Right.

>> +	---help---
>> +	  This option allows you to build work-load-sharing clusters of
>> +	  network servers/stateful firewalls without having a dedicated
>> +	  load-balancing router/server/switch. Basically, this match returns
>> +	  true when the packet must be handled by this cluster node. Thus,
>> +	  all nodes see all packets and this match decides which node handles
>> +	  what packets. The work-load sharing algorithm is based on source
>> +	  address hashing.
>> +
>> +	  If you say Y here, try `iptables -m cluster --help` for
>> +	  more information.
> 
> Somehow this gives the impression that Y is the only logical choice,
> when indeed, M would work too.
> 
>> +struct xt_cluster_internal {
>> +	unsigned long		node_mask;
> 
> I think I raised concern about this, but can't remember.
> Should not this be of type nodemask_t instead? Or ... is this
> "node" perhaps not describing a NUMA node, but a cluster node?
> Some comments would be appreciated, along the lines of

You mentioned cpumask_t last time, but this is neither related to NUMA
nor a CPU mask, so I prefer to keep using a long for this; moreover,
the test and set bit operations take a long.
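
For what it is worth, a stand-alone sketch (plain C, not the kernel
bitops themselves) of why an unsigned long is a convenient node mask:
cluster node N simply occupies bit N-1, which is the shape that
set_bit()/clear_bit()/test_bit() expect.

#include <stdio.h>

int main(void)
{
	unsigned long node_mask = 1UL << (1 - 1);	/* start as node 1 only */

	node_mask |= 1UL << (2 - 1);		/* "echo +2": take over node 2 */
	printf("mask=0x%08lx\n", node_mask);	/* 0x00000003 */

	node_mask &= ~(1UL << (2 - 1));		/* "echo -2": hand node 2 back */
	printf("mask=0x%08lx\n", node_mask);	/* 0x00000001 */
	return 0;
}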

> /**
>  * @node_mask:	cluster node mask
>  */
> 
>> +	struct proc_dir_entry	*proc;
>> +	atomic_t		use;
>> +};
> 
>> +static inline bool
>> +xt_cluster_is_multicast_addr(const struct sk_buff *skb, int family)
> 
> Cosmetic warrants uint8_t family.
> 
>> +static bool
>> +xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
>> +{
>> +	struct sk_buff *pskb = (struct sk_buff *)skb;
>> +	const struct xt_cluster_match_info *info = par->matchinfo;
>> +	const struct xt_cluster_internal *internal = info->data;
>> +	const struct nf_conn *ct;
>> +	enum ip_conntrack_info ctinfo;
>> +	unsigned long hash;
>> +	bool inv = !!(info->flags & XT_CLUSTER_F_INV);
>> +
>> +	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
>> +	    skb->pkt_type == PACKET_MULTICAST) {
>> +	    	pskb->pkt_type = PACKET_HOST;
>> +	}
> 
> {} are redundant here

When lines are split, I have seen these redundant {} kept for readability.

>> +	if (ct->master)
>> +		hash = xt_cluster_hash(ct->master, info);
>> +	else
>> +		hash = xt_cluster_hash(ct, info);
>> +
>> +	return test_bit(hash, &internal->node_mask) ^ inv;
> 
> Since inv is just once used, I would "inline" it, aka
> 	return test_bit(hash, &internal->node_mask) ^
> 	       !!(info->flags & XT_CLUSTER_MATCH_INV);

Makes sense.

>> +static int xt_cluster_seq_show(struct seq_file *s, void *v)
>> +{
>> +	unsigned long *mask = v;
>> +	seq_printf(s, "0x%.8lx\n", *mask);
> 
> '.' does not make sense with non-string,non-floating point numbers
> (though it is a stdc feature it seems). I'd say "0x%08lx", for clarity.

OK

>> +	case '-':
>> +		new_node_id = simple_strtoul(buffer+1, NULL, 10);
>> +		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
>> +			return -EIO;
>> +		printk(KERN_NOTICE "cluster: deleting node %u\n", new_node_id);
>> +		clear_bit(new_node_id-1, &info->node_mask);
>> +		break;
>> +	default:
>> +		return -EIO;
> 
> EINVAL, I'd say.

Other /proc-related code uses this error, but indeed I prefer EINVAL.

>> +static bool
>> +xt_cluster_proc_entry_exist(struct proc_dir_entry *dir, const char *name)
>> +{
>> +	struct proc_dir_entry *tmp;
>> +
>> +	for (tmp = dir->subdir; tmp; tmp = tmp->next) {
>> +		if (strcmp(tmp->name, name) == 0)
>> +			return true;
>> +	}
> 
> -{}

Same comment as above: it's there for readability.

> Looks good so far.

Thanks for the review.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-14 20:42   ` Pablo Neira Ayuso
@ 2009-02-14 22:31     ` Jan Engelhardt
  2009-02-14 22:32       ` Jan Engelhardt
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Engelhardt @ 2009-02-14 22:31 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber


On Saturday 2009-02-14 21:42, Pablo Neira Ayuso wrote:
>Jan Engelhardt wrote:
>> On Saturday 2009-02-14 20:29, Pablo Neira Ayuso wrote:
>> 
>>> This patch adds the iptables cluster match. This match can be used
>>> to deploy gateway and back-end load-sharing clusters.
>> 
>> All of this nice text (below) should go into libxt_cluster.man :)
>
>Indeed, that will go into the manpage. I have kept the iptables part
>until you finish with Jamal's libxtables renaming, to avoid any
>clashing.

We are done and ready for release, as I understand it.

>>> * This match mangles the skbuff pkt_type in case that it detects
>>> PACKET_MULTICAST for a non-multicast address. This may be done in
>>> a PKTTYPE target for this sole purpose.
>>> * This match supersedes the CLUSTERIP target.
>> 
>> The supersedes statement should also go into Kconfig.
>
>Yes, but I was also waiting for Patrick to tell how to proceed with
>this. I think that we also have to add CLUSTERIP to the
>schedule-for-removal list.

We do not have to (there is still xt_state and no mention of
deprecation despite xt_conntrack superseding it) - but it would
be a most logical step indeed.

>>> +	unsigned long		node_mask;
>> 
>You mentioned cpumask_t last time, but this is neither related to NUMA
>nor a CPU mask, so I prefer to keep using a long for this thing,
>moreover test and set bit operations use long.
>
>> /**
>>  * @node_mask:	cluster node mask
>>  */

Hence the need for a comment so that this misunderstanding
is resolved early.


* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-14 22:31     ` Jan Engelhardt
@ 2009-02-14 22:32       ` Jan Engelhardt
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Engelhardt @ 2009-02-14 22:32 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber


On Saturday 2009-02-14 23:31, Jan Engelhardt wrote:
>On Saturday 2009-02-14 21:42, Pablo Neira Ayuso wrote:
>>Jan Engelhardt wrote:
>>> On Saturday 2009-02-14 20:29, Pablo Neira Ayuso wrote:
>>> 
>>>> This patch adds the iptables cluster match. This match can be used
>>>> to deploy gateway and back-end load-sharing clusters.
>>> 
>>> All of this nice text (below) should go into libxt_cluster.man :)
>>
>>Indeed, that will go into the manpage. I have kept the iptables part
>>until you finish with Jamal's libxtables renaming, to avoid any
>>clashing.
>
>We are done and ready for release as I understood.

That is, after grabbing the latest contents of our tree.


* [PATCH] netfilter: xtables: add cluster match
@ 2009-02-16  9:23 Pablo Neira Ayuso
  2009-02-16  9:31 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16  9:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters.

Assuming that all the nodes see all packets (see below for an
example of how to achieve this if your switch does not allow it), the
cluster match decides whether this node has to handle a packet given:

	jhash(source IP) % total_nodes == node_id

For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, the connection pickup facility has to
be disabled to avoid marking TCP ACK packets coming in the reply
direction as valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

The match also provides a /proc entry under:

/proc/sys/net/netfilter/cluster/$PROC_NAME

where PROC_NAME is set via --cluster-proc-name. This is useful for
cluster reconfigurations via fail-over scripts. Assuming that this is
node 1, if node 2 goes down, you can add node 2 to your node mask as
follows:

echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME

BTW, some final notes:

 * This match mangles the skbuff pkt_type if it detects
PACKET_MULTICAST for a non-multicast address. This could instead be
done by a PKTTYPE target added for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 include/linux/netfilter/xt_cluster.h |   21 ++
 net/netfilter/Kconfig                |   16 +
 net/netfilter/Makefile               |    1 
 net/netfilter/xt_cluster.c           |  368 ++++++++++++++++++++++++++++++++++
 4 files changed, 406 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_cluster.h
 create mode 100644 net/netfilter/xt_cluster.c

diff --git a/include/linux/netfilter/xt_cluster.h b/include/linux/netfilter/xt_cluster.h
new file mode 100644
index 0000000..2cfc24e
--- /dev/null
+++ b/include/linux/netfilter/xt_cluster.h
@@ -0,0 +1,21 @@
+#ifndef _XT_CLUSTER_MATCH_H
+#define _XT_CLUSTER_MATCH_H
+
+struct proc_dir_entry;
+
+enum xt_cluster_flags {
+	XT_CLUSTER_F_INV = (1 << 0),
+};
+
+struct xt_cluster_match_info {
+	u_int16_t		total_nodes;
+	u_int16_t		node_id;
+	u_int32_t		hash_seed;
+	char			proc_name[16];
+	u_int32_t		flags;
+
+	/* Used internally by the kernel */
+	void			*data __attribute__((aligned(8)));
+};
+
+#endif /* _XT_CLUSTER_MATCH_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c2bac9c..77b6405 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -488,6 +488,22 @@ config NETFILTER_XT_TARGET_TCPOPTSTRIP
 	  This option adds a "TCPOPTSTRIP" target, which allows you to strip
 	  TCP options from TCP packets.
 
+config NETFILTER_XT_MATCH_CLUSTER
+	tristate '"cluster" match support'
+	depends on NF_CONNTRACK
+	depends on NETFILTER_ADVANCED
+	---help---
+	  This option allows you to build work-load-sharing clusters of
+	  network servers/stateful firewalls without having a dedicated
+	  load-balancing router/server/switch. Basically, this match returns
+	  true when the packet must be handled by this cluster node. Thus,
+	  all nodes see all packets and this match decides which node handles
+	  what packets. The work-load sharing algorithm is based on source
+	  address hashing.
+
+	  If you say Y or M here, try `iptables -m cluster --help` for
+	  more information.
+
 config NETFILTER_XT_MATCH_COMMENT
 	tristate  '"comment" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index da3d909..960399a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
+obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
new file mode 100644
index 0000000..f483de4
--- /dev/null
+++ b/net/netfilter/xt_cluster.c
@@ -0,0 +1,368 @@
+/*
+ * (C) 2008-2009 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/bitops.h>
+#include <linux/proc_fs.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/xt_cluster.h>
+
+struct xt_cluster_internal {
+	unsigned long		node_mask;	/* cluster node mask */
+	struct proc_dir_entry	*proc;
+	atomic_t		use;
+};
+
+static inline u_int32_t nf_ct_orig_ipv4_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+}
+
+static inline const void *nf_ct_orig_ipv6_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip6;
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv4(u_int32_t ip, const struct xt_cluster_match_info *info)
+{
+	return jhash_1word(ip, info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv6(const void *ip, const struct xt_cluster_match_info *info)
+{
+	return jhash2(ip, NF_CT_TUPLE_L3SIZE / sizeof(__u32), info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash(const struct nf_conn *ct,
+		const struct xt_cluster_match_info *info)
+{
+	u_int32_t hash = 0;
+
+	switch(nf_ct_l3num(ct)) {
+	case AF_INET:
+		hash = xt_cluster_hash_ipv4(nf_ct_orig_ipv4_src(ct), info);
+		break;
+	case AF_INET6:
+		hash = xt_cluster_hash_ipv6(nf_ct_orig_ipv6_src(ct), info);
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return (((u64)hash * info->total_nodes) >> 32);
+}
+
+static inline bool
+xt_cluster_is_multicast_addr(const struct sk_buff *skb, u_int8_t family)
+{
+	bool is_multicast = false;
+
+	switch(family) {
+	case NFPROTO_IPV4:
+		is_multicast = ipv4_is_multicast(ip_hdr(skb)->daddr);
+		break;
+	case NFPROTO_IPV6:
+		is_multicast = ipv6_addr_type(&ipv6_hdr(skb)->daddr) &
+						IPV6_ADDR_MULTICAST;
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return is_multicast;
+}
+
+static bool
+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	struct sk_buff *pskb = (struct sk_buff *)skb;
+	const struct xt_cluster_match_info *info = par->matchinfo;
+	const struct xt_cluster_internal *internal = info->data;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned long hash;
+
+	/* This match assumes that all nodes see the same packets. This can be
+	 * achieved if the switch that connects the cluster nodes supports some
+	 * sort of 'port mirroring'. However, if your switch does not support
+	 * this, your cluster nodes can reply to ARP requests using a multicast
+	 * MAC address. Thus, your switch will flood the same packets to the
+	 * cluster nodes with the same multicast MAC address. Using a multicast
+	 * link address is an RFC 1812 (section 3.3.2) violation, but this works
+	 * fine in practice.
+	 *
+	 * Unfortunately, if you use the multicast MAC address, the link layer
+	 * sets skbuff's pkt_type to PACKET_MULTICAST, which is not accepted
+	 * by TCP and others for packets coming to this node. For that reason,
+	 * this match mangles skbuff's pkt_type if it detects a packet
+	 * addressed to a unicast address but using PACKET_MULTICAST. Yes, I
+	 * know, matches should not alter packets, but we are doing this here
+	 * because we would need to add a PKTTYPE target for this sole purpose.
+	 */
+	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
+	    skb->pkt_type == PACKET_MULTICAST) {
+	    	pskb->pkt_type = PACKET_HOST;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return false;
+
+	if (ct == &nf_conntrack_untracked)
+		return false;
+
+	if (ct->master)
+		hash = xt_cluster_hash(ct->master, info);
+	else
+		hash = xt_cluster_hash(ct, info);
+
+	return test_bit(hash, &internal->node_mask) ^
+	       !!(info->flags & XT_CLUSTER_F_INV);
+}
+
+#ifdef CONFIG_PROC_FS
+static void *xt_cluster_seq_start(struct seq_file *s, loff_t *pos)
+{
+	if (*pos == 0) {
+		struct xt_cluster_internal *data = s->private;
+
+		return &data->node_mask;
+	} else {
+		*pos = 0;
+		return NULL;
+	}
+}
+
+static void *xt_cluster_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return NULL;
+}
+
+static void xt_cluster_seq_stop(struct seq_file *s, void *v) {}
+
+static int xt_cluster_seq_show(struct seq_file *s, void *v)
+{
+	unsigned long *mask = v;
+	seq_printf(s, "0x%8lx\n", *mask);
+	return 0;
+}
+
+static const struct seq_operations xt_cluster_seq_ops = {
+	.start	= xt_cluster_seq_start,
+	.next	= xt_cluster_seq_next,
+	.stop	= xt_cluster_seq_stop,
+	.show	= xt_cluster_seq_show
+};
+
+#define XT_CLUSTER_PROC_WRITELEN	10
+
+static ssize_t
+xt_cluster_write_proc(struct file *file, const char __user *input,
+		      size_t size, loff_t *ofs)
+{
+	const struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
+	struct xt_cluster_internal *info = pde->data;
+	char buffer[XT_CLUSTER_PROC_WRITELEN+1];
+	unsigned int new_node_id;
+
+	if (copy_from_user(buffer, input, XT_CLUSTER_PROC_WRITELEN))
+		return -EFAULT;
+
+	switch(*buffer) {
+	case '+':
+		new_node_id = simple_strtoul(buffer+1, NULL, 10);
+		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
+			return -EINVAL;
+		printk(KERN_NOTICE "cluster: adding node %u\n", new_node_id);
+		set_bit(new_node_id-1, &info->node_mask);
+		break;
+	case '-':
+		new_node_id = simple_strtoul(buffer+1, NULL, 10);
+		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
+			return -EINVAL;
+		printk(KERN_NOTICE "cluster: deleting node %u\n", new_node_id);
+		clear_bit(new_node_id-1, &info->node_mask);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return size;
+}
+
+static int xt_cluster_open_proc(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	ret = seq_open(file, &xt_cluster_seq_ops);
+	if (!ret) {
+		struct seq_file *seq = file->private_data;
+		const struct proc_dir_entry *pde = PDE(inode);
+		struct xt_cluster_match_info *info = pde->data;
+
+		seq->private = info;
+	}
+	return ret;
+};
+
+static struct proc_dir_entry *proc_cluster;
+static const struct file_operations xt_cluster_proc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= xt_cluster_open_proc,
+	.release	= seq_release,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.write		= xt_cluster_write_proc,
+};
+
+static bool
+xt_cluster_proc_entry_exist(struct proc_dir_entry *dir, const char *name)
+{
+	struct proc_dir_entry *tmp;
+
+	for (tmp = dir->subdir; tmp; tmp = tmp->next) {
+		if (strcmp(tmp->name, name) == 0)
+			return true;
+	}
+	return false;
+}
+
+static bool xt_cluster_proc_init(struct xt_cluster_match_info *info)
+{
+	struct xt_cluster_internal *internal = info->data;
+
+	BUG_ON(info->data == NULL);
+
+	if (xt_cluster_proc_entry_exist(proc_cluster, info->proc_name)) {
+		printk(KERN_ERR "xt_cluster: proc entry `%s' "
+				"already exists\n", info->proc_name);
+		return false;
+	}
+	internal->proc = proc_create_data(info->proc_name,
+					  S_IWUSR|S_IRUSR,
+					  proc_cluster,
+					  &xt_cluster_proc_fops,
+					  info->data);
+	if (!internal->proc) {
+		printk(KERN_ERR "xt_cluster: cannot create proc entry `%s'\n",
+				info->proc_name);
+		return false;
+	}
+	return true;
+}
+#endif /* CONFIG_PROC_FS */
+
+static bool xt_cluster_internal_init(struct xt_cluster_match_info *info)
+{
+	struct xt_cluster_internal *data;
+
+	data = kzalloc(sizeof(struct xt_cluster_internal), GFP_KERNEL);
+	if (!data) {
+		printk(KERN_ERR "xt_cluster: OOM\n");
+		return false;
+	}
+	info->data = data;
+
+#ifdef CONFIG_PROC_FS
+	if (!xt_cluster_proc_init(info)) {
+		kfree(data);
+		return false;
+	}
+#endif
+	atomic_set(&data->use, 1);
+	data->node_mask = (1 << (info->node_id - 1));
+
+	return true;
+}
+
+static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+	struct xt_cluster_internal *data = info->data;
+
+	if (info->node_id > info->total_nodes) {
+		printk(KERN_ERR "xt_cluster: the id of this node cannot be "
+				"higher than the total number of nodes\n");
+		return false;
+	}
+
+	if (!info->data) {
+		if (!xt_cluster_internal_init(info))
+			return false;
+	} else
+		atomic_inc(&data->use);
+
+	return true;
+}
+
+static void xt_cluster_mt_destroy(const struct xt_mtdtor_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+	struct xt_cluster_internal *data = info->data;
+
+	if (atomic_dec_and_test(&data->use)) {
+#ifdef CONFIG_PROC_FS
+		remove_proc_entry(info->proc_name, proc_cluster);
+#endif
+		kfree(info->data);
+	}
+}
+
+static struct xt_match xt_cluster_match __read_mostly = {
+	.name		= "cluster",
+	.family		= NFPROTO_UNSPEC,
+	.match		= xt_cluster_mt,
+	.checkentry	= xt_cluster_mt_checkentry,
+	.destroy	= xt_cluster_mt_destroy,
+	.matchsize	= sizeof(struct xt_cluster_match_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init xt_cluster_mt_init(void)
+{
+	int ret;
+
+#ifdef CONFIG_PROC_FS
+	proc_cluster = proc_mkdir("cluster", proc_net_netfilter);
+	if (!proc_cluster)
+		return -ENOMEM;
+#endif
+	ret = xt_register_match(&xt_cluster_match);
+	if (ret < 0) {
+#ifdef CONFIG_PROC_FS
+		remove_proc_entry("cluster", proc_net_netfilter);
+#endif
+		return ret;
+	}
+	return 0;
+}
+
+static void __exit xt_cluster_mt_fini(void)
+{
+#ifdef CONFIG_PROC_FS
+	remove_proc_entry("cluster", proc_net_netfilter);
+#endif
+	xt_unregister_match(&xt_cluster_match);
+}
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: hash-based cluster match");
+MODULE_ALIAS("ipt_cluster");
+MODULE_ALIAS("ip6t_cluster");
+module_init(xt_cluster_mt_init);
+module_exit(xt_cluster_mt_fini);



* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16  9:23 Pablo Neira Ayuso
@ 2009-02-16  9:31 ` Pablo Neira Ayuso
  2009-02-16 12:13   ` Jan Engelhardt
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16  9:31 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

Pablo Neira Ayuso wrote:

> +static int xt_cluster_seq_show(struct seq_file *s, void *v)
> +{
> +	unsigned long *mask = v;
> +	seq_printf(s, "0x%8lx\n", *mask);
                         ^^^
Damn, this needs the dot before the 8 to fill the empty spaces with
zeros. Jan's suggestion was wrong and I forgot to check this. I'll send
the patch again.

pablo@debian:~$ sudo cat /proc/net/netfilter/cluster/eth2
0x       1
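
For reference, a tiny userspace check (not part of the patch) of the
three candidate format strings for a mask value of 1:

#include <stdio.h>

int main(void)
{
	unsigned long mask = 1;

	printf("0x%8lx\n", mask);	/* "0x       1": width only, space padded */
	printf("0x%08lx\n", mask);	/* "0x00000001": zero flag */
	printf("0x%.8lx\n", mask);	/* "0x00000001": precision also zero-fills */
	return 0;
}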

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


* [PATCH] netfilter: xtables: add cluster match
@ 2009-02-16  9:32 Pablo Neira Ayuso
  0 siblings, 0 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16  9:32 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters.

Assuming that all the nodes see all packets (see below for an
example of how to achieve this if your switch does not allow it), the
cluster match decides whether this node has to handle a packet given:

	jhash(source IP) % total_nodes == node_id

For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, the connection pickup facility has to
be disabled to avoid marking TCP ACK packets coming in the reply
direction as valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

The match also provides a /proc entry under:

/proc/sys/net/netfilter/cluster/$PROC_NAME

where PROC_NAME is set via --cluster-proc-name. This is useful for
cluster reconfigurations via fail-over scripts. Assuming that this is
node 1, if node 2 goes down, you can add node 2 to your node mask as
follows:

echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME

BTW, some final notes:

 * This match mangles the skbuff pkt_type if it detects
PACKET_MULTICAST for a non-multicast address. This could instead be
done by a PKTTYPE target added for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 include/linux/netfilter/xt_cluster.h |   21 ++
 net/netfilter/Kconfig                |   16 +
 net/netfilter/Makefile               |    1 
 net/netfilter/xt_cluster.c           |  368 ++++++++++++++++++++++++++++++++++
 4 files changed, 406 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_cluster.h
 create mode 100644 net/netfilter/xt_cluster.c

diff --git a/include/linux/netfilter/xt_cluster.h b/include/linux/netfilter/xt_cluster.h
new file mode 100644
index 0000000..2cfc24e
--- /dev/null
+++ b/include/linux/netfilter/xt_cluster.h
@@ -0,0 +1,21 @@
+#ifndef _XT_CLUSTER_MATCH_H
+#define _XT_CLUSTER_MATCH_H
+
+struct proc_dir_entry;
+
+enum xt_cluster_flags {
+	XT_CLUSTER_F_INV = (1 << 0),
+};
+
+struct xt_cluster_match_info {
+	u_int16_t		total_nodes;
+	u_int16_t		node_id;
+	u_int32_t		hash_seed;
+	char			proc_name[16];
+	u_int32_t		flags;
+
+	/* Used internally by the kernel */
+	void			*data __attribute__((aligned(8)));
+};
+
+#endif /* _XT_CLUSTER_MATCH_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c2bac9c..77b6405 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -488,6 +488,22 @@ config NETFILTER_XT_TARGET_TCPOPTSTRIP
 	  This option adds a "TCPOPTSTRIP" target, which allows you to strip
 	  TCP options from TCP packets.
 
+config NETFILTER_XT_MATCH_CLUSTER
+	tristate '"cluster" match support'
+	depends on NF_CONNTRACK
+	depends on NETFILTER_ADVANCED
+	---help---
+	  This option allows you to build work-load-sharing clusters of
+	  network servers/stateful firewalls without having a dedicated
+	  load-balancing router/server/switch. Basically, this match returns
+	  true when the packet must be handled by this cluster node. Thus,
+	  all nodes see all packets and this match decides which node handles
+	  what packets. The work-load sharing algorithm is based on source
+	  address hashing.
+
+	  If you say Y or M here, try `iptables -m cluster --help` for
+	  more information.
+
 config NETFILTER_XT_MATCH_COMMENT
 	tristate  '"comment" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index da3d909..960399a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
+obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
new file mode 100644
index 0000000..d437d80
--- /dev/null
+++ b/net/netfilter/xt_cluster.c
@@ -0,0 +1,368 @@
+/*
+ * (C) 2008-2009 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/bitops.h>
+#include <linux/proc_fs.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/xt_cluster.h>
+
+struct xt_cluster_internal {
+	unsigned long		node_mask;	/* cluster node mask */
+	struct proc_dir_entry	*proc;
+	atomic_t		use;
+};
+
+static inline u_int32_t nf_ct_orig_ipv4_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+}
+
+static inline const void *nf_ct_orig_ipv6_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip6;
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv4(u_int32_t ip, const struct xt_cluster_match_info *info)
+{
+	return jhash_1word(ip, info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv6(const void *ip, const struct xt_cluster_match_info *info)
+{
+	return jhash2(ip, NF_CT_TUPLE_L3SIZE / sizeof(__u32), info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash(const struct nf_conn *ct,
+		const struct xt_cluster_match_info *info)
+{
+	u_int32_t hash = 0;
+
+	switch(nf_ct_l3num(ct)) {
+	case AF_INET:
+		hash = xt_cluster_hash_ipv4(nf_ct_orig_ipv4_src(ct), info);
+		break;
+	case AF_INET6:
+		hash = xt_cluster_hash_ipv6(nf_ct_orig_ipv6_src(ct), info);
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return (((u64)hash * info->total_nodes) >> 32);
+}
+
+static inline bool
+xt_cluster_is_multicast_addr(const struct sk_buff *skb, u_int8_t family)
+{
+	bool is_multicast = false;
+
+	switch(family) {
+	case NFPROTO_IPV4:
+		is_multicast = ipv4_is_multicast(ip_hdr(skb)->daddr);
+		break;
+	case NFPROTO_IPV6:
+		is_multicast = ipv6_addr_type(&ipv6_hdr(skb)->daddr) &
+						IPV6_ADDR_MULTICAST;
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return is_multicast;
+}
+
+static bool
+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	struct sk_buff *pskb = (struct sk_buff *)skb;
+	const struct xt_cluster_match_info *info = par->matchinfo;
+	const struct xt_cluster_internal *internal = info->data;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned long hash;
+
+	/* This match assumes that all nodes see the same packets. This can be
+	 * achieved if the switch that connects the cluster nodes supports some
+	 * sort of 'port mirroring'. However, if your switch does not support
+	 * this, your cluster nodes can reply to ARP requests using a multicast
+	 * MAC address. Thus, your switch will flood the same packets to the
+	 * cluster nodes with the same multicast MAC address. Using a multicast
+	 * link address is an RFC 1812 (section 3.3.2) violation, but this works
+	 * fine in practice.
+	 *
+	 * Unfortunately, if you use the multicast MAC address, the link layer
+	 * sets skbuff's pkt_type to PACKET_MULTICAST, which is not accepted
+	 * by TCP and others for packets coming to this node. For that reason,
+	 * this match mangles skbuff's pkt_type if it detects a packet
+	 * addressed to a unicast address but using PACKET_MULTICAST. Yes, I
+	 * know, matches should not alter packets, but we are doing this here
+	 * because we would need to add a PKTTYPE target for this sole purpose.
+	 */
+	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
+	    skb->pkt_type == PACKET_MULTICAST) {
+	    	pskb->pkt_type = PACKET_HOST;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return false;
+
+	if (ct == &nf_conntrack_untracked)
+		return false;
+
+	if (ct->master)
+		hash = xt_cluster_hash(ct->master, info);
+	else
+		hash = xt_cluster_hash(ct, info);
+
+	return test_bit(hash, &internal->node_mask) ^
+	       !!(info->flags & XT_CLUSTER_F_INV);
+}
+
+#ifdef CONFIG_PROC_FS
+static void *xt_cluster_seq_start(struct seq_file *s, loff_t *pos)
+{
+	if (*pos == 0) {
+		struct xt_cluster_internal *data = s->private;
+
+		return &data->node_mask;
+	} else {
+		*pos = 0;
+		return NULL;
+	}
+}
+
+static void *xt_cluster_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return NULL;
+}
+
+static void xt_cluster_seq_stop(struct seq_file *s, void *v) {}
+
+static int xt_cluster_seq_show(struct seq_file *s, void *v)
+{
+	unsigned long *mask = v;
+	seq_printf(s, "0x%.8lx\n", *mask);
+	return 0;
+}
+
+static const struct seq_operations xt_cluster_seq_ops = {
+	.start	= xt_cluster_seq_start,
+	.next	= xt_cluster_seq_next,
+	.stop	= xt_cluster_seq_stop,
+	.show	= xt_cluster_seq_show
+};
+
+#define XT_CLUSTER_PROC_WRITELEN	10
+
+static ssize_t
+xt_cluster_write_proc(struct file *file, const char __user *input,
+		      size_t size, loff_t *ofs)
+{
+	const struct proc_dir_entry *pde = PDE(file->f_path.dentry->d_inode);
+	struct xt_cluster_internal *info = pde->data;
+	char buffer[XT_CLUSTER_PROC_WRITELEN+1];
+	unsigned int new_node_id;
+
+	if (copy_from_user(buffer, input, XT_CLUSTER_PROC_WRITELEN))
+		return -EFAULT;
+
+	switch(*buffer) {
+	case '+':
+		new_node_id = simple_strtoul(buffer+1, NULL, 10);
+		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
+			return -EINVAL;
+		printk(KERN_NOTICE "cluster: adding node %u\n", new_node_id);
+		set_bit(new_node_id-1, &info->node_mask);
+		break;
+	case '-':
+		new_node_id = simple_strtoul(buffer+1, NULL, 10);
+		if (!new_node_id || new_node_id > sizeof(info->node_mask)*8)
+			return -EINVAL;
+		printk(KERN_NOTICE "cluster: deleting node %u\n", new_node_id);
+		clear_bit(new_node_id-1, &info->node_mask);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return size;
+}
+
+static int xt_cluster_open_proc(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	ret = seq_open(file, &xt_cluster_seq_ops);
+	if (!ret) {
+		struct seq_file *seq = file->private_data;
+		const struct proc_dir_entry *pde = PDE(inode);
+		struct xt_cluster_internal *data = pde->data;
+
+		seq->private = data;
+	}
+	return ret;
+}
+
+static struct proc_dir_entry *proc_cluster;
+static const struct file_operations xt_cluster_proc_fops = {
+	.owner		= THIS_MODULE,
+	.open		= xt_cluster_open_proc,
+	.release	= seq_release,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.write		= xt_cluster_write_proc,
+};
+
+static bool
+xt_cluster_proc_entry_exist(struct proc_dir_entry *dir, const char *name)
+{
+	struct proc_dir_entry *tmp;
+
+	for (tmp = dir->subdir; tmp; tmp = tmp->next) {
+		if (strcmp(tmp->name, name) == 0)
+			return true;
+	}
+	return false;
+}
+
+static bool xt_cluster_proc_init(struct xt_cluster_match_info *info)
+{
+	struct xt_cluster_internal *internal = info->data;
+
+	BUG_ON(info->data == NULL);
+
+	if (xt_cluster_proc_entry_exist(proc_cluster, info->proc_name)) {
+		printk(KERN_ERR "xt_cluster: proc entry `%s' "
+				"already exists\n", info->proc_name);
+		return false;
+	}
+	internal->proc = proc_create_data(info->proc_name,
+					  S_IWUSR|S_IRUSR,
+					  proc_cluster,
+					  &xt_cluster_proc_fops,
+					  info->data);
+	if (!internal->proc) {
+		printk(KERN_ERR "xt_cluster: cannot create proc entry `%s'\n",
+				info->proc_name);
+		return false;
+	}
+	return true;
+}
+#endif /* CONFIG_PROC_FS */
+
+static bool xt_cluster_internal_init(struct xt_cluster_match_info *info)
+{
+	struct xt_cluster_internal *data;
+
+	data = kzalloc(sizeof(struct xt_cluster_internal), GFP_KERNEL);
+	if (!data) {
+		printk(KERN_ERR "xt_cluster: OOM\n");
+		return false;
+	}
+	info->data = data;
+
+#ifdef CONFIG_PROC_FS
+	if (!xt_cluster_proc_init(info)) {
+		kfree(data);
+		return false;
+	}
+#endif
+	atomic_set(&data->use, 1);
+	data->node_mask = (1 << (info->node_id - 1));
+
+	return true;
+}
+
+static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+	struct xt_cluster_internal *data = info->data;
+
+	if (info->node_id > info->total_nodes) {
+		printk(KERN_ERR "xt_cluster: the id of this node cannot be "
+				"higher than the total number of nodes\n");
+		return false;
+	}
+
+	if (!info->data) {
+		if (!xt_cluster_internal_init(info))
+			return false;
+	} else
+		atomic_inc(&data->use);
+
+	return true;
+}
+
+static void xt_cluster_mt_destroy(const struct xt_mtdtor_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+	struct xt_cluster_internal *data = info->data;
+
+	if (atomic_dec_and_test(&data->use)) {
+#ifdef CONFIG_PROC_FS
+		remove_proc_entry(info->proc_name, proc_cluster);
+#endif
+		kfree(info->data);
+	}
+}
+
+static struct xt_match xt_cluster_match __read_mostly = {
+	.name		= "cluster",
+	.family		= NFPROTO_UNSPEC,
+	.match		= xt_cluster_mt,
+	.checkentry	= xt_cluster_mt_checkentry,
+	.destroy	= xt_cluster_mt_destroy,
+	.matchsize	= sizeof(struct xt_cluster_match_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init xt_cluster_mt_init(void)
+{
+	int ret;
+
+#ifdef CONFIG_PROC_FS
+	proc_cluster = proc_mkdir("cluster", proc_net_netfilter);
+	if (!proc_cluster)
+		return -ENOMEM;
+#endif
+	ret = xt_register_match(&xt_cluster_match);
+	if (ret < 0) {
+#ifdef CONFIG_PROC_FS
+		remove_proc_entry("cluster", proc_net_netfilter);
+#endif
+		return ret;
+	}
+	return 0;
+}
+
+static void __exit xt_cluster_mt_fini(void)
+{
+#ifdef CONFIG_PROC_FS
+	remove_proc_entry("cluster", proc_net_netfilter);
+#endif
+	xt_unregister_match(&xt_cluster_match);
+}
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: hash-based cluster match");
+MODULE_ALIAS("ipt_cluster");
+MODULE_ALIAS("ip6t_cluster");
+module_init(xt_cluster_mt_init);
+module_exit(xt_cluster_mt_fini);


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-14 19:29 Pablo Neira Ayuso
  2009-02-14 20:28 ` Jan Engelhardt
@ 2009-02-16 10:56 ` Patrick McHardy
  2009-02-16 14:01   ` Pablo Neira Ayuso
  1 sibling, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-16 10:56 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> This patch adds the iptables cluster match. This match can be used
> to deploy gateway and back-end load-sharing clusters.

I'm mixing comments to the cluster match and the ARP mangle target.

> Assuming that all the nodes see all packets (see below for an
> example on how to do that if your switch does not allow this), the
> cluster match decides if this node has to handle a packet given:
> 
> 	jhash(source IP) % total_nodes == node_id
> 
> For related connections, the master conntrack is used. The following
> is an example of its use to deploy a gateway cluster composed of two
> nodes (where this is the node 1):
> 
> iptables -I PREROUTING -t mangle -i eth1 -m cluster \
> 	--cluster-total-nodes 2 --cluster-local-node 1 \
> 	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
> iptables -A PREROUTING -t mangle -i eth1 \
> 	-m mark ! --mark 0xffff -j DROP
> iptables -A PREROUTING -t mangle -i eth2 -m cluster \
> 	--cluster-total-nodes 2 --cluster-local-node 1 \
> 	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
> iptables -A PREROUTING -t mangle -i eth2 \
> 	-m mark ! --mark 0xffff -j DROP
> 
> And the following commands to make all nodes see the same packets:
> 
> ip maddr add 01:00:5e:00:01:01 dev eth1
> ip maddr add 01:00:5e:00:01:02 dev eth2
> arptables -I OUTPUT -o eth1 --h-length 6 \
> 	-j mangle --mangle-mac-s 01:00:5e:00:01:01
> arptables -I INPUT -i eth1 --h-length 6 \
> 	--destination-mac 01:00:5e:00:01:01 \
> 	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

Mhh, is the saving of one or two characters really worth these
deviations from the kind-of established naming scheme? It's hard
to remember all these minor differences, in my opinion.

> arptables -I OUTPUT -o eth2 --h-length 6 \
> 	-j mangle --mangle-mac-s 01:00:5e:00:01:02
> arptables -I INPUT -i eth2 --h-length 6 \
> 	--destination-mac 01:00:5e:00:01:02 \
> 	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
> 
> In the case of TCP connections, pickup facility has to be disabled
> to avoid marking TCP ACK packets coming in the reply direction as
> valid.
> 
> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

I'm not sure I understand this. You *don't* want to mark them
as valid, and you need to disable pickup for this?

Unrelated to this patch, but maybe the target would also be
better named "NAT" instead of the much more generic term "mangle".
Why is it using lower case letters btw?

> The match also provides a /proc entry under:
> 
> /proc/sys/net/netfilter/cluster/$PROC_NAME
> 
> where PROC_NAME is set via --cluster-proc-name. This is useful to
> include possible cluster reconfigurations via fail-over scripts.
> Assuming that this is the node 1, if node 2 is down, you can add
> node 2 to your node-mask as follows:
> 
> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME

Does this provide anything you can't do by replacing the rule
itself?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16  9:31 ` Pablo Neira Ayuso
@ 2009-02-16 12:13   ` Jan Engelhardt
  2009-02-16 12:17     ` Patrick McHardy
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Engelhardt @ 2009-02-16 12:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, kaber


On Monday 2009-02-16 10:31, Pablo Neira Ayuso wrote:

>Pablo Neira Ayuso wrote:
>
>> +static int xt_cluster_seq_show(struct seq_file *s, void *v)
>> +{
>> +	unsigned long *mask = v;
>> +	seq_printf(s, "0x%8lx\n", *mask);
>                         ^^^
>Damn, this needs the dot before the 8 to fill with zero the empty
>spaces. Jan's suggestion was wrong and I forgot to check this. I'll send
>the patch again.

Wait wait, let me requote myself:

>'.' does not make sense with non-string,non-floating point numbers 
>(though it is a stdc feature it seems). I'd say "0x%08lx", for clarity.

The second '0' in "0x%08lx" is missing.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 12:13   ` Jan Engelhardt
@ 2009-02-16 12:17     ` Patrick McHardy
  0 siblings, 0 replies; 49+ messages in thread
From: Patrick McHardy @ 2009-02-16 12:17 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Pablo Neira Ayuso, netfilter-devel

Jan Engelhardt wrote:
> On Monday 2009-02-16 10:31, Pablo Neira Ayuso wrote:
> 
>> Pablo Neira Ayuso wrote:
>>
>>> +static int xt_cluster_seq_show(struct seq_file *s, void *v)
>>> +{
>>> +	unsigned long *mask = v;
>>> +	seq_printf(s, "0x%8lx\n", *mask);
>>                         ^^^
>> Damn, this needs the dot before the 8 to fill with zero the empty
>> spaces. Jan's suggestion was wrong and I forgot to check this. I'll send
>> the patch again.
> 
> Wait wait, let me requote myself:
> 
>> '.' does not make sense with non-string,non-floating point numbers 
>> (though it is a stdc feature it seems). I'd say "0x%08lx", for clarity.
> 
> The second '0' in "0x%08lx" is missing.

Let's discuss the need for this interface first before fixing it :)
It looks like a way to get around using (potentially slow) rule
replacement, which doesn't make sense to add on a per-module basis.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 10:56 ` Patrick McHardy
@ 2009-02-16 14:01   ` Pablo Neira Ayuso
  2009-02-16 14:03     ` Patrick McHardy
  2009-02-16 17:13     ` Jan Engelhardt
  0 siblings, 2 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16 14:01 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> This patch adds the iptables cluster match. This match can be used
>> to deploy gateway and back-end load-sharing clusters.
> 
> I'm mixing comments to the cluster match and the ARP mangle target.
> 
>> Assuming that all the nodes see all packets (see below for an
>> example on how to do that if your switch does not allow this), the
>> cluster match decides if this node has to handle a packet given:
>>
>>     jhash(source IP) % total_nodes == node_id
>>
>> For related connections, the master conntrack is used. The following
>> is an example of its use to deploy a gateway cluster composed of two
>> nodes (where this is the node 1):
>>
>> iptables -I PREROUTING -t mangle -i eth1 -m cluster \
>>     --cluster-total-nodes 2 --cluster-local-node 1 \
>>     --cluster-proc-name eth1 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth1 \
>>     -m mark ! --mark 0xffff -j DROP
>> iptables -A PREROUTING -t mangle -i eth2 -m cluster \
>>     --cluster-total-nodes 2 --cluster-local-node 1 \
>>     --cluster-proc-name eth2 -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth2 \
>>     -m mark ! --mark 0xffff -j DROP
>>
>> And the following commands to make all nodes see the same packets:
>>
>> ip maddr add 01:00:5e:00:01:01 dev eth1
>> ip maddr add 01:00:5e:00:01:02 dev eth2
>> arptables -I OUTPUT -o eth1 --h-length 6 \
>>     -j mangle --mangle-mac-s 01:00:5e:00:01:01
>> arptables -I INPUT -i eth1 --h-length 6 \
>>     --destination-mac 01:00:5e:00:01:01 \
>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
> 
> Mhh, is the saving of one or two characters really worth these
> deviations from the kind-of established naming scheme? Its hard
> to remember all these minor differences in my opinion.

Hm, do you mean the name "mangle" or the name of the option
"--mangle-mac-d"? This is what we currently have in kernel mainline and
arptables userspace, it's not my fault :). I can send you a patch to fix
it with consistent naming without breaking backward compatibility in
both kernel and user-space.

>> arptables -I OUTPUT -o eth2 --h-length 6 \
>>     -j mangle --mangle-mac-s 01:00:5e:00:01:02
>> arptables -I INPUT -i eth2 --h-length 6 \
>>     --destination-mac 01:00:5e:00:01:02 \
>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>>
>> In the case of TCP connections, pickup facility has to be disabled
>> to avoid marking TCP ACK packets coming in the reply direction as
>> valid.
>>
>> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
> 
> I'm not sure I understand this. You *don't* want to mark them
> as valid, and you need to disable pickup for this?

If TCP pickup is enabled, a single TCP ACK packet coming in the reply
direction makes the connection enter the TCP ESTABLISHED state. Since
that's a valid state transition, the cluster match will consider the
packet part of a connection that this node is handling. The cluster
match does not mark packets that trigger invalid state transitions.

> Unrelated to this patch, but maybe the target would also be
> better named "NAT" instead of the much more generic term "mangle".
> Why is it using lower case letters btw?

No idea who did this, but I can send you a patch to fix this naming
without breaking backward compatibility.

>> The match also provides a /proc entry under:
>>
>> /proc/sys/net/netfilter/cluster/$PROC_NAME
>>
>> where PROC_NAME is set via --cluster-proc-name. This is useful to
>> include possible cluster reconfigurations via fail-over scripts.
>> Assuming that this is the node 1, if node 2 is down, you can add
>> node 2 to your node-mask as follows:
>>
>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
> 
> Does this provide anything you can't do by replacing the rule
> itself?

Yes, the nodes in the cluster are identified by an ID, and the rule
allows you to specify one ID. Say you have two cluster nodes, one with
ID 1 and the other with ID 2. If the cluster node with ID 1 goes down,
you can echo +1 on the node with ID 2 so that it will handle packets
going to both node 1 and node 2. Of course, you need conntrackd to allow
node 2 to recover the filtering.
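
Just to illustrate, with the example rule-set from the patch description
(the proc name "eth1" below is simply the one used in that example), the
fail-over hook on node 2 would boil down to something like:

echo +1 > /proc/sys/net/netfilter/cluster/eth1

and, once node 1 is back and has resynchronized its states:

echo -1 > /proc/sys/net/netfilter/cluster/eth1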

Now, I see a possible optimization: check whether a node has all the bits
of its node mask set with regard to the total number of nodes, so that
hashing can be skipped entirely. But that's something we can add later,
I think.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 14:01   ` Pablo Neira Ayuso
@ 2009-02-16 14:03     ` Patrick McHardy
  2009-02-16 14:30       ` Pablo Neira Ayuso
  2009-02-16 17:13     ` Jan Engelhardt
  1 sibling, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-16 14:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>>> ip maddr add 01:00:5e:00:01:01 dev eth1
>>> ip maddr add 01:00:5e:00:01:02 dev eth2
>>> arptables -I OUTPUT -o eth1 --h-length 6 \
>>>     -j mangle --mangle-mac-s 01:00:5e:00:01:01
>>> arptables -I INPUT -i eth1 --h-length 6 \
>>>     --destination-mac 01:00:5e:00:01:01 \
>>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>>
>> Mhh, is the saving of one or two characters really worth these
>> deviations from the kind-of established naming scheme? Its hard
>> to remember all these minor differences in my opinion.
> 
> Hm, you mean the name "mangle" or the name of the option 
> "--mangle-mac-d"? This is what we currently have in kernel mainline and 
> arptables userspace, it's not my fault :). I can send you a patch to fix 
> it with a consistent naming without breaking backward compatibility both 
> in kernel and user-space.

Great, I wasn't aware that this already existed in userspace :)

>>> In the case of TCP connections, pickup facility has to be disabled
>>> to avoid marking TCP ACK packets coming in the reply direction as
>>> valid.
>>>
>>> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
>>
>> I'm not sure I understand this. You *don't* want to mark them
>> as valid, and you need to disable pickup for this?
> 
> If TCP pickup is enabled, one TCP ACK packet coming in the reply 
> direction enters TCP ESTABLISHED state. Since that's a valid 
> state-transition, the cluster match will consider that this is part of a 
> connection that this node is handling since it's a valid 
> state-transition. The cluster match does not mark packets that trigger 
> invalid state transitions.

Why use conntrack at all? Shouldn't the cluster match simply
filter out all packets not for this cluster and that's it?
You stated it needs conntrack to get a constant tuple, but I
don't see why the conntrack tuple would differ from the data
that you can gather from the packet headers.

>>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
>>
>> Does this provide anything you can't do by replacing the rule
>> itself?
> 
> Yes, the nodes in the cluster are identifies by an ID, the rule allows 
> you to specify one ID. Say you have two cluster nodes, one with ID 1, 
> and the other with ID 2. If the cluster node with ID 1 goes down, you 
> can echo +1 to node with ID 2 so that it will handle packets going to 
> node with ID 1 and ID 2. Of course, you need conntrackd to allow node ID 
> 2 recover the filtering.

I see. That kind of makes sense, but if you're running a
synchronization daemon anyways, you might as well renumber
all nodes so you still have proper balancing, right?

> Now, I see that there is a possible optimization that consists of 
> checking if one node has its node mask all set with regards to the total 
> number of nodes, so that hashing can be skipped. But that's something 
> that we can add later I think.

Indeed.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 14:03     ` Patrick McHardy
@ 2009-02-16 14:30       ` Pablo Neira Ayuso
  2009-02-16 15:01         ` Patrick McHardy
                           ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16 14:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>>> ip maddr add 01:00:5e:00:01:01 dev eth1
>>>> ip maddr add 01:00:5e:00:01:02 dev eth2
>>>> arptables -I OUTPUT -o eth1 --h-length 6 \
>>>>     -j mangle --mangle-mac-s 01:00:5e:00:01:01
>>>> arptables -I INPUT -i eth1 --h-length 6 \
>>>>     --destination-mac 01:00:5e:00:01:01 \
>>>>     -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>>>
>>> Mhh, is the saving of one or two characters really worth these
>>> deviations from the kind-of established naming scheme? Its hard
>>> to remember all these minor differences in my opinion.
>>
>> Hm, you mean the name "mangle" or the name of the option 
>> "--mangle-mac-d"? This is what we currently have in kernel mainline 
>> and arptables userspace, it's not my fault :). I can send you a patch 
>> to fix it with a consistent naming without breaking backward 
>> compatibility both in kernel and user-space.
> 
> Great, I wasn't aware that this already existed in userspace :)

Yes, it's hosted by the ebtables project. That tool really needs some
care. It works, but I don't know if it's actively maintained. We can
probably offer hosting for it on git.netfilter.org. I'll investigate this.

>>>> In the case of TCP connections, pickup facility has to be disabled
>>>> to avoid marking TCP ACK packets coming in the reply direction as
>>>> valid.
>>>>
>>>> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
>>>
>>> I'm not sure I understand this. You *don't* want to mark them
>>> as valid, and you need to disable pickup for this?
>>
>> If TCP pickup is enabled, one TCP ACK packet coming in the reply 
>> direction enters TCP ESTABLISHED state. Since that's a valid 
>> state-transition, the cluster match will consider that this is part of 
>> a connection that this node is handling since it's a valid 
>> state-transition. The cluster match does not mark packets that trigger 
>> invalid state transitions.
> 
> Why use conntrack at all? Shouldn't the cluster match simply
> filter out all packets not for this cluster and thats it?
> You stated it needs conntrack to get a constant tuple, but I
> don't see why the conntrack tuple would differ from the data
> that you can gather from the packet headers.

No, source NAT connections would have different headers: A -> B in the
original direction, and B -> FW in the reply direction. Thus, I cannot
apply the same hashing to packets going in the original and the reply
directions. Moreover, if this were packet-based, think also about
possible asymmetric filtering: the original traffic direction filtered
by node 1 but the reply traffic filtered by node 2. That will not work
for a stateful firewall.
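
To make this concrete with made-up addresses (and assuming the match
hashes the source address of conntrack's original tuple, as in the
jhash(source IP) rule quoted earlier):

original direction on the wire:  192.168.0.10 -> 10.0.0.1
reply direction on the wire:     10.0.0.1     -> 1.2.3.4   (the SNAT address)
conntrack original tuple:        192.168.0.10 -> 10.0.0.1  (both directions)

Hashing the wire source address gives jhash(192.168.0.10) in one direction
but jhash(10.0.0.1) in the other; hashing the original tuple's source gives
jhash(192.168.0.10) for both, so both directions stay on the same node.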

>>>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
>>>
>>> Does this provide anything you can't do by replacing the rule
>>> itself?
>>
>> Yes, the nodes in the cluster are identifies by an ID, the rule allows 
>> you to specify one ID. Say you have two cluster nodes, one with ID 1, 
>> and the other with ID 2. If the cluster node with ID 1 goes down, you 
>> can echo +1 to node with ID 2 so that it will handle packets going to 
>> node with ID 1 and ID 2. Of course, you need conntrackd to allow node 
>> ID 2 recover the filtering.
> 
> I see. That kind of makes sense, but if you're running a
> synchronization daemon anyways, you might as well renumber
> all nodes so you still have proper balancing, right?

Indeed, the daemon may also add a new rule for the node that has gone
down, but that results in an extra hash operation to decide whether to
mark the packet (one extra hash per rule) :(.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 14:30       ` Pablo Neira Ayuso
@ 2009-02-16 15:01         ` Patrick McHardy
  2009-02-16 15:14         ` Pablo Neira Ayuso
  2009-02-16 17:17         ` Jan Engelhardt
  2 siblings, 0 replies; 49+ messages in thread
From: Patrick McHardy @ 2009-02-16 15:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> Why use conntrack at all? Shouldn't the cluster match simply
>> filter out all packets not for this cluster and thats it?
>> You stated it needs conntrack to get a constant tuple, but I
>> don't see why the conntrack tuple would differ from the data
>> that you can gather from the packet headers.
> 
> No, source NAT connections would have different headers. A -> B for 
> original, and B -> FW for reply direction. Thus, I cannot apply the same 
> hashing for packets going in the original and the reply direction. 

Ah, I'm beginning to understand the topology, I think :) Actually
it seems it's only combined SNAT+DNAT on one connection that's a
problem; with only one of the two you could tell the cluster match
to look at either the source or the destination address (the unchanged
one) in the opposite direction. We can't deal with it only when the
opposite direction is completely unrelated from a non-conntrack point
of view. Anyway, your way of dealing with this seems fine to me.

>>>>> echo +2 > /proc/sys/net/netfilter/cluster/$PROC_NAME
>>>>
>>>> Does this provide anything you can't do by replacing the rule
>>>> itself?
>>>
>>> Yes, the nodes in the cluster are identifies by an ID, the rule 
>>> allows you to specify one ID. Say you have two cluster nodes, one 
>>> with ID 1, and the other with ID 2. If the cluster node with ID 1 
>>> goes down, you can echo +1 to node with ID 2 so that it will handle 
>>> packets going to node with ID 1 and ID 2. Of course, you need 
>>> conntrackd to allow node ID 2 recover the filtering.
>>
>> I see. That kind of makes sense, but if you're running a
>> synchronization daemon anyways, you might as well renumber
>> all nodes so you still have proper balancing, right?
> 
> Indeed, the daemon may also add a new rule for the node that has gone 
> down but that results in another extra hash operation to mark it or not 
> (one extra hash per rule) :(.

That's not what I meant. By having a single node handle all connections
from the one which went down, you have an imbalance in load
distribution. The nodes are synchronized, so they could just all
replace their cluster match with an updated number of nodes.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 15:14         ` Pablo Neira Ayuso
@ 2009-02-16 15:10           ` Patrick McHardy
  2009-02-16 15:27             ` Pablo Neira Ayuso
  2009-02-17 10:46             ` Pablo Neira Ayuso
  0 siblings, 2 replies; 49+ messages in thread
From: Patrick McHardy @ 2009-02-16 15:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>> I see. That kind of makes sense, but if you're running a
>>> synchronization daemon anyways, you might as well renumber
>>> all nodes so you still have proper balancing, right?
> 
> Hm, I was not replying to your question ;). Right, the renumbering also 
> requires getting the states back to the original node. We can use the 
> same hashing approach in userspace to know which states belong to 
> original node that has come back to life when it requests a 
> resynchronization.
> 
>> Indeed, the daemon may also add a new rule for the node that has gone 
>> down but that results in another extra hash operation to mark it or 
>> not (one extra hash per rule) :(.
> 
> This is not true. We may have something like this (assuming two nodes):

To whom are you replying now? :)

> if no mark set and hash % 2 == 0, accept
> if no mark set and hash % 2 == 1, accept
> if no mark set, drop
> 
> So we can still do this adding rules with the iptables interface. But 
> still having the /proc looks like a simple interface for this.

I'm sure someone would argue that changing TCP port numbers of
the tcp match through proc would be a nice and simple interface.
The fact though is that we have an interface for handing a
configuration to the kernel and this is clearly a configuration
parameter. We're missing a proper way to use it in userspace
from within programs (well, hopefully not for long anymore),
but that needs to be fixed in userspace.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 14:30       ` Pablo Neira Ayuso
  2009-02-16 15:01         ` Patrick McHardy
@ 2009-02-16 15:14         ` Pablo Neira Ayuso
  2009-02-16 15:10           ` Patrick McHardy
  2009-02-16 17:17         ` Jan Engelhardt
  2 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16 15:14 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> I see. That kind of makes sense, but if you're running a
>> synchronization daemon anyways, you might as well renumber
>> all nodes so you still have proper balancing, right?

Hm, I was not replying to your question ;). Right, the renumbering also
requires getting the states back to the original node. We can use the
same hashing approach in userspace to know which states belong to the
original node that has come back to life when it requests a
resynchronization.

> Indeed, the daemon may also add a new rule for the node that has gone 
> down but that results in another extra hash operation to mark it or not 
> (one extra hash per rule) :(.

This is not true. We may have something like this (assuming two nodes):

if no mark set and hash % 2 == 0, accept
if no mark set and hash % 2 == 1, accept
if no mark set, drop

So we can still do this by adding rules with the iptables interface.
But still, the /proc file looks like a simpler interface for this.
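
In terms of the rules used elsewhere in this thread, that would look
roughly as follows (marking instead of accepting, and relying on the mark
check being evaluated before the cluster match, so the extra hash is
skipped once a packet is already marked):

iptables -A PREROUTING -t mangle -i eth0 -m mark ! --mark 0xffff \
        -m cluster --cluster-total-nodes 2 --cluster-local-node 1 \
        -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth0 -m mark ! --mark 0xffff \
        -m cluster --cluster-total-nodes 2 --cluster-local-node 2 \
        -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth0 -m mark ! --mark 0xffff -j DROP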

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 15:10           ` Patrick McHardy
@ 2009-02-16 15:27             ` Pablo Neira Ayuso
  2009-02-17 10:46             ` Pablo Neira Ayuso
  1 sibling, 0 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-16 15:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Pablo Neira Ayuso wrote:
>>> Patrick McHardy wrote:
>>>> I see. That kind of makes sense, but if you're running a
>>>> synchronization daemon anyways, you might as well renumber
>>>> all nodes so you still have proper balancing, right?
>>
>> Hm, I was not replying to your question ;). Right, the renumbering 
>> also requires getting the states back to the original node. We can use 
>> the same hashing approach in userspace to know which states belong to 
>> original node that has come back to life when it requests a 
>> resynchronization.
>>
>>> Indeed, the daemon may also add a new rule for the node that has gone 
>>> down but that results in another extra hash operation to mark it or 
>>> not (one extra hash per rule) :(.
>>
>> This is not true. We may have something like this (assuming two nodes):
> 
> To whom are you replying now? :)

To myself, never mind :)

>> if no mark set and hash % 2 == 0, accept
>> if no mark set and hash % 2 == 1, accept
>> if no mark set, drop
>>
>> So we can still do this adding rules with the iptables interface. But 
>> still having the /proc looks like a simple interface for this.
> 
> I'm sure someone would argue that changing TCP port numbers of
> the tcp match through proc would be a nice and simple interface.
> The fact though is that we have an interface for handing a
> configuration to the kernel and this is clearly a configuration
> parameter. We're missing a proper way to use it in userspace
> from within programs (well, hopefully not for long anymore),
> but that needs to be fixed in userspace.

Right right, I have no more arguments to support the /proc interface ;).
I'll send you a patch without it later tonight.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 14:01   ` Pablo Neira Ayuso
  2009-02-16 14:03     ` Patrick McHardy
@ 2009-02-16 17:13     ` Jan Engelhardt
  2009-02-16 17:16       ` Patrick McHardy
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Engelhardt @ 2009-02-16 17:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel


On Monday 2009-02-16 15:01, Pablo Neira Ayuso wrote:
>>>
>>> And the following commands to make all nodes see the same packets:
>>>
>>> ip maddr add 01:00:5e:00:01:01 dev eth1
>>> ip maddr add 01:00:5e:00:01:02 dev eth2
>>> arptables -I OUTPUT -o eth1 --h-length 6 \
>>>    -j mangle --mangle-mac-s 01:00:5e:00:01:01
>>> arptables -I INPUT -i eth1 --h-length 6 \
>>>    --destination-mac 01:00:5e:00:01:01 \
>>>    -j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
>>
>> Mhh, is the saving of one or two characters really worth these
>> deviations from the kind-of established naming scheme? Its hard
>> to remember all these minor differences in my opinion.
>
> Hm, you mean the name "mangle" or the name of the option "--mangle-mac-d"?

In the case of --mangle-mac-d, getopt automatically handles
abbreviations (the best example I can think of where this works
is `quilt ref --s --d` for --sort --diffstat).

>> Unrelated to this patch, but maybe the target would also be
>> better named "NAT" instead of the much more generic term "mangle".
>> Why is it using lower case letters btw?
>
> No idea who has done this,

Lowercasing targets was the original author's (Bart De Schuymer's)
deed ;-)

I have nothing against lowercase targets. In fact, so many users seem
to get case-sensitivity wrong (“INPUT” vs. “input” for built-in chain
names, “MASQUERADE” vs. “masquerade” for targets, to name some) that I
would support lowercasing all targets... if it were not for the clash
between MARK (target) and mark (match), for example. However, calling
one of them “mark_m” or something with an underscore to avoid the clash
is what I dislike, so I favor keeping target names uppercase.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 17:13     ` Jan Engelhardt
@ 2009-02-16 17:16       ` Patrick McHardy
  2009-02-16 17:22         ` Jan Engelhardt
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-16 17:16 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Pablo Neira Ayuso, netfilter-devel

Jan Engelhardt wrote:
>>> Unrelated to this patch, but maybe the target would also be
>>> better named "NAT" instead of the much more generic term "mangle".
>>> Why is it using lower case letters btw?
>> No idea who has done this,
> 
> Lowercasing targets was the original author's (Bart De Schuymer's)
> deed ;-)
> 
> I have nothing against lowercase targets, in fact, many a user seems
> to so not get case-sensitity right — “INPUT” vs. “input” (built-in
> chain names), “MASQUERADE” vs. “masquerade” (targets), to name some —
> that I would support lowercasing all targets...— if it were not for
> the clash between MARK (target) and mark (match), for example.
> Though, calling one of them “mark_m” or underscore-something to avoid
> the clash however, is what I dislike in favor of uppercasing target
> names.

It's actually just exposing an implementation detail of how match
and target modules are located on the filesystem. The meaning
(match or target) can be deduced from "-j" and "-m".

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 14:30       ` Pablo Neira Ayuso
  2009-02-16 15:01         ` Patrick McHardy
  2009-02-16 15:14         ` Pablo Neira Ayuso
@ 2009-02-16 17:17         ` Jan Engelhardt
  2 siblings, 0 replies; 49+ messages in thread
From: Jan Engelhardt @ 2009-02-16 17:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel


On Monday 2009-02-16 15:30, Pablo Neira Ayuso wrote:
>
>>> Hm, you mean the name "mangle" or the name of the option
>>> "--mangle-mac-d"? This is what we currently have in kernel
>>> mainline and arptables userspace, it's not my fault :). I can
>>> send you a patch to fix it with a consistent naming without
>>> breaking backward compatibility both in kernel and user-space.
>>
>> Great, I wasn't aware that this already existed in userspace :)
>
> Yes, it's hosted by the ebtables projects. That tool really need
> some care.

It would indeed. The problem, though, is that I, who originally
wanted to unify arptables into iptables (as a start, because it
has not diverged as much as ebtables), have thrown in the towel on
that, on the grounds that libiptc is such a beast[1] and that
developers' time would be better spent on (fasten seatbelts)
duplicating xtables/iptables once more, with the goal of creating
an NFPROTO-agnostic table structure instead, one that obsoletes
ip, ip6, arp and ebtables in one go.

[1] http://marc.info/?t=122633592000011&r=1&w=2

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 17:16       ` Patrick McHardy
@ 2009-02-16 17:22         ` Jan Engelhardt
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Engelhardt @ 2009-02-16 17:22 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Pablo Neira Ayuso, netfilter-devel


On Monday 2009-02-16 18:16, Patrick McHardy wrote:
>>
>> Lowercasing targets was the original author's (Bart De Schuymer's)
>> deed ;-)
>>
>>[...]
>> Though, calling one of them “mark_m” or underscore-something to avoid
>> the clash however, is what I dislike in favor of uppercasing target
>> names.
>
> Its actually just exposing an implementation detail of how match
> and target modules are located on the filesystem. The meaning
> (match or target) can be deduced from "-j" and "-m".
>
Too bad we cannot reasonably change it now.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-16 15:10           ` Patrick McHardy
  2009-02-16 15:27             ` Pablo Neira Ayuso
@ 2009-02-17 10:46             ` Pablo Neira Ayuso
  2009-02-17 10:50               ` Patrick McHardy
  1 sibling, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-17 10:46 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Hi Patrick,

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> So we can still do this adding rules with the iptables interface. But
>> still having the /proc looks like a simple interface for this.
> 
> I'm sure someone would argue that changing TCP port numbers of
> the tcp match through proc would be a nice and simple interface.
> The fact though is that we have an interface for handing a
> configuration to the kernel and this is clearly a configuration
> parameter. We're missing a proper way to use it in userspace
> from within programs (well, hopefully not for long anymore),
> but that needs to be fixed in userspace.

While reworking this, I think I have found one argument in support of
the /proc interface that looks interesting in terms of resource
consumption. Assume that we have three nodes, two of which are down;
thus, the only active one would have the following rule-set:

iptables -A PREROUTING -t mangle -i eth0 -m cluster \
        --cluster-total-nodes 3 --cluster-local-node 1 \
        -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth0 -m cluster \
        --cluster-total-nodes 3 --cluster-local-node 2 \
        -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth0 -m cluster \
        --cluster-total-nodes 3 --cluster-local-node 3 \
        -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth0 \
        -m mark ! --mark 0xffff -j DROP

Look at the worst case: if the packet belongs to node 3, the hashing
must first be done to check whether the packet belongs to node 1 and to
node 2. Thus, the hashing is done three times. This makes the cluster
hashing O(n), where n is the number of cluster nodes.

A possible solution (which, thinking it over, I don't like too much yet)
would be to convert this into a HASHMARK target that stores the result
of the hash in the skbuff mark, but the problem is that it would require
reserved space for hash marks, since they may clash with other
user-defined marks.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-17 10:46             ` Pablo Neira Ayuso
@ 2009-02-17 10:50               ` Patrick McHardy
  2009-02-17 13:50                 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-17 10:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> While reworking this, I think that I have found one argument to support
> the /proc interface that looks interesting in terms of resource
> consumption. Assume that we have three nodes, where two of them are
> down, thus, the only one active would have the following rule-set:
> 
> iptables -A PREROUTING -t mangle -i eth0 -m cluster \
>         --cluster-total-nodes 3 --cluster-local-node 1 \
>         -j MARK --set-mark 0xffff
> iptables -A PREROUTING -t mangle -i eth0 -m cluster \
>         --cluster-total-nodes 3 --cluster-local-node 2 \
>         -j MARK --set-mark 0xffff
> iptables -A PREROUTING -t mangle -i eth0 -m cluster \
>         --cluster-total-nodes 3 --cluster-local-node 3 \
>         -j MARK --set-mark 0xffff
> iptables -A PREROUTING -t mangle -i eth0 \
>         -m mark ! --mark 0xffff -j DROP
> 
> Look at the worst case: if the packet goes to node 3, the hashing must
> be done to check if the packet belongs to node 1 and node 2. Thus, the
> hashing is done three times. This makes the cluster hashing O(n) where n
> is the number of cluster nodes.
> 
> A possible solution (that thinking it well, I don't like too much yet)
> would be to convert this to a HASHMARK target that will store the result
> of the hash in the skbuff mark, but the problem is that it would require
> a reserved space for hashmarks since they may clash with other
> user-defined marks.

That sounds a bit like a premature optimization. What I don't get
is why you don't simply set cluster-total-nodes to one when two
are down or remove the rule entirely.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-17 10:50               ` Patrick McHardy
@ 2009-02-17 13:50                 ` Pablo Neira Ayuso
  2009-02-17 19:45                   ` Vincent Bernat
  2009-02-18 10:13                   ` Patrick McHardy
  0 siblings, 2 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-17 13:50 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> While reworking this, I think that I have found one argument to support
>> the /proc interface that looks interesting in terms of resource
>> consumption. Assume that we have three nodes, where two of them are
>> down, thus, the only one active would have the following rule-set:
>>
>> iptables -A PREROUTING -t mangle -i eth0 -m cluster \
>>         --cluster-total-nodes 3 --cluster-local-node 1 \
>>         -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth0 -m cluster \
>>         --cluster-total-nodes 3 --cluster-local-node 2 \
>>         -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth0 -m cluster \
>>         --cluster-total-nodes 3 --cluster-local-node 3 \
>>         -j MARK --set-mark 0xffff
>> iptables -A PREROUTING -t mangle -i eth0 \
>>         -m mark ! --mark 0xffff -j DROP
>>
>> Look at the worst case: if the packet goes to node 3, the hashing must
>> be done to check if the packet belongs to node 1 and node 2. Thus, the
>> hashing is done three times. This makes the cluster hashing O(n) where n
>> is the number of cluster nodes.
>>
>> A possible solution (that thinking it well, I don't like too much yet)
>> would be to convert this to a HASHMARK target that will store the result
>> of the hash in the skbuff mark, but the problem is that it would require
>> a reserved space for hashmarks since they may clash with other
>> user-defined marks.
> 
> That sounds a bit like a premature optimization. What I don't get
> is why you don't simply set cluster-total-nodes to one when two
> are down or remove the rule entirely.

Indeed, but in practice existing failover daemons (at least the
free/opensource ones that I know) don't show that "intelligent"
behaviour: they initially (according to the configuration file) assign
the resources to each node, and if one node fails, they assign the
corresponding resources to another sane node (i.e. the daemon runs a
script with the corresponding iptables rules).

Re-adjusting the cluster-total-nodes and cluster-local-node options
(e.g. if one cluster node goes down and only two nodes are left alive,
change the rule-set to use only two nodes) indeed seems the natural way
to go, since the surviving cluster nodes would share the workload that
the failing node has left. However, as said, existing failover daemons
only select one new master to recover what a failing node was doing;
thus, only one node runs the script to inject the states into the
kernel.

Therefore, AFAICS, without the /proc interface I would need one iptables
rule per cluster-local-node handled, and so the sub-optimal situation is
still possible when one or several nodes fail.
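
As an illustration, the take-over script that such a daemon runs on the
surviving node would be little more than the conntrackd commit plus one
cluster rule per recovered node ID, in the style of the rule-set quoted
above (node 3 taken over here; interface and mark are those of the example):

conntrackd -c
iptables -I PREROUTING -t mangle -i eth0 -m cluster \
        --cluster-total-nodes 3 --cluster-local-node 3 \
        -j MARK --set-mark 0xffff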

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-17 13:50                 ` Pablo Neira Ayuso
@ 2009-02-17 19:45                   ` Vincent Bernat
  2009-02-18 10:14                     ` Patrick McHardy
  2009-02-18 10:13                   ` Patrick McHardy
  1 sibling, 1 reply; 49+ messages in thread
From: Vincent Bernat @ 2009-02-17 19:45 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel

OoO At the beginning of this cloudy afternoon of Tuesday 17 February 2009,
at around 14:50, Pablo Neira Ayuso <pablo@netfilter.org> said:

> Re-adjusting cluster-total-nodes and cluster-local-nodes options (eg. if
> one cluster node goes down and there are only two nodes alive, change
> the rule-set to have only two nodes) seems indeed the natural way to go
> since the alive cluster nodes would share the workload that the failing
> node has left. However, as said, existing failover daemons only select
> one new master to recover what a failing node was doing, thus, only one
> runs the script to inject the states into the kernel.

Moreover, some of them (the ones that are using VRRP, for example) don't
report the total number of nodes still alive. As a user, I would prefer
a simple /proc interface to add/remove a node.
-- 
Make sure all variables are initialised before use.
            - The Elements of Programming Style (Kernighan & Plauger)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-17 13:50                 ` Pablo Neira Ayuso
  2009-02-17 19:45                   ` Vincent Bernat
@ 2009-02-18 10:13                   ` Patrick McHardy
  2009-02-18 11:06                     ` Pablo Neira Ayuso
  1 sibling, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-18 10:13 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>>> A possible solution (that thinking it well, I don't like too much yet)
>>> would be to convert this to a HASHMARK target that will store the result
>>> of the hash in the skbuff mark, but the problem is that it would require
>>> a reserved space for hashmarks since they may clash with other
>>> user-defined marks.
>> That sounds a bit like a premature optimization. What I don't get
>> is why you don't simply set cluster-total-nodes to one when two
>> are down or remove the rule entirely.
> 
> Indeed, but in practise existing failover daemons (at least those
> free/opensource that I know) doesn't show that "intelligent" behaviour
> since they initially (according to the configuration file) assign the
> resources to each node, and if one node fails, it assigns the
> corresponding resources to another sane node (ie. the daemon runs a
> script with the corresponding iptables rules).
> 
> Re-adjusting cluster-total-nodes and cluster-local-nodes options (eg. if
> one cluster node goes down and there are only two nodes alive, change
> the rule-set to have only two nodes) seems indeed the natural way to go
> since the alive cluster nodes would share the workload that the failing
> node has left. However, as said, existing failover daemons only select
> one new master to recover what a failing node was doing, thus, only one
> runs the script to inject the states into the kernel.
> 
> Therefore AFAICS, without the /proc interface, I would need one iptables
> rule per cluster-local-node handled, and so it's still the possible
> sub-optimal situation when one or several node fails.

OK, that explains why you want to handle it this way. I don't want
to merge the proc file part though, so until the daemons get smarter,
people will have to use multiple rules.

BTW, I recently looked into TIPC; it's incredibly easy to use since
it deals with dead-node detection etc. internally, and all you need
to do is exchange a few messages. It might be quite easy to write a
smarter failover daemon.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-17 19:45                   ` Vincent Bernat
@ 2009-02-18 10:14                     ` Patrick McHardy
  0 siblings, 0 replies; 49+ messages in thread
From: Patrick McHardy @ 2009-02-18 10:14 UTC (permalink / raw)
  To: Vincent Bernat; +Cc: Pablo Neira Ayuso, netfilter-devel

Vincent Bernat wrote:
> OoO At the beginning of this cloudy afternoon of Tuesday 17 February 2009,
> at around 14:50, Pablo Neira Ayuso <pablo@netfilter.org> said:
> 
>> Re-adjusting cluster-total-nodes and cluster-local-nodes options (eg. if
>> one cluster node goes down and there are only two nodes alive, change
>> the rule-set to have only two nodes) seems indeed the natural way to go
>> since the alive cluster nodes would share the workload that the failing
>> node has left. However, as said, existing failover daemons only select
>> one new master to recover what a failing node was doing, thus, only one
>> runs the script to inject the states into the kernel.
> 
> Moreover, some of  them (the one that are using  VRRP for example) don't
> report the total number of nodes  still alive. As a user, I would prefer
> a simple /proc interface to add/remove a node.

That "simple" argument really doesn't cut it; there's nothing inherently
more complicated about executing an iptables command compared to
executing an echo command. Most likely some program is going to do it
anyway.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-18 10:13                   ` Patrick McHardy
@ 2009-02-18 11:06                     ` Pablo Neira Ayuso
  2009-02-18 11:14                       ` Patrick McHardy
  2009-02-18 17:20                       ` Vincent Bernat
  0 siblings, 2 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-18 11:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>>> A possible solution (that thinking it well, I don't like too much yet)
>>>> would be to convert this to a HASHMARK target that will store the
>>>> result
>>>> of the hash in the skbuff mark, but the problem is that it would
>>>> require
>>>> a reserved space for hashmarks since they may clash with other
>>>> user-defined marks.
>>> That sounds a bit like a premature optimization. What I don't get
>>> is why you don't simply set cluster-total-nodes to one when two
>>> are down or remove the rule entirely.
>>
>> Indeed, but in practise existing failover daemons (at least those
>> free/opensource that I know) doesn't show that "intelligent" behaviour
>> since they initially (according to the configuration file) assign the
>> resources to each node, and if one node fails, it assigns the
>> corresponding resources to another sane node (ie. the daemon runs a
>> script with the corresponding iptables rules).
>>
>> Re-adjusting cluster-total-nodes and cluster-local-nodes options (eg. if
>> one cluster node goes down and there are only two nodes alive, change
>> the rule-set to have only two nodes) seems indeed the natural way to go
>> since the alive cluster nodes would share the workload that the failing
>> node has left. However, as said, existing failover daemons only select
>> one new master to recover what a failing node was doing, thus, only one
>> runs the script to inject the states into the kernel.
>>
>> Therefore AFAICS, without the /proc interface, I would need one iptables
>> rule per cluster-local-node handled, and so it's still the possible
>> sub-optimal situation when one or several node fails.
> 
> OK, that explains why you want to handle it this way. I don't want
> to merge the proc file part though, so until the daemons get smarter,
> people will have to use multiple rules.

:(

> BTW, I recently looked into TIPC, its incredibly easy to use since
> it deals with dead-node dectection etc internally and all you need
> to do is exchange a few messages. Might be quite easy to write a
> smarter failover daemon.

I see; I don't have a more convincing argument than "I would also need
time for that, but in the meantime, please allow this". Well, failover
daemons are delicate pieces of software: they have to be stable,
well-tested, bug-free and give timely responses. TIPC is still
experimental, and I guess its dead-node detection is only layer 3/4,
based on heartbeats. Dead-node detection is a tricky issue: the more
checks you can perform at different layers, the lower the chances of
making wrong decisions that may lead to inconsistent situations and
tons of problems. VRRP is the current standard, and this is one of its
limitations, and so on.

Well, if you are not going to accept the /proc interface no matter what
arguments I make, I give up on this ;)

Anyway, this is probably a premature optimization (but is it worth it?).
Some numbers: in my testbed, I get ~1800 TCP connections per second less
with eight cluster rules (no /proc interface).

24347 TCP connections per second with one rule.
22580 TCP connections per second with eight rules.

OK, I'll send you another patch without the /proc interface.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-18 11:06                     ` Pablo Neira Ayuso
@ 2009-02-18 11:14                       ` Patrick McHardy
  2009-02-18 17:20                       ` Vincent Bernat
  1 sibling, 0 replies; 49+ messages in thread
From: Patrick McHardy @ 2009-02-18 11:14 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

[Please trim unrelated content, these mails are getting hard to read]

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> BTW, I recently looked into TIPC, its incredibly easy to use since
>> it deals with dead-node dectection etc internally and all you need
>> to do is exchange a few messages. Might be quite easy to write a
>> smarter failover daemon.
> 
> I see, I don't have more convincing arguments that "I would also need
> time for that but in the meanwhile, please allow this". Well, failover
> daemons are delicate pieces of software, they have to be stable,
> well-tested, bug-free, give timely responses. Still TIPC is experimental
> and I guess that the dead-node detection is only layer 3/4 based on
> heartbeats. Dead-node detection is a tricky issue, the more you can
> perform different layer checkings, the more increase chances to make
> wrong decisions that may lead to inconsistent situations and tons of
> problems. VRRP is the current standard and this one of his limitations,
> and so on.
> 
> Well, if you are not going to accept the /proc interface, not matter
> what I can argument, I give up on this ;)

I'm afraid I can't be convinced of this. If you want to specify
multiple node ids, have the iptables command accept them, but
there's no reason to use proc for this.

> Anyway, probably, this is a premature optimization (but worth?). Some
> numbers, in my testbed, I get ~1800 TCP connections per second less with
> eight cluster rules (no /proc interface).
> 
> 24347 TCP connections per second with one rule.
> 22580 TCP connections per second with eight rules.
> 
> OK, I'll send you another patch without the /proc interface.

Thanks. As I said, I don't have anything against handling multiple
nodes in one rule, as long as it's not done using proc.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-18 11:06                     ` Pablo Neira Ayuso
  2009-02-18 11:14                       ` Patrick McHardy
@ 2009-02-18 17:20                       ` Vincent Bernat
  2009-02-18 17:25                         ` Patrick McHardy
  1 sibling, 1 reply; 49+ messages in thread
From: Vincent Bernat @ 2009-02-18 17:20 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Patrick McHardy, netfilter-devel

OoO During lunch time on Wednesday, 18 February 2009, at around 12:06,
Pablo Neira Ayuso <pablo@netfilter.org> said:

>> OK, that explains why you want to handle it this way. I don't want
>> to merge the proc file part though, so until the daemons get smarter,
>> people will have to use multiple rules.

> :(

A solution would be to give the mask on the command line. The HA daemon
will output the assigned nodes to a file, and the contents of this file
will then be turned into a mask.
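A minimal standalone C sketch of that idea (the file path, the helper name
and the assumption that node N maps to bit N-1 are only illustrative; they
are not part of the posted patches):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one 1-based node ID per line, as written
 * out by the HA daemon, and fold the IDs into a 32-bit node mask. */
static uint32_t nodes_file_to_mask(const char *path)
{
	FILE *f = fopen(path, "r");
	uint32_t mask = 0;
	unsigned int node;

	if (f == NULL)
		return 0;
	while (fscanf(f, "%u", &node) == 1) {
		if (node >= 1 && node <= 32)
			mask |= UINT32_C(1) << (node - 1);
	}
	fclose(f);
	return mask;
}

int main(void)
{
	/* e.g. a file containing "1\n2\n" yields the mask 0x3 */
	printf("0x%" PRIx32 "\n", nodes_file_to_mask("/var/run/cluster-nodes"));
	return 0;
}

A fail-over script could then pass the resulting mask to the iptables rule
for this node.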
-- 
THE BOYS ROOM IS NOT A WATER PARK
THE BOYS ROOM IS NOT A WATER PARK
THE BOYS ROOM IS NOT A WATER PARK
-+- Bart Simpson on chalkboard in episode 3F03

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-18 17:20                       ` Vincent Bernat
@ 2009-02-18 17:25                         ` Patrick McHardy
  2009-02-18 18:38                           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-18 17:25 UTC (permalink / raw)
  To: Vincent Bernat; +Cc: Pablo Neira Ayuso, netfilter-devel

Vincent Bernat wrote:
> OoO During lunch time on Wednesday, 18 February 2009, at around 12:06,
> Pablo Neira Ayuso <pablo@netfilter.org> said:
> 
>>> OK, that explains why you want to handle it this way. I don't want
>>> to merge the proc file part though, so until the daemons get smarter,
>>> people will have to use multiple rules.
> 
>> :(
> 
> A solution would be to give the mask on the command line. The HA daemon
> will output the assigned nodes to a file, and the contents of this file
> will then be turned into a mask.

Indeed. Just a bitmask (is 64 nodes max enough?) should be fine.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-18 17:25                         ` Patrick McHardy
@ 2009-02-18 18:38                           ` Pablo Neira Ayuso
  0 siblings, 0 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-18 18:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Vincent Bernat, netfilter-devel

Patrick McHardy wrote:
> Vincent Bernat wrote:
>> OoO During lunch time on Wednesday, 18 February 2009, at around 12:06,
>> Pablo Neira Ayuso <pablo@netfilter.org> said:
>>
>>>> OK, that explains why you want to handle it this way. I don't want
>>>> to merge the proc file part though, so until the daemons get smarter,
>>>> people will have to use multiple rules.
>>
>>> :(
>>
>> A solution would be to give the mask on the command line. The HA daemon
>> will output the assigned nodes to a file, and the contents of this file
>> will then be turned into a mask.
> 
> Indeed. Just a bitmask (is 64 nodes max enough?) should be fine.

Good point :), I'll add this to the patch that I'm reworking.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] netfilter: xtables: add cluster match
@ 2009-02-19 23:14 Pablo Neira Ayuso
  2009-02-20  9:24 ` Patrick McHardy
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-19 23:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (I have only tested this with two
nodes, so I cannot tell what the real scalability limit of this
solution is in terms of cluster nodes).

Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:

	(jhash(source IP) % total_nodes) & node_mask

For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

BTW, some final notes:

 * This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 include/linux/netfilter/xt_cluster.h |   15 +++
 net/netfilter/Kconfig                |   16 +++
 net/netfilter/Makefile               |    1 
 net/netfilter/xt_cluster.c           |  165 ++++++++++++++++++++++++++++++++++
 4 files changed, 197 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_cluster.h
 create mode 100644 net/netfilter/xt_cluster.c

diff --git a/include/linux/netfilter/xt_cluster.h b/include/linux/netfilter/xt_cluster.h
new file mode 100644
index 0000000..5e0a0d0
--- /dev/null
+++ b/include/linux/netfilter/xt_cluster.h
@@ -0,0 +1,15 @@
+#ifndef _XT_CLUSTER_MATCH_H
+#define _XT_CLUSTER_MATCH_H
+
+enum xt_cluster_flags {
+	XT_CLUSTER_F_INV	= (1 << 0)
+};
+
+struct xt_cluster_match_info {
+	u_int32_t		total_nodes;
+	u_int32_t		node_mask;
+	u_int32_t		hash_seed;
+	u_int32_t		flags;
+};
+
+#endif /* _XT_CLUSTER_MATCH_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c2bac9c..77b6405 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -488,6 +488,22 @@ config NETFILTER_XT_TARGET_TCPOPTSTRIP
 	  This option adds a "TCPOPTSTRIP" target, which allows you to strip
 	  TCP options from TCP packets.
 
+config NETFILTER_XT_MATCH_CLUSTER
+	tristate '"cluster" match support'
+	depends on NF_CONNTRACK
+	depends on NETFILTER_ADVANCED
+	---help---
+	  This option allows you to build work-load-sharing clusters of
+	  network servers/stateful firewalls without having a dedicated
+	  load-balancing router/server/switch. Basically, this match returns
+	  true when the packet must be handled by this cluster node. Thus,
+	  all nodes see all packets and this match decides which node handles
+	  what packets. The work-load sharing algorithm is based on source
+	  address hashing.
+
+	  If you say Y or M here, try `iptables -m cluster --help` for
+	  more information.
+
 config NETFILTER_XT_MATCH_COMMENT
 	tristate  '"comment" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index da3d909..960399a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
+obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
new file mode 100644
index 0000000..2203aa2
--- /dev/null
+++ b/net/netfilter/xt_cluster.c
@@ -0,0 +1,165 @@
+/*
+ * (C) 2008-2009 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/bitops.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/xt_cluster.h>
+
+static inline u_int32_t nf_ct_orig_ipv4_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+}
+
+static inline const void *nf_ct_orig_ipv6_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip6;
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv4(u_int32_t ip, const struct xt_cluster_match_info *info)
+{
+	return jhash_1word(ip, info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv6(const void *ip, const struct xt_cluster_match_info *info)
+{
+	return jhash2(ip, NF_CT_TUPLE_L3SIZE / sizeof(__u32), info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash(const struct nf_conn *ct,
+		const struct xt_cluster_match_info *info)
+{
+	u_int32_t hash = 0;
+
+	switch(nf_ct_l3num(ct)) {
+	case AF_INET:
+		hash = xt_cluster_hash_ipv4(nf_ct_orig_ipv4_src(ct), info);
+		break;
+	case AF_INET6:
+		hash = xt_cluster_hash_ipv6(nf_ct_orig_ipv6_src(ct), info);
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return (((u64)hash * info->total_nodes) >> 32);
+}
+
+static inline bool
+xt_cluster_is_multicast_addr(const struct sk_buff *skb, u_int8_t family)
+{
+	bool is_multicast = false;
+
+	switch(family) {
+	case NFPROTO_IPV4:
+		is_multicast = ipv4_is_multicast(ip_hdr(skb)->daddr);
+		break;
+	case NFPROTO_IPV6:
+		is_multicast = ipv6_addr_type(&ipv6_hdr(skb)->daddr) &
+						IPV6_ADDR_MULTICAST;
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return is_multicast;
+}
+
+static bool
+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	struct sk_buff *pskb = (struct sk_buff *)skb;
+	const struct xt_cluster_match_info *info = par->matchinfo;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned long hash;
+
+	/* This match assumes that all nodes see the same packets. This can be
+	 * achieved if the switch that connects the cluster nodes support some
+	 * sort of 'port mirroring'. However, if your switch does not support
+	 * this, your cluster nodes can reply ARP request using a multicast MAC
+	 * address. Thus, your switch will flood the same packets to the
+	 * cluster nodes with the same multicast MAC address. Using a multicast
+	 * link address is a RFC 1812 (section 3.3.2) violation, but this works
+	 * fine in practise.
+	 *
+	 * Unfortunately, if you use the multicast MAC address, the link layer
+	 * sets skbuff's pkt_type to PACKET_MULTICAST, which is not accepted
+	 * by TCP and others for packets coming to this node. For that reason,
+	 * this match mangles skbuff's pkt_type if it detects a packet
+	 * addressed to a unicast address but using PACKET_MULTICAST. Yes, I
+	 * know, matches should not alter packets, but we are doing this here
+	 * because we would need to add a PKTTYPE target for this sole purpose.
+	 */
+	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
+	    skb->pkt_type == PACKET_MULTICAST) {
+	    	pskb->pkt_type = PACKET_HOST;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return false;
+
+	if (ct == &nf_conntrack_untracked)
+		return false;
+
+	if (ct->master)
+		hash = xt_cluster_hash(ct->master, info);
+	else
+		hash = xt_cluster_hash(ct, info);
+
+	return test_bit(hash, &info->node_mask) ^
+	       !!(info->flags & XT_CLUSTER_F_INV);
+}
+
+static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+
+	if (info->node_mask > (1 << info->total_nodes)) {
+		printk(KERN_ERR "xt_cluster: the id of this node cannot be "
+				"higher than the total number of nodes\n");
+		return false;
+	}
+	return true;
+}
+
+static struct xt_match xt_cluster_match __read_mostly = {
+	.name		= "cluster",
+	.family		= NFPROTO_UNSPEC,
+	.match		= xt_cluster_mt,
+	.checkentry	= xt_cluster_mt_checkentry,
+	.matchsize	= sizeof(struct xt_cluster_match_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init xt_cluster_mt_init(void)
+{
+	return xt_register_match(&xt_cluster_match);
+}
+
+static void __exit xt_cluster_mt_fini(void)
+{
+	xt_unregister_match(&xt_cluster_match);
+}
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: hash-based cluster match");
+MODULE_ALIAS("ipt_cluster");
+MODULE_ALIAS("ip6t_cluster");
+module_init(xt_cluster_mt_init);
+module_exit(xt_cluster_mt_fini);


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-19 23:14 Pablo Neira Ayuso
@ 2009-02-20  9:24 ` Patrick McHardy
  2009-02-20 13:15   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-20  9:24 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> This patch adds the iptables cluster match. This match can be used
> to deploy gateway and back-end load-sharing clusters. The cluster
> can be composed of 32 nodes maximum (I have only tested this with two
> nodes, so I cannot tell what the real scalability limit of this
> solution is in terms of cluster nodes).

Thanks Pablo.

> +	ct = nf_ct_get(skb, &ctinfo);
> +	if (ct == NULL)
> +		return false;
> +
> +	if (ct == &nf_conntrack_untracked)
> +		return false;
> +
> +	if (ct->master)
> +		hash = xt_cluster_hash(ct->master, info);
> +	else
> +		hash = xt_cluster_hash(ct, info);

This makes a lot of sense for helpers like SIP, where the expectation
can arrive from a different source address. I'm just wondering how
this works when not using reliable synchronization - in that case, other
nodes might not be aware of the expectation and also accept the packet.
I don't have a suggestion besides making sure expectations are
synchronized, just thought I'd point it out.

> +static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
> +{
> +	struct xt_cluster_match_info *info = par->matchinfo;
> +
> +	if (info->node_mask > (1 << info->total_nodes)) {
> +		printk(KERN_ERR "xt_cluster: the id of this node cannot be "
> +				"higher than the total number of nodes\n");

This looks like an off-by-one (warning: still at first coffee :)).
A mask equal to (1 << total_nodes) should not be allowed either, I'd
expect. I can change it to >= when applying if you agree.
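To make the off-by-one concrete, here is a minimal standalone sketch of
the suggested bound (illustrative only, not the exact kernel code):

#include <stdbool.h>
#include <stdint.h>

/* With total_nodes = 2, valid masks only use bits 0 and 1, i.e. 0x1..0x3.
 * The posted check (mask > (1 << total_nodes)) still accepts 0x4, which
 * refers to a non-existent third node; the >= bound rejects it. */
static bool node_mask_valid(uint32_t node_mask, uint32_t total_nodes)
{
	return node_mask < ((uint64_t)1 << total_nodes);
}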

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-20  9:24 ` Patrick McHardy
@ 2009-02-20 13:15   ` Pablo Neira Ayuso
  2009-02-20 13:48     ` Patrick McHardy
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-20 13:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Hi Patrick,

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> +    ct = nf_ct_get(skb, &ctinfo);
>> +    if (ct == NULL)
>> +        return false;
>> +
>> +    if (ct == &nf_conntrack_untracked)
>> +        return false;
>> +
>> +    if (ct->master)
>> +        hash = xt_cluster_hash(ct->master, info);
>> +    else
>> +        hash = xt_cluster_hash(ct, info);
> 
> This makes a lot of sense for helpers like SIP, where the expectation
> can arrive from a different source address. I'm just wondering how
> this works when not using reliable synchronization - in that case, other
> nodes might not be aware of the expectation and also accept the packet.
> I don't have a suggestion besides making sure expectations are
> synchronized, just thought I'd point it out.

Indeed.

This sort of problem is interesting, in case you have some spare time
to think about other synchronization-related problems (otherwise you can
skip what follows :)). Conntrackd does not synchronize expectations (at
least, it's not in my plans yet); it synchronizes conntrack entries, and
that includes the relationship between master and related conntracks.
Thus, after the failover, the new primary node knows that the master
connection has a helper (so it can create new expectations), and already
existing established-related connections are linked to their master
conntracks.

Still, I see two possible problematic situations with this approach:

  * If expectations are not propagated, an FTP-data connection that is
about to start would not succeed if it happens during a failover, as
the expectation information is lost.

  * If the state information is lost for whatever reason (like not using
conntrackd at all, or losing the state information due to netlink
unreliability), then the formerly expected connection would be handled
like a normal connection by one cluster node. For example, this would
break FTP if destination NAT is used (and similarly for other helpers,
I think).

For the first problem, I can say that conntrackd can be tuned to reduce
the chances of this happening (at the cost of investing more resources
in the synchronization). Moreover, connections that are about to start
may retry shortly, and no data has been exchanged yet anyway.

For the second problem, this is actually the sort of problem that I want
to avoid by making netlink reliable (dropping packets), thus reducing
the chances of losing state information for whatever reason.

>> +static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
>> +{
>> +    struct xt_cluster_match_info *info = par->matchinfo;
>> +
>> +    if (info->node_mask > (1 << info->total_nodes)) {
>> +        printk(KERN_ERR "xt_cluster: the id of this node cannot be "
>> +                "higher than the total number of nodes\n");
> 
> This looks like an off-by-one (warning: still at first coffee :)).
> A mask equal to (1 << total_nodes) should not be allowed either, I'd
> expect. I can change it to >= when applying if you agree.

You're right! Please change it.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-20 13:15   ` Pablo Neira Ayuso
@ 2009-02-20 13:48     ` Patrick McHardy
  2009-02-20 16:52       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-20 13:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> Pablo Neira Ayuso wrote:
>>> +    if (ct->master)
>>> +        hash = xt_cluster_hash(ct->master, info);
>>> +    else
>>> +        hash = xt_cluster_hash(ct, info);
>>
>> This makes a lot of sense for helpers like SIP, where the expectation
>> can arrive from a different source address. I'm just wondering how
>> this works when not using reliable synchronization - in that case, other
>> nodes might not be aware of the expectation and also accept the packet.
>> I don't have a suggestion besides making sure expectations are
>> synchronized, just thought I'd point it out.
> 
> Indeed.
> 
> This sort of problem is interesting, in case you have some spare time
> to think about other synchronization-related problems (otherwise you can
> skip what follows :)). Conntrackd does not synchronize expectations (at
> least, it's not in my plans yet); it synchronizes conntrack entries, and
> that includes the relationship between master and related conntracks.
> Thus, after the failover, the new primary node knows that the master
> connection has a helper (so it can create new expectations), and already
> existing established-related connections are linked to their master
> conntracks.
> 
> Still, I see two possible problematic situations with this approach:
> 
>  * If expectations are not propagated, an FTP-data connection that is
> about to start would not succeed if it happens during a failover, as
> the expectation information is lost.
> 
>  * If the state information is lost for whatever reason (like not using
> conntrackd at all, or losing the state information due to netlink
> unreliability), then the formerly expected connection would be handled
> like a normal connection by one cluster node. For example, this would
> break FTP if destination NAT is used (and similarly for other helpers,
> I think).
> 
> For the first problem, I can say that conntrackd can be tuned to reduce
> the chances of this happening (at the cost of investing more resources
> in the synchronization). Moreover, connections that are about to start
> may retry shortly, and no data has been exchanged yet anyway.

Good point.

> For the second problem, this is actually the sort of problem that I
> want to avoid by making netlink reliable (dropping packets), thus
> reducing the chances of losing state information for whatever reason.

Yes, although the netlink delivery only covers part of it. It might
be the path where most events are lost though.

>>> +static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
>>> +{
>>> +    struct xt_cluster_match_info *info = par->matchinfo;
>>> +
>>> +    if (info->node_mask > (1 << info->total_nodes)) {
>>> +        printk(KERN_ERR "xt_cluster: the id of this node cannot be "
>>> +                "higher than the total number of nodes\n");
>>
>> This looks like an off-by-one (warning: still at first coffee :)).
>> A mask equal to (1 << total_nodes) should not be allowed either, I'd
>> expect. I can change it to >= when applying if you agree.
> 
> You're right! Please change it.

I noticed another problem during compilation:

net/netfilter/xt_cluster.c: In function 'xt_cluster_mt':
net/netfilter/xt_cluster.c:124: warning: passing argument 2 of 
'constant_test_bit' from incompatible pointer type
net/netfilter/xt_cluster.c:124: warning: passing argument 2 of 
'variable_test_bit' from incompatible pointer type

The problem is that it uses a u32 for the mask, but the bitops are
only defined for unsigned longs, which is a bit unfortunate since
those are not well suited for ABI structures. I'd suggest simply
open-coding the bit tests.
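A minimal sketch of what such an open-coded test could look like
(illustrative only, written as a standalone function rather than the
final patch code):

#include <stdbool.h>
#include <stdint.h>

/* node_mask is a plain u32 in the ABI structure, so instead of
 * test_bit(), which expects an unsigned long *, the bit can be
 * tested directly with a shift and a mask. */
static bool xt_cluster_node_selected(uint32_t node_id, uint32_t node_mask)
{
	return (UINT32_C(1) << node_id) & node_mask;
}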


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-20 13:48     ` Patrick McHardy
@ 2009-02-20 16:52       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-20 16:52 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
>> For the second problem, this is actually the sort of problem that I
>> want to avoid by making netlink reliable (dropping packets), thus
>> reducing the chances of losing state information for whatever reason.
> 
> Yes, although the netlink delivery only covers part of it. It might
> be the path where most events are lost though.

Right, to extend this comment: during the failure there can also be some
events pending to be sent in all the existing kernel queues (netlink
queues and device transmission queues) that the daemon did not have time
to propagate; this is part of the asynchronous approach. I've been trying
to measure this behaviour, but I haven't seen any significant number of
lost connections in my testbed, not yet at least.

>>>> +static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
>>>> +{
>>>> +    struct xt_cluster_match_info *info = par->matchinfo;
>>>> +
>>>> +    if (info->node_mask > (1 << info->total_nodes)) {
>>>> +        printk(KERN_ERR "xt_cluster: the id of this node cannot be "
>>>> +                "higher than the total number of nodes\n");
>>>
>>> This looks like an off-by-one (warning: still at first coffee :)).
>>> A mask equal to (1 << total_nodes) should not be allowed either, I'd
>>> expect. I can change it to >= when applying if you agree.
>>
>> You're right! Please change it.
> 
> I noticed another problem during compilation:
> 
> net/netfilter/xt_cluster.c: In function 'xt_cluster_mt':
> net/netfilter/xt_cluster.c:124: warning: passing argument 2 of
> 'constant_test_bit' from incompatible pointer type
> net/netfilter/xt_cluster.c:124: warning: passing argument 2 of
> 'variable_test_bit' from incompatible pointer type
> 
> The problem is that it uses a u32 for the mask, but the bitops are
> only defined for unsigned longs, which is a bit unfortunate since
> those are not well suited for ABI structures. I'd suggest simply
> open-coding the bit tests.

Agreed. The test_bit was a remnant of the /proc interface (which
allowed setting bits in the node mask), so I don't need it. I'm going to
resend you the patch in a couple of minutes (with the off-by-one issue
resolved as well).

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] netfilter: xtables: add cluster match
@ 2009-02-20 20:50 Pablo Neira Ayuso
  2009-02-20 20:56 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-20 20:50 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (I have only tested this with two
nodes, so I cannot tell what the real scalability limit of this
solution is in terms of cluster nodes).

Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:

	(jhash(source IP) % total_nodes) & node_mask

For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

BTW, some final notes:

 * This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 include/linux/netfilter/xt_cluster.h |   15 +++
 net/netfilter/Kconfig                |   16 +++
 net/netfilter/Makefile               |    1 
 net/netfilter/xt_cluster.c           |  164 ++++++++++++++++++++++++++++++++++
 4 files changed, 196 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_cluster.h
 create mode 100644 net/netfilter/xt_cluster.c

diff --git a/include/linux/netfilter/xt_cluster.h b/include/linux/netfilter/xt_cluster.h
new file mode 100644
index 0000000..5e0a0d0
--- /dev/null
+++ b/include/linux/netfilter/xt_cluster.h
@@ -0,0 +1,15 @@
+#ifndef _XT_CLUSTER_MATCH_H
+#define _XT_CLUSTER_MATCH_H
+
+enum xt_cluster_flags {
+	XT_CLUSTER_F_INV	= (1 << 0)
+};
+
+struct xt_cluster_match_info {
+	u_int32_t		total_nodes;
+	u_int32_t		node_mask;
+	u_int32_t		hash_seed;
+	u_int32_t		flags;
+};
+
+#endif /* _XT_CLUSTER_MATCH_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c2bac9c..77b6405 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -488,6 +488,22 @@ config NETFILTER_XT_TARGET_TCPOPTSTRIP
 	  This option adds a "TCPOPTSTRIP" target, which allows you to strip
 	  TCP options from TCP packets.
 
+config NETFILTER_XT_MATCH_CLUSTER
+	tristate '"cluster" match support'
+	depends on NF_CONNTRACK
+	depends on NETFILTER_ADVANCED
+	---help---
+	  This option allows you to build work-load-sharing clusters of
+	  network servers/stateful firewalls without having a dedicated
+	  load-balancing router/server/switch. Basically, this match returns
+	  true when the packet must be handled by this cluster node. Thus,
+	  all nodes see all packets and this match decides which node handles
+	  what packets. The work-load sharing algorithm is based on source
+	  address hashing.
+
+	  If you say Y or M here, try `iptables -m cluster --help` for
+	  more information.
+
 config NETFILTER_XT_MATCH_COMMENT
 	tristate  '"comment" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index da3d909..960399a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
+obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
new file mode 100644
index 0000000..81dd3f1
--- /dev/null
+++ b/net/netfilter/xt_cluster.c
@@ -0,0 +1,164 @@
+/*
+ * (C) 2008-2009 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/xt_cluster.h>
+
+static inline u_int32_t nf_ct_orig_ipv4_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+}
+
+static inline const void *nf_ct_orig_ipv6_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip6;
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv4(u_int32_t ip, const struct xt_cluster_match_info *info)
+{
+	return jhash_1word(ip, info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv6(const void *ip, const struct xt_cluster_match_info *info)
+{
+	return jhash2(ip, NF_CT_TUPLE_L3SIZE / sizeof(__u32), info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash(const struct nf_conn *ct,
+		const struct xt_cluster_match_info *info)
+{
+	u_int32_t hash = 0;
+
+	switch(nf_ct_l3num(ct)) {
+	case AF_INET:
+		hash = xt_cluster_hash_ipv4(nf_ct_orig_ipv4_src(ct), info);
+		break;
+	case AF_INET6:
+		hash = xt_cluster_hash_ipv6(nf_ct_orig_ipv6_src(ct), info);
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return (((u64)hash * info->total_nodes) >> 32);
+}
+
+static inline bool
+xt_cluster_is_multicast_addr(const struct sk_buff *skb, u_int8_t family)
+{
+	bool is_multicast = false;
+
+	switch(family) {
+	case NFPROTO_IPV4:
+		is_multicast = ipv4_is_multicast(ip_hdr(skb)->daddr);
+		break;
+	case NFPROTO_IPV6:
+		is_multicast = ipv6_addr_type(&ipv6_hdr(skb)->daddr) &
+						IPV6_ADDR_MULTICAST;
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return is_multicast;
+}
+
+static bool
+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	struct sk_buff *pskb = (struct sk_buff *)skb;
+	const struct xt_cluster_match_info *info = par->matchinfo;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned long hash;
+
+	/* This match assumes that all nodes see the same packets. This can be
+	 * achieved if the switch that connects the cluster nodes support some
+	 * sort of 'port mirroring'. However, if your switch does not support
+	 * this, your cluster nodes can reply ARP request using a multicast MAC
+	 * address. Thus, your switch will flood the same packets to the
+	 * cluster nodes with the same multicast MAC address. Using a multicast
+	 * link address is a RFC 1812 (section 3.3.2) violation, but this works
+	 * fine in practise.
+	 *
+	 * Unfortunately, if you use the multicast MAC address, the link layer
+	 * sets skbuff's pkt_type to PACKET_MULTICAST, which is not accepted
+	 * by TCP and others for packets coming to this node. For that reason,
+	 * this match mangles skbuff's pkt_type if it detects a packet
+	 * addressed to a unicast address but using PACKET_MULTICAST. Yes, I
+	 * know, matches should not alter packets, but we are doing this here
+	 * because we would need to add a PKTTYPE target for this sole purpose.
+	 */
+	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
+	    skb->pkt_type == PACKET_MULTICAST) {
+	    	pskb->pkt_type = PACKET_HOST;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return false;
+
+	if (ct == &nf_conntrack_untracked)
+		return false;
+
+	if (ct->master)
+		hash = xt_cluster_hash(ct->master, info);
+	else
+		hash = xt_cluster_hash(ct, info);
+
+	return ((1 << hash) & info->node_mask) ^
+	       !!(info->flags & XT_CLUSTER_F_INV);
+}
+
+static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+
+	if (info->node_mask >= (1 << info->total_nodes)) {
+		printk(KERN_ERR "xt_cluster: this node mask cannot be "
+				"higher than the total number of nodes\n");
+		return false;
+	}
+	return true;
+}
+
+static struct xt_match xt_cluster_match __read_mostly = {
+	.name		= "cluster",
+	.family		= NFPROTO_UNSPEC,
+	.match		= xt_cluster_mt,
+	.checkentry	= xt_cluster_mt_checkentry,
+	.matchsize	= sizeof(struct xt_cluster_match_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init xt_cluster_mt_init(void)
+{
+	return xt_register_match(&xt_cluster_match);
+}
+
+static void __exit xt_cluster_mt_fini(void)
+{
+	xt_unregister_match(&xt_cluster_match);
+}
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: hash-based cluster match");
+MODULE_ALIAS("ipt_cluster");
+MODULE_ALIAS("ip6t_cluster");
+module_init(xt_cluster_mt_init);
+module_exit(xt_cluster_mt_fini);


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-20 20:50 [PATCH] netfilter: xtables: add cluster match Pablo Neira Ayuso
@ 2009-02-20 20:56 ` Pablo Neira Ayuso
  0 siblings, 0 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-20 20:56 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

Pablo Neira Ayuso wrote:
> +	return ((1 << hash) & info->node_mask) ^
> +	       !!(info->flags & XT_CLUSTER_F_INV);

Ah, wait, this is completely broken... this is what happens when one rushes...
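The breakage is that (1 << hash) & info->node_mask is a multi-bit value,
so XORing it with the 0/1 result of !!(info->flags & XT_CLUSTER_F_INV)
does not invert the match. A minimal standalone sketch of the corrected
expression (the function wrapper and the example value are only
illustrative):

#include <stdbool.h>
#include <stdint.h>

/* With hash = 2 the bit test yields 0x4, and 0x4 ^ 1 is 0x5, which is
 * still non-zero, so an inverted rule would keep matching. Normalizing
 * the bit test to 0/1 with !!, as in the resent patch, restores the
 * intended inversion. */
static bool cluster_match(uint32_t hash, uint32_t node_mask, bool invert)
{
	return !!((UINT32_C(1) << hash) & node_mask) ^ invert;
}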

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] netfilter: xtables: add cluster match
@ 2009-02-23 10:13 Pablo Neira Ayuso
  2009-02-24 13:46 ` Patrick McHardy
  2009-03-16 16:11 ` Patrick McHardy
  0 siblings, 2 replies; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-23 10:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: kaber

This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (I have only tested this with two
nodes, so I cannot tell what the real scalability limit of this
solution is in terms of cluster nodes).

Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:

	(jhash(source IP) % total_nodes) & node_mask

For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):

iptables -I PREROUTING -t mangle -i eth1 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
	-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
	--cluster-total-nodes 2 --cluster-local-node 1 \
	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
	-m mark ! --mark 0xffff -j DROP

And the following commands to make all nodes see the same packets:

ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
	--destination-mac 01:00:5e:00:01:01 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
	-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
	--destination-mac 01:00:5e:00:01:02 \
	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27

In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose

BTW, some final notes:

 * This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
 * This match supersedes the CLUSTERIP target.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 include/linux/netfilter/xt_cluster.h |   15 +++
 net/netfilter/Kconfig                |   16 +++
 net/netfilter/Makefile               |    1 
 net/netfilter/xt_cluster.c           |  164 ++++++++++++++++++++++++++++++++++
 4 files changed, 196 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_cluster.h
 create mode 100644 net/netfilter/xt_cluster.c

diff --git a/include/linux/netfilter/xt_cluster.h b/include/linux/netfilter/xt_cluster.h
new file mode 100644
index 0000000..5e0a0d0
--- /dev/null
+++ b/include/linux/netfilter/xt_cluster.h
@@ -0,0 +1,15 @@
+#ifndef _XT_CLUSTER_MATCH_H
+#define _XT_CLUSTER_MATCH_H
+
+enum xt_cluster_flags {
+	XT_CLUSTER_F_INV	= (1 << 0)
+};
+
+struct xt_cluster_match_info {
+	u_int32_t		total_nodes;
+	u_int32_t		node_mask;
+	u_int32_t		hash_seed;
+	u_int32_t		flags;
+};
+
+#endif /* _XT_CLUSTER_MATCH_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index c2bac9c..77b6405 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -488,6 +488,22 @@ config NETFILTER_XT_TARGET_TCPOPTSTRIP
 	  This option adds a "TCPOPTSTRIP" target, which allows you to strip
 	  TCP options from TCP packets.
 
+config NETFILTER_XT_MATCH_CLUSTER
+	tristate '"cluster" match support'
+	depends on NF_CONNTRACK
+	depends on NETFILTER_ADVANCED
+	---help---
+	  This option allows you to build work-load-sharing clusters of
+	  network servers/stateful firewalls without having a dedicated
+	  load-balancing router/server/switch. Basically, this match returns
+	  true when the packet must be handled by this cluster node. Thus,
+	  all nodes see all packets and this match decides which node handles
+	  what packets. The work-load sharing algorithm is based on source
+	  address hashing.
+
+	  If you say Y or M here, try `iptables -m cluster --help` for
+	  more information.
+
 config NETFILTER_XT_MATCH_COMMENT
 	tristate  '"comment" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index da3d909..960399a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -57,6 +57,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
+obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNLIMIT) += xt_connlimit.o
diff --git a/net/netfilter/xt_cluster.c b/net/netfilter/xt_cluster.c
new file mode 100644
index 0000000..ad5bd89
--- /dev/null
+++ b/net/netfilter/xt_cluster.c
@@ -0,0 +1,164 @@
+/*
+ * (C) 2008-2009 Pablo Neira Ayuso <pablo@netfilter.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <linux/netfilter/xt_cluster.h>
+
+static inline u_int32_t nf_ct_orig_ipv4_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+}
+
+static inline const void *nf_ct_orig_ipv6_src(const struct nf_conn *ct)
+{
+	return ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip6;
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv4(u_int32_t ip, const struct xt_cluster_match_info *info)
+{
+	return jhash_1word(ip, info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash_ipv6(const void *ip, const struct xt_cluster_match_info *info)
+{
+	return jhash2(ip, NF_CT_TUPLE_L3SIZE / sizeof(__u32), info->hash_seed);
+}
+
+static inline u_int32_t
+xt_cluster_hash(const struct nf_conn *ct,
+		const struct xt_cluster_match_info *info)
+{
+	u_int32_t hash = 0;
+
+	switch(nf_ct_l3num(ct)) {
+	case AF_INET:
+		hash = xt_cluster_hash_ipv4(nf_ct_orig_ipv4_src(ct), info);
+		break;
+	case AF_INET6:
+		hash = xt_cluster_hash_ipv6(nf_ct_orig_ipv6_src(ct), info);
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return (((u64)hash * info->total_nodes) >> 32);
+}
+
+static inline bool
+xt_cluster_is_multicast_addr(const struct sk_buff *skb, u_int8_t family)
+{
+	bool is_multicast = false;
+
+	switch(family) {
+	case NFPROTO_IPV4:
+		is_multicast = ipv4_is_multicast(ip_hdr(skb)->daddr);
+		break;
+	case NFPROTO_IPV6:
+		is_multicast = ipv6_addr_type(&ipv6_hdr(skb)->daddr) &
+						IPV6_ADDR_MULTICAST;
+		break;
+	default:
+		WARN_ON(1);
+		break;
+	}
+	return is_multicast;
+}
+
+static bool
+xt_cluster_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	struct sk_buff *pskb = (struct sk_buff *)skb;
+	const struct xt_cluster_match_info *info = par->matchinfo;
+	const struct nf_conn *ct;
+	enum ip_conntrack_info ctinfo;
+	unsigned long hash;
+
+	/* This match assumes that all nodes see the same packets. This can be
+	 * achieved if the switch that connects the cluster nodes support some
+	 * sort of 'port mirroring'. However, if your switch does not support
+	 * this, your cluster nodes can reply ARP request using a multicast MAC
+	 * address. Thus, your switch will flood the same packets to the
+	 * cluster nodes with the same multicast MAC address. Using a multicast
+	 * link address is a RFC 1812 (section 3.3.2) violation, but this works
+	 * fine in practise.
+	 *
+	 * Unfortunately, if you use the multicast MAC address, the link layer
+	 * sets skbuff's pkt_type to PACKET_MULTICAST, which is not accepted
+	 * by TCP and others for packets coming to this node. For that reason,
+	 * this match mangles skbuff's pkt_type if it detects a packet
+	 * addressed to a unicast address but using PACKET_MULTICAST. Yes, I
+	 * know, matches should not alter packets, but we are doing this here
+	 * because we would need to add a PKTTYPE target for this sole purpose.
+	 */
+	if (!xt_cluster_is_multicast_addr(skb, par->family) &&
+	    skb->pkt_type == PACKET_MULTICAST) {
+	    	pskb->pkt_type = PACKET_HOST;
+	}
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct == NULL)
+		return false;
+
+	if (ct == &nf_conntrack_untracked)
+		return false;
+
+	if (ct->master)
+		hash = xt_cluster_hash(ct->master, info);
+	else
+		hash = xt_cluster_hash(ct, info);
+
+	return !!((1 << hash) & info->node_mask) ^
+	       !!(info->flags & XT_CLUSTER_F_INV);
+}
+
+static bool xt_cluster_mt_checkentry(const struct xt_mtchk_param *par)
+{
+	struct xt_cluster_match_info *info = par->matchinfo;
+
+	if (info->node_mask >= (1 << info->total_nodes)) {
+		printk(KERN_ERR "xt_cluster: this node mask cannot be "
+				"higher than the total number of nodes\n");
+		return false;
+	}
+	return true;
+}
+
+static struct xt_match xt_cluster_match __read_mostly = {
+	.name		= "cluster",
+	.family		= NFPROTO_UNSPEC,
+	.match		= xt_cluster_mt,
+	.checkentry	= xt_cluster_mt_checkentry,
+	.matchsize	= sizeof(struct xt_cluster_match_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init xt_cluster_mt_init(void)
+{
+	return xt_register_match(&xt_cluster_match);
+}
+
+static void __exit xt_cluster_mt_fini(void)
+{
+	xt_unregister_match(&xt_cluster_match);
+}
+
+MODULE_AUTHOR("Pablo Neira Ayuso <pablo@netfilter.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Xtables: hash-based cluster match");
+MODULE_ALIAS("ipt_cluster");
+MODULE_ALIAS("ip6t_cluster");
+module_init(xt_cluster_mt_init);
+module_exit(xt_cluster_mt_fini);


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-23 10:13 Pablo Neira Ayuso
@ 2009-02-24 13:46 ` Patrick McHardy
  2009-02-24 14:05   ` Pablo Neira Ayuso
  2009-03-16 16:11 ` Patrick McHardy
  1 sibling, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-24 13:46 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> +enum xt_cluster_flags {
> +	XT_CLUSTER_F_INV	= (1 << 0)
> +};
> +
> +struct xt_cluster_match_info {
> +	u_int32_t		total_nodes;
> +	u_int32_t		node_mask;
> +	u_int32_t		hash_seed;
> +	u_int32_t		flags;
> +};

This doesn't seem like such a hot idea. I haven't seen the new
userspace patch, but assuming you're interested in the flags and
not ignoring them in userspace, the user has to specify the hash
seed for rule deletions.

You also have to choose the same seed for all nodes in a cluster.
This seems needlessly complicated; I'd suggest simply using zero.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-24 13:46 ` Patrick McHardy
@ 2009-02-24 14:05   ` Pablo Neira Ayuso
  2009-02-24 14:06     ` Patrick McHardy
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-24 14:05 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> +enum xt_cluster_flags {
>> +    XT_CLUSTER_F_INV    = (1 << 0)
>> +};
>> +
>> +struct xt_cluster_match_info {
>> +    u_int32_t        total_nodes;
>> +    u_int32_t        node_mask;
>> +    u_int32_t        hash_seed;
>> +    u_int32_t        flags;
>> +};
> 
> This doesn't seem like such a hot idea. I haven't seen the new
> userspace patch, but assuming you're interested in the flags and
> not ignoring them in userspace, the user has to specify the hash
> seed for rule deletions.

The user has to specify the hash seed to delete the rule only if it's
non-zero; otherwise it doesn't need to be specified. The hash seed is
optional. I don't quite see the problem.

> You also have to chose the same seed for all nodes in a cluster.
> This seems needlessly complicated, I'd suggest to simply use zero.

One may want to forge traffic to flood a single node? The hash seed 
avoids this.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-24 14:05   ` Pablo Neira Ayuso
@ 2009-02-24 14:06     ` Patrick McHardy
  2009-02-24 23:13       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-24 14:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> Pablo Neira Ayuso wrote:
>>> +enum xt_cluster_flags {
>>> +    XT_CLUSTER_F_INV    = (1 << 0)
>>> +};
>>> +
>>> +struct xt_cluster_match_info {
>>> +    u_int32_t        total_nodes;
>>> +    u_int32_t        node_mask;
>>> +    u_int32_t        hash_seed;
>>> +    u_int32_t        flags;
>>> +};
>>
>> This doesn't seem like such a hot idea. I haven't seen the new
>> userspace patch, but assuming you're interested in the flags and
>> not ignoring them in userspace, the user has to specify the hash
>> seed for rule deletions.
> 
> The user has to specify the hash seed to delete the rule only if it's
> non-zero; otherwise it doesn't need to be specified. The hash seed is
> optional. I don't quite see the problem.

It's a parameter without a meaning; the user is needlessly bothered
with it.

>> You also have to chose the same seed for all nodes in a cluster.
>> This seems needlessly complicated, I'd suggest to simply use zero.
> 
> One may want to forge traffic to flood a single node? The hash seed 
> avoids this.

No, it only makes the attack easier to shut off, since I'd have to use
the same source address to be sure I hit the same node. This seems like
a valid argument though.

The fact that you have to specify it for deletion still seems unnecessary
though. You would never have two rules differing only in the seed value,
since that would mean the node is part of two clusters. So we might as
well move it to the end and ignore it in userspace. What do you think?
In case you agree, I also think "secret" would be a more fitting name.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-24 14:06     ` Patrick McHardy
@ 2009-02-24 23:13       ` Pablo Neira Ayuso
  2009-02-25  5:52         ` Patrick McHardy
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-24 23:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>> Pablo Neira Ayuso wrote:
>>>> +enum xt_cluster_flags {
>>>> +    XT_CLUSTER_F_INV    = (1 << 0)
>>>> +};
>>>> +
>>>> +struct xt_cluster_match_info {
>>>> +    u_int32_t        total_nodes;
>>>> +    u_int32_t        node_mask;
>>>> +    u_int32_t        hash_seed;
>>>> +    u_int32_t        flags;
>>>> +};
>>>
>>> This doesn't seem like such a hot idea. I haven't seen the new
>>> userspace patch, but assuming you're interested in the flags and
>>> not ignoring them in userspace, the user has to specify the hash
>>> seed for rule deletions.
>>
>> The user has to specify the hash seed to delete the rule if it's
>> non-zero, otherwise it must be specified. The hash seed is optional. I
>> don't quite see the problem.
> 
> Its a parameter without a meaning, the user is needlessly bothered
> with this.

From the user's point of view, yes; it does not matter what value you
set, as long as it is the same on all the cluster nodes.

>>> You also have to chose the same seed for all nodes in a cluster.
>>> This seems needlessly complicated, I'd suggest to simply use zero.
>>
>> One may want to forge traffic to flood a single node? The hash seed
>> avoids this.
> 
> No, it only makes it easier to shut off since I have to use the same
> source address to be sure I hit the same node. This seems like a valid
> argument though.
> 
> The fact that you have to specify it for deletion still seems unnecesary
> though. You would never have two rules differing only in the seed value
> since that would mean the node is part of two clusters. So we might as
> well move it to the end and ignore it in userspace. What do you think?

But the value has to be the same in all the cluster nodes, so how can it
be set to ensure that it is the same value?

> In case you agree, I also think "secret" would be a more fitting name.

I can rename the field to "secret" in the structure or change the
iptables cluster match option to be "--cluster-secret" instead of
"--cluster-hash-seed" if you like.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-24 23:13       ` Pablo Neira Ayuso
@ 2009-02-25  5:52         ` Patrick McHardy
  2009-02-25  9:42           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 49+ messages in thread
From: Patrick McHardy @ 2009-02-25  5:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> The fact that you have to specify it for deletion still seems unnecesary
>> though. You would never have two rules differing only in the seed value
>> since that would mean the node is part of two clusters. So we might as
>> well move it to the end and ignore it in userspace. What do you think?
> 
> But the value has to be the same in all the cluster nodes, so how can it
> be set to ensure that it is the same value?

I only meant ignoring it on comparisons of course, just as we do
with all the private pointer stuff. Anyway, it's not that important,
and in fact it would be slightly different behaviour from what we
do in other cases, where we only ignore state. So perhaps not a good
idea after all.

>> In case you agree, I also think "secret" would be a more fitting name.
> 
> I can rename the field to "secret" in the structure or change the
> iptables cluster match option to be "--cluster-secret" instead of
> "--cluster-hash-seed" if you like.

It's more fitting in my opinion, but I don't really care.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-25  5:52         ` Patrick McHardy
@ 2009-02-25  9:42           ` Pablo Neira Ayuso
  2009-02-25 10:20             ` Patrick McHardy
  0 siblings, 1 reply; 49+ messages in thread
From: Pablo Neira Ayuso @ 2009-02-25  9:42 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>>> In case you agree, I also think "secret" would be a more fitting name.
>>
>> I can rename the field to "secret" in the structure or change the
>> iptables cluster match option to be "--cluster-secret" instead of
>> "--cluster-hash-seed" if you like.
> 
> Its more fitting in my opinion, but I don't really care.

I don't either, would you apply the patch as is now?

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-25  9:42           ` Pablo Neira Ayuso
@ 2009-02-25 10:20             ` Patrick McHardy
  0 siblings, 0 replies; 49+ messages in thread
From: Patrick McHardy @ 2009-02-25 10:20 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>   
>> Pablo Neira Ayuso wrote:
>>     
>>>> In case you agree, I also think "secret" would be a more fitting name.
>>>>         
>>> I can rename the field to "secret" in the structure or change the
>>> iptables cluster match option to be "--cluster-secret" instead of
>>> "--cluster-hash-seed" if you like.
>>>       
>> It's more fitting in my opinion, but I don't really care.
>>     
>
> I don't either; would you apply the patch as is now?
>   

I will :)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] netfilter: xtables: add cluster match
  2009-02-23 10:13 Pablo Neira Ayuso
  2009-02-24 13:46 ` Patrick McHardy
@ 2009-03-16 16:11 ` Patrick McHardy
  1 sibling, 0 replies; 49+ messages in thread
From: Patrick McHardy @ 2009-03-16 16:11 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso wrote:
> This patch adds the iptables cluster match. This match can be used
> to deploy gateway and back-end load-sharing clusters. The cluster
> can be composed of 32 nodes at most (although I have only tested
> this with two nodes, so I cannot tell what the real scalability
> limit of this solution is in terms of cluster nodes).
> 
> Assuming that all the nodes see all packets (see below for an
> example of how to do that if your switch does not allow this), the
> cluster match decides if this node has to handle a packet given:
> 
> 	(jhash(source IP) % total_nodes) & node_mask
> 
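
A minimal sketch of that decision (an illustration only, not the code
from the applied patch: it assumes the kernel's jhash_1word() helper
and node ids encoded as bit positions in node_mask):

#include <linux/jhash.h>
#include <linux/types.h>

static bool xt_cluster_is_mine(__be32 saddr, u_int32_t hash_seed,
			       u_int32_t total_nodes, u_int32_t node_mask)
{
	u_int32_t slot;

	/* hash the original source address and reduce it to a node slot */
	slot = jhash_1word((u_int32_t)saddr, hash_seed) % total_nodes;

	/* match if that slot is one this node is responsible for */
	return (1 << slot) & node_mask;
}
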
> For related connections, the master conntrack is used. The following
> is an example of its use to deploy a gateway cluster composed of two
> nodes (where this is the node 1):
> 
> iptables -I PREROUTING -t mangle -i eth1 -m cluster \
> 	--cluster-total-nodes 2 --cluster-local-node 1 \
> 	--cluster-proc-name eth1 -j MARK --set-mark 0xffff
> iptables -A PREROUTING -t mangle -i eth1 \
> 	-m mark ! --mark 0xffff -j DROP
> iptables -A PREROUTING -t mangle -i eth2 -m cluster \
> 	--cluster-total-nodes 2 --cluster-local-node 1 \
> 	--cluster-proc-name eth2 -j MARK --set-mark 0xffff
> iptables -A PREROUTING -t mangle -i eth2 \
> 	-m mark ! --mark 0xffff -j DROP
> 
> And the following commands to make all nodes see the same packets:
> 
> ip maddr add 01:00:5e:00:01:01 dev eth1
> ip maddr add 01:00:5e:00:01:02 dev eth2
> arptables -I OUTPUT -o eth1 --h-length 6 \
> 	-j mangle --mangle-mac-s 01:00:5e:00:01:01
> arptables -I INPUT -i eth1 --h-length 6 \
> 	--destination-mac 01:00:5e:00:01:01 \
> 	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
> arptables -I OUTPUT -o eth2 --h-length 6 \
> 	-j mangle --mangle-mac-s 01:00:5e:00:01:02
> arptables -I INPUT -i eth2 --h-length 6 \
> 	--destination-mac 01:00:5e:00:01:02 \
> 	-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
> 
> In the case of TCP connections, the pickup facility has to be disabled
> to avoid marking TCP ACK packets coming in the reply direction as
> valid.
> 
> echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
> 
> BTW, some final notes:
> 
>  * This match mangles the skbuff pkt_type in case it detects
> PACKET_MULTICAST for a non-multicast address (see the sketch after
> this list). This could instead be done by a PKTTYPE target added for
> this sole purpose.
>  * This match supersedes the CLUSTERIP target.
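
A rough sketch of the pkt_type fixup mentioned in the first note,
assuming an IPv4-only check for brevity (an illustration, not the
patch code):

#include <linux/in.h>
#include <linux/ip.h>
#include <linux/if_packet.h>
#include <linux/skbuff.h>

static void xt_cluster_fix_pkt_type(struct sk_buff *skb)
{
	/* Frames cloned to all nodes via the shared multicast MAC arrive
	 * as PACKET_MULTICAST; restore PACKET_HOST when the destination
	 * IP address is not actually multicast. */
	if (skb->pkt_type == PACKET_MULTICAST &&
	    !ipv4_is_multicast(ip_hdr(skb)->daddr))
		skb->pkt_type = PACKET_HOST;
}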
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> 
>  include/linux/netfilter/xt_cluster.h |   15 +++
>  net/netfilter/Kconfig                |   16 +++
>  net/netfilter/Makefile               |    1 
>  net/netfilter/xt_cluster.c           |  164 ++++++++++++++++++++++++++++++++++
>  4 files changed, 196 insertions(+), 0 deletions(-)
>  create mode 100644 include/linux/netfilter/xt_cluster.h
>  create mode 100644 net/netfilter/xt_cluster.c

Applied, thanks. I've also added xt_cluster.h to the Kbuild file so
the header will be installed.

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2009-03-16 16:11 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-20 20:50 [PATCH] netfilter: xtables: add cluster match Pablo Neira Ayuso
2009-02-20 20:56 ` Pablo Neira Ayuso
  -- strict thread matches above, loose matches on Subject: below --
2009-02-23 10:13 Pablo Neira Ayuso
2009-02-24 13:46 ` Patrick McHardy
2009-02-24 14:05   ` Pablo Neira Ayuso
2009-02-24 14:06     ` Patrick McHardy
2009-02-24 23:13       ` Pablo Neira Ayuso
2009-02-25  5:52         ` Patrick McHardy
2009-02-25  9:42           ` Pablo Neira Ayuso
2009-02-25 10:20             ` Patrick McHardy
2009-03-16 16:11 ` Patrick McHardy
2009-02-19 23:14 Pablo Neira Ayuso
2009-02-20  9:24 ` Patrick McHardy
2009-02-20 13:15   ` Pablo Neira Ayuso
2009-02-20 13:48     ` Patrick McHardy
2009-02-20 16:52       ` Pablo Neira Ayuso
2009-02-16  9:32 Pablo Neira Ayuso
2009-02-16  9:23 Pablo Neira Ayuso
2009-02-16  9:31 ` Pablo Neira Ayuso
2009-02-16 12:13   ` Jan Engelhardt
2009-02-16 12:17     ` Patrick McHardy
2009-02-14 19:29 Pablo Neira Ayuso
2009-02-14 20:28 ` Jan Engelhardt
2009-02-14 20:42   ` Pablo Neira Ayuso
2009-02-14 22:31     ` Jan Engelhardt
2009-02-14 22:32       ` Jan Engelhardt
2009-02-16 10:56 ` Patrick McHardy
2009-02-16 14:01   ` Pablo Neira Ayuso
2009-02-16 14:03     ` Patrick McHardy
2009-02-16 14:30       ` Pablo Neira Ayuso
2009-02-16 15:01         ` Patrick McHardy
2009-02-16 15:14         ` Pablo Neira Ayuso
2009-02-16 15:10           ` Patrick McHardy
2009-02-16 15:27             ` Pablo Neira Ayuso
2009-02-17 10:46             ` Pablo Neira Ayuso
2009-02-17 10:50               ` Patrick McHardy
2009-02-17 13:50                 ` Pablo Neira Ayuso
2009-02-17 19:45                   ` Vincent Bernat
2009-02-18 10:14                     ` Patrick McHardy
2009-02-18 10:13                   ` Patrick McHardy
2009-02-18 11:06                     ` Pablo Neira Ayuso
2009-02-18 11:14                       ` Patrick McHardy
2009-02-18 17:20                       ` Vincent Bernat
2009-02-18 17:25                         ` Patrick McHardy
2009-02-18 18:38                           ` Pablo Neira Ayuso
2009-02-16 17:17         ` Jan Engelhardt
2009-02-16 17:13     ` Jan Engelhardt
2009-02-16 17:16       ` Patrick McHardy
2009-02-16 17:22         ` Jan Engelhardt
