netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH 0/5] IPVS full NAT support + netfilter 'ipvs' match support
@ 2009-07-27 13:46 Hannes Eder
  2009-07-27 13:46 ` [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:" Hannes Eder
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Hannes Eder @ 2009-07-27 13:46 UTC (permalink / raw)
  To: lvs-devel; +Cc: netdev, netfilter-devel, linux-kernel

The following series is aiming at adding full NAT support to IPVS.  The
approach is via a minimal change to IPVS (make friends with nf_conntrack)
and adding a netfilter matcher (xt_ipvs + libxt_ipvs).

Example usage:

% ipvsadm -A -t 192.168.100.30:8080 -s rr
% ipvsadm -a -t 192.168.100.30:8080 -r 192.168.10.20:8080 -m
# ...

# SNAT for VIP 192.168.100.30:8080
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
> -j SNAT --to-source 192.168.10.10 

Comments?

Changes to the linux kernel:

Hannes Eder (4):
      IPVS: debugging output for ip_vs_update_conntrack
      IPVS: make friends with nf_conntrack
      netfilter: xt_ipvs (netfilter matcher for ipvs)
      IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:"

 include/linux/netfilter/xt_ipvs.h |   32 +++++++
 include/net/ip_vs.h               |   24 +++--
 net/netfilter/Kconfig             |    8 ++
 net/netfilter/Makefile            |    1 
 net/netfilter/ipvs/ip_vs_core.c   |   36 --------
 net/netfilter/ipvs/ip_vs_proto.c  |    1 
 net/netfilter/ipvs/ip_vs_xmit.c   |   54 ++++++++++++
 net/netfilter/xt_ipvs.c           |  171 +++++++++++++++++++++++++++++++++++++
 8 files changed, 279 insertions(+), 48 deletions(-)
 create mode 100644 include/linux/netfilter/xt_ipvs.h
 create mode 100644 net/netfilter/xt_ipvs.c


Changes to iptables:

Hannes Eder (1):
      libxt_ipvs: user space lib for netfilter matcher xt_ipvs

 extensions/libxt_ipvs.c           |  350 +++++++++++++++++++++++++++++++++++++
 extensions/libxt_ipvs.man         |    7 +
 include/linux/netfilter/xt_ipvs.h |   32 +++
 3 files changed, 389 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_ipvs.c
 create mode 100644 extensions/libxt_ipvs.man
 create mode 100644 include/linux/netfilter/xt_ipvs.h


Cheers,
-Hannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:"
  2009-07-27 13:46 [RFC][PATCH 0/5] IPVS full NAT support + netfilter 'ipvs' match support Hannes Eder
@ 2009-07-27 13:46 ` Hannes Eder
  2009-07-27 18:14   ` Jan Engelhardt
  2009-07-27 13:46 ` [RFC][PATCH 2/5] netfilter: xt_ipvs (netfilter matcher for ipvs) Hannes Eder
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Hannes Eder @ 2009-07-27 13:46 UTC (permalink / raw)
  To: lvs-devel; +Cc: netdev, netfilter-devel, linux-kernel

From: Hannes Eder <hannes@hanneseder.net>

Now all printk messages from IPVS are prefixed with "IPVS:".

Signed-off-by: Hannes Eder <heder@google.com>

 include/net/ip_vs.h |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index e01cc2d..e03524e 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -161,18 +161,18 @@ static inline const char *ip_vs_dbg_addr(int af, char *buf, size_t buf_len,
     } while (0)
 
 #ifdef CONFIG_IP_VS_DEBUG
-#define EnterFunction(level)						\
-    do {								\
-	    if (level <= ip_vs_get_debug_level())			\
-		    printk(KERN_DEBUG "Enter: %s, %s line %i\n",	\
-			   __func__, __FILE__, __LINE__);		\
-    } while (0)
-#define LeaveFunction(level)                                            \
-    do {                                                                \
-	    if (level <= ip_vs_get_debug_level())                       \
-			printk(KERN_DEBUG "Leave: %s, %s line %i\n",    \
-			       __func__, __FILE__, __LINE__);       \
-    } while (0)
+#define EnterFunction(level)						       \
+	do {								       \
+		if (level <= ip_vs_get_debug_level())			       \
+			printk(KERN_DEBUG "IPVS: Enter: %s, %s line %i\n",     \
+				__func__, __FILE__, __LINE__);		       \
+	} while (0)
+#define LeaveFunction(level)						       \
+	do {								       \
+		if (level <= ip_vs_get_debug_level())			       \
+			printk(KERN_DEBUG "IPVS: Leave: %s, %s line %i\n",     \
+				__func__, __FILE__, __LINE__);		       \
+	} while (0)
 #else
 #define EnterFunction(level)   do {} while (0)
 #define LeaveFunction(level)   do {} while (0)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 2/5] netfilter: xt_ipvs (netfilter matcher for ipvs)
  2009-07-27 13:46 [RFC][PATCH 0/5] IPVS full NAT support + netfilter 'ipvs' match support Hannes Eder
  2009-07-27 13:46 ` [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:" Hannes Eder
@ 2009-07-27 13:46 ` Hannes Eder
  2009-07-27 18:35   ` Jan Engelhardt
  2009-07-27 13:46 ` [RFC][PATCH 3/5] IPVS: make friends with nf_conntrack Hannes Eder
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Hannes Eder @ 2009-07-27 13:46 UTC (permalink / raw)
  To: lvs-devel; +Cc: netdev, netfilter-devel, linux-kernel

From: Hannes Eder <hannes@hanneseder.net>

This implements the kernel-space side of the iptables matcher xt_ipvs.

Signed-off-by: Hannes Eder <heder@google.com>

 include/linux/netfilter/xt_ipvs.h |   32 +++++++
 net/netfilter/Kconfig             |    8 ++
 net/netfilter/Makefile            |    1 
 net/netfilter/ipvs/ip_vs_proto.c  |    1 
 net/netfilter/xt_ipvs.c           |  171 +++++++++++++++++++++++++++++++++++++
 5 files changed, 213 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_ipvs.h
 create mode 100644 net/netfilter/xt_ipvs.c

diff --git a/include/linux/netfilter/xt_ipvs.h b/include/linux/netfilter/xt_ipvs.h
new file mode 100644
index 0000000..3a70289
--- /dev/null
+++ b/include/linux/netfilter/xt_ipvs.h
@@ -0,0 +1,32 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H 1
+
+#ifndef _IP_VS_H
+#define IP_VS_CONN_F_FWD_MASK	0x0007		/* mask for the fwd methods */
+#define IP_VS_CONN_F_MASQ	0x0000		/* masquerading/NAT */
+#define IP_VS_CONN_F_LOCALNODE	0x0001		/* local node */
+#define IP_VS_CONN_F_TUNNEL	0x0002		/* tunneling */
+#define IP_VS_CONN_F_DROUTE	0x0003		/* direct routing */
+#define IP_VS_CONN_F_BYPASS	0x0004		/* cache bypass */
+#endif
+
+#define XT_IPVS_IPVS		0x01 /* this is implied by all other options */
+#define XT_IPVS_PROTO		0x02
+#define XT_IPVS_VADDR		0x04
+#define XT_IPVS_VPORT		0x08
+#define XT_IPVS_DIR		0x10
+#define XT_IPVS_METHOD		0x20
+#define XT_IPVS_MASK		(0x40 - 1)
+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS)
+
+struct xt_ipvs {
+	union nf_inet_addr	vaddr, vmask;
+	__be16			vport;
+	u_int16_t		l4proto;
+	u_int16_t		fwd_method;
+
+	u_int8_t invert;
+	u_int8_t bitmask;
+};
+
+#endif /* _XT_IPVS_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index cb3ad74..677a880 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -678,6 +678,14 @@ config NETFILTER_XT_MATCH_IPRANGE
 
 	If unsure, say M.
 
+config NETFILTER_XT_MATCH_IPVS
+	tristate '"ipvs" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  This option allows you to match against ipvs properties of a packet.
+
+          To compile it as a module, choos M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_LENGTH
 	tristate '"length" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 6282060..1c18e80 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_HASHLIMIT) += xt_hashlimit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HELPER) += xt_helper.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HL) += xt_hl.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_IPVS) += xt_ipvs.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index a01520e..84bf4e6 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -94,6 +94,7 @@ struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto)
 
 	return NULL;
 }
+EXPORT_SYMBOL(ip_vs_proto_get);
 
 
 /*
diff --git a/net/netfilter/xt_ipvs.c b/net/netfilter/xt_ipvs.c
new file mode 100644
index 0000000..00373d9
--- /dev/null
+++ b/net/netfilter/xt_ipvs.c
@@ -0,0 +1,171 @@
+/*
+ *	xt_ipvs - kernel module to match ipvs properties of a packet
+ *
+ *	Author: Hannes Eder <heder@google.com>
+ */
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/spinlock.h>
+#include <linux/skbuff.h>
+#ifdef CONFIG_IP_VS_IPV6
+#include <net/ipv6.h>
+#endif
+#include <linux/types.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_ipvs.h>
+
+#include <net/ip_vs.h>
+
+MODULE_AUTHOR("Hannes Eder <heder@google.com>");
+MODULE_DESCRIPTION("Xtables: match ipvs");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_ipvs");
+MODULE_ALIAS("ip6t_ipvs");
+
+/* borrowed from xt_conntrack */
+static bool ipvs_mt_addrcmp(const union nf_inet_addr *kaddr,
+			    const union nf_inet_addr *uaddr,
+			    const union nf_inet_addr *umask,
+			    unsigned int l3proto)
+{
+	if (l3proto == NFPROTO_IPV4)
+		return ((kaddr->ip ^ uaddr->ip) & umask->ip) == 0;
+#ifdef CONFIG_IP_VS_IPV6
+	else if (l3proto == NFPROTO_IPV6)
+		return ipv6_masked_addr_cmp(&kaddr->in6, &umask->in6,
+		       &uaddr->in6) == 0;
+#endif
+	else
+		return false;
+}
+
+bool ipvs_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	const struct xt_ipvs *data = par->matchinfo;
+	const u_int8_t family = par->family;
+	struct ip_vs_iphdr iph;
+	struct ip_vs_protocol *pp;
+	struct ip_vs_conn *cp;
+	int af;
+	bool match = true;
+
+	if (data->bitmask == XT_IPVS_IPVS) {
+		match = !!(skb->ipvs_property)
+			^ !!(data->invert & XT_IPVS_IPVS);
+		goto out;
+	}
+
+	/* other flags than XT_IPVS_IPVS are set */
+	if (!skb->ipvs_property) {
+		match = false;
+		goto out;
+	}
+
+	switch (skb->protocol) {
+	case  htons(ETH_P_IP):
+		af = AF_INET;
+		break;
+#ifdef CONFIG_IP_VS_IPV6
+	case  htons(ETH_P_IPV6):
+		af = AF_INET6;
+		break;
+#endif
+	default:
+		match = false;
+		goto out;
+	}
+
+	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+
+	if (data->bitmask & XT_IPVS_PROTO)
+		if ((iph.protocol == data->l4proto)
+		    ^ !(data->invert & XT_IPVS_PROTO)) {
+			match = false;
+			goto out;
+		}
+
+	pp = ip_vs_proto_get(iph.protocol);
+	if (unlikely(!pp)) {
+		match = false;
+		goto out;
+	}
+
+	/*
+	 * Check if the packet belongs to an existing entry
+	 */
+	/* TODO: why does it only match in inverse? */
+	cp = pp->conn_out_get(af, skb, pp, &iph, iph.len, 1 /* inverse */);
+	if (unlikely(!cp)) {
+		match = false;
+		goto out;
+	}
+
+	/*
+	 * We found a connection, i.e. ct != 0, make sure to call
+	 * __ip_vs_conn_put before returning.  In our case jump to out_put_con.
+	 */
+
+	if (data->bitmask & XT_IPVS_VPORT)
+		if ((cp->vport == data->vport)
+		    ^ !(data->invert & XT_IPVS_VPORT)) {
+			match = false;
+			goto out_put_ct;
+		}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		/* TODO: implement */
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD)
+		if (((cp->flags & IP_VS_CONN_F_FWD_MASK) == data->fwd_method)
+		    ^ !(data->invert & XT_IPVS_METHOD)) {
+			match = false;
+			goto out_put_ct;
+		}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (af != family) {
+			match = false;
+			goto out_put_ct;
+		}
+
+		if (ipvs_mt_addrcmp(&cp->vaddr, &data->vaddr, &data->vmask, af)
+		    ^ !(data->invert & XT_IPVS_VADDR)) {
+			match = false;
+			goto out_put_ct;
+		}
+	}
+
+out_put_ct:
+	__ip_vs_conn_put(cp);
+out:
+	return match;
+}
+EXPORT_SYMBOL(ipvs_mt);
+
+static struct xt_match xt_ipvs_mt_reg[] __read_mostly = {
+	{
+		.name       = "ipvs",
+		.revision   = 0,
+		.family     = NFPROTO_UNSPEC,
+		.match      = ipvs_mt,
+		.matchsize  = sizeof(struct xt_ipvs),
+		.me         = THIS_MODULE,
+	},
+};
+
+static int __init ipvs_mt_init(void)
+{
+	return xt_register_matches(xt_ipvs_mt_reg,
+				   ARRAY_SIZE(xt_ipvs_mt_reg));
+}
+
+static void __exit ipvs_mt_exit(void)
+{
+	xt_unregister_matches(xt_ipvs_mt_reg,
+			      ARRAY_SIZE(xt_ipvs_mt_reg));
+}
+
+module_init(ipvs_mt_init);
+module_exit(ipvs_mt_exit);


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 3/5] IPVS: make friends with nf_conntrack
  2009-07-27 13:46 [RFC][PATCH 0/5] IPVS full NAT support + netfilter 'ipvs' match support Hannes Eder
  2009-07-27 13:46 ` [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:" Hannes Eder
  2009-07-27 13:46 ` [RFC][PATCH 2/5] netfilter: xt_ipvs (netfilter matcher for ipvs) Hannes Eder
@ 2009-07-27 13:46 ` Hannes Eder
  2009-07-27 13:46 ` [RFC][PATCH 4/5] IPVS: debugging output for ip_vs_update_conntrack Hannes Eder
  2009-07-27 13:48 ` [RFC][PATCH 5/5] libxt_ipvs: user space lib for netfilter matcher xt_ipvs Hannes Eder
  4 siblings, 0 replies; 12+ messages in thread
From: Hannes Eder @ 2009-07-27 13:46 UTC (permalink / raw)
  To: lvs-devel; +Cc: netdev, netfilter-devel, linux-kernel

From: Hannes Eder <hannes@hanneseder.net>

We aim at adding full NAT support to IPVS.  With this patch it is
possible to use netfilters SNAT in POSTROUTING, especially with
xt_ipvs, e.g.:

iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
	-j SNAT --to-source 192.168.10.10

There might be other use cases.

Current Status:
  - NAT works
  - DR works
  - IPIP not tested
  - overall: needs more testing
  - Performance impact?

Signed-off-by: Hannes Eder <heder@google.com>

 net/netfilter/ipvs/ip_vs_core.c |   36 ------------------------------------
 net/netfilter/ipvs/ip_vs_xmit.c |   28 ++++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 8dddb17..b021464 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -518,26 +518,6 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 	return NF_DROP;
 }
 
-
-/*
- *      It is hooked before NF_IP_PRI_NAT_SRC at the NF_INET_POST_ROUTING
- *      chain, and is used for VS/NAT.
- *      It detects packets for VS/NAT connections and sends the packets
- *      immediately. This can avoid that iptable_nat mangles the packets
- *      for VS/NAT.
- */
-static unsigned int ip_vs_post_routing(unsigned int hooknum,
-				       struct sk_buff *skb,
-				       const struct net_device *in,
-				       const struct net_device *out,
-				       int (*okfn)(struct sk_buff *))
-{
-	if (!skb->ipvs_property)
-		return NF_ACCEPT;
-	/* The packet was sent from IPVS, exit this chain */
-	return NF_STOP;
-}
-
 __sum16 ip_vs_checksum_complete(struct sk_buff *skb, int offset)
 {
 	return csum_fold(skb_checksum(skb, offset, skb->len - offset, 0));
@@ -1428,14 +1408,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
 		.hooknum        = NF_INET_FORWARD,
 		.priority       = 99,
 	},
-	/* Before the netfilter connection tracking, exit from POST_ROUTING */
-	{
-		.hook		= ip_vs_post_routing,
-		.owner		= THIS_MODULE,
-		.pf		= PF_INET,
-		.hooknum        = NF_INET_POST_ROUTING,
-		.priority       = NF_IP_PRI_NAT_SRC-1,
-	},
 #ifdef CONFIG_IP_VS_IPV6
 	/* After packet filtering, forward packet through VS/DR, VS/TUN,
 	 * or VS/NAT(change destination), so that filtering rules can be
@@ -1464,14 +1436,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
 		.hooknum        = NF_INET_FORWARD,
 		.priority       = 99,
 	},
-	/* Before the netfilter connection tracking, exit from POST_ROUTING */
-	{
-		.hook		= ip_vs_post_routing,
-		.owner		= THIS_MODULE,
-		.pf		= PF_INET6,
-		.hooknum        = NF_INET_POST_ROUTING,
-		.priority       = NF_IP6_PRI_NAT_SRC-1,
-	},
 #endif
 };
 
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 425ab14..f3b6810 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -24,6 +24,7 @@
 #include <net/ip6_route.h>
 #include <linux/icmpv6.h>
 #include <linux/netfilter.h>
+#include <net/netfilter/nf_conntrack.h>
 #include <linux/netfilter_ipv4.h>
 
 #include <net/ip_vs.h>
@@ -344,6 +345,29 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 }
 #endif
 
+static void
+ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
+{
+	if (skb->nfct) {
+		struct nf_conn *ct = (struct nf_conn *)skb->nfct;
+
+		if (ct != &nf_conntrack_untracked && !nf_ct_is_confirmed(ct)) {
+			/*
+			 * The connection is not yet in the hashtable, so we
+			 * update it.  CIP->VIP will remain the same, so leave
+			 * the tuple in IP_CT_DIR_ORIGINAL untouched.  When the
+			 * reply comes back from the real-server we will see
+			 * RIP->DIP.
+			 */
+
+			ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3 = cp->daddr;
+			/* this will also take care for UDP and  */
+			ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port =
+				cp->dport;
+		}
+	}
+}
+
 /*
  *      NAT transmitter (only for outside-to-inside nat forwarding)
  *      Not used for related ICMP
@@ -399,6 +423,8 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
 
+	ip_vs_update_conntrack(skb, cp);
+
 	/* FIXME: when application helper enlarges the packet and the length
 	   is larger than the MTU of outgoing device, there will be still
 	   MTU problem. */
@@ -475,6 +501,8 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
 
+	ip_vs_update_conntrack(skb, cp);
+
 	/* FIXME: when application helper enlarges the packet and the length
 	   is larger than the MTU of outgoing device, there will be still
 	   MTU problem. */


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 4/5] IPVS: debugging output for ip_vs_update_conntrack
  2009-07-27 13:46 [RFC][PATCH 0/5] IPVS full NAT support + netfilter 'ipvs' match support Hannes Eder
                   ` (2 preceding siblings ...)
  2009-07-27 13:46 ` [RFC][PATCH 3/5] IPVS: make friends with nf_conntrack Hannes Eder
@ 2009-07-27 13:46 ` Hannes Eder
  2009-07-27 13:48 ` [RFC][PATCH 5/5] libxt_ipvs: user space lib for netfilter matcher xt_ipvs Hannes Eder
  4 siblings, 0 replies; 12+ messages in thread
From: Hannes Eder @ 2009-07-27 13:46 UTC (permalink / raw)
  To: lvs-devel; +Cc: netdev, netfilter-devel, linux-kernel

This patch is not ment to be merged, its mere for debugging during
development.

Signed-off-by: Hannes Eder <heder@google.com>

 net/netfilter/ipvs/ip_vs_xmit.c |   26 ++++++++++++++++++++++++++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index f3b6810..ed6b811 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -24,6 +24,7 @@
 #include <net/ip6_route.h>
 #include <linux/icmpv6.h>
 #include <linux/netfilter.h>
+#define DEBUG
 #include <net/netfilter/nf_conntrack.h>
 #include <linux/netfilter_ipv4.h>
 
@@ -345,12 +346,31 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 }
 #endif
 
+#ifdef DEBUG
+static void
+ip_vs_dump_ct_tuple(const struct nf_conntrack_tuple *t)
+{
+	/*
+	 * We ignore the fact that this is not SMP-safe.  Otherwise we would
+	 * have to duplicate code and this code is not ment to stay here anyway.
+	 */
+	printk(KERN_DEBUG "IPVS: ");
+	nf_ct_dump_tuple(t);
+}
+#endif
+
 static void
 ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
 {
 	if (skb->nfct) {
 		struct nf_conn *ct = (struct nf_conn *)skb->nfct;
 
+#ifdef DEBUG
+		ip_vs_dump_ct_tuple(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+		ip_vs_dump_ct_tuple(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+		printk("IPVS: nfct confirmed=%d\n", nf_ct_is_confirmed(ct));
+#endif
+
 		if (ct != &nf_conntrack_untracked && !nf_ct_is_confirmed(ct)) {
 			/*
 			 * The connection is not yet in the hashtable, so we
@@ -365,6 +385,12 @@ ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
 			ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port =
 				cp->dport;
 		}
+
+#ifdef DEBUG
+		ip_vs_dump_ct_tuple(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+		ip_vs_dump_ct_tuple(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+#endif
+
 	}
 }
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC][PATCH 5/5] libxt_ipvs: user space lib for netfilter matcher xt_ipvs
  2009-07-27 13:46 [RFC][PATCH 0/5] IPVS full NAT support + netfilter 'ipvs' match support Hannes Eder
                   ` (3 preceding siblings ...)
  2009-07-27 13:46 ` [RFC][PATCH 4/5] IPVS: debugging output for ip_vs_update_conntrack Hannes Eder
@ 2009-07-27 13:48 ` Hannes Eder
  2009-07-27 18:40   ` Jan Engelhardt
  4 siblings, 1 reply; 12+ messages in thread
From: Hannes Eder @ 2009-07-27 13:48 UTC (permalink / raw)
  To: lvs-devel; +Cc: netdev, netfilter-devel, linux-kernel

Signed-off-by: Hannes Eder <heder@google.com>

 extensions/libxt_ipvs.c           |  350 +++++++++++++++++++++++++++++++++++++
 extensions/libxt_ipvs.man         |    7 +
 include/linux/netfilter/xt_ipvs.h |   32 +++
 3 files changed, 389 insertions(+), 0 deletions(-)
 create mode 100644 extensions/libxt_ipvs.c
 create mode 100644 extensions/libxt_ipvs.man
 create mode 100644 include/linux/netfilter/xt_ipvs.h

diff --git a/extensions/libxt_ipvs.c b/extensions/libxt_ipvs.c
new file mode 100644
index 0000000..dc48aee
--- /dev/null
+++ b/extensions/libxt_ipvs.c
@@ -0,0 +1,350 @@
+/* Shared library add-on to iptables to add ipvs matching.
+ *
+ * Detailed doc is in the kernel module source
+ * net/netfilter/xt_ipvs.c
+ *
+ * Author: Hannes Eder <heder@google.com>
+ */
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <netdb.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <xtables.h>
+#include <linux/netfilter/xt_ipvs.h>
+
+
+static const struct option ipvs_mt_opts[] = {
+	{ .name = "ipvs",    .has_arg = false, .val = '0' },
+	{ .name = "vproto",  .has_arg = true,  .val = '1' },
+	{ .name = "vaddr",   .has_arg = true,  .val = '2' },
+	{ .name = "vport",   .has_arg = true,  .val = '3' },
+	{ .name = "vdir",    .has_arg = true,  .val = '4' },
+	{ .name = "vmethod", .has_arg = true,  .val = '5' },
+	{ .name = NULL }
+};
+
+static void ipvs_mt_help(void)
+{
+	printf(
+"ipvs match options:\n"
+"[!] --ipvs\n"
+"\n"
+"Any of the following options implies --ipvs (even negated)\n"
+"[!] --vproto protocol\n"
+"[!] --vaddr address[/mask]\n"
+"[!] --vport port\n"
+"    --vdir {ORIGINAL|REPLY}\n"
+"[!] --vmethod {GATE|IPIP|MASQ}\n"
+		);
+}
+
+static void ipvs_mt_parse_addr_and_mask(const char *arg,
+					union nf_inet_addr *address,
+					union nf_inet_addr *mask,
+					unsigned int family)
+{
+	struct in_addr *addr = NULL;
+	struct in6_addr *addr6 = NULL;
+	unsigned int naddrs = 0;
+
+	if (family == NFPROTO_IPV4) {
+		xtables_ipparse_any(arg, &addr, &mask->in, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in, addr, sizeof(*addr));
+	} else if (family == NFPROTO_IPV6) {
+		xtables_ip6parse_any(arg, &addr6, &mask->in6, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in6, addr6, sizeof(*addr6));
+	} else {
+		/* Hu? */
+		assert(false);
+	}
+}
+
+/* Function which parses command options; returns true if it ate an option */
+static int ipvs_mt_parse(int c, char **argv, int invert, unsigned int *flags,
+			 const void *entry, struct xt_entry_match **match,
+			 unsigned int family)
+{
+	struct xt_ipvs *data = (void *)(*match)->data;
+	char *p = NULL;
+	u_int8_t op = 0;
+
+	if ('0' <= c && c <= '5') {
+		int ops[] = {
+			XT_IPVS_IPVS,
+			XT_IPVS_PROTO,
+			XT_IPVS_VADDR,
+			XT_IPVS_VPORT,
+			XT_IPVS_DIR,
+			XT_IPVS_METHOD
+		};
+		op = ops[c - '0'];
+	} else
+		return 0;
+
+	if (*flags & op & XT_IPVS_ONCE_MASK)
+		goto multiple_use;
+
+	switch (c) {
+	case '0': /* --ipvs */
+		/* Nothing to do here. */
+		break;
+
+	case '1': /* --vproto */
+		/* Canonicalize into lower case */
+		for (p = optarg; *p != '\0'; ++p)
+			*p = tolower(*p);
+
+		data->l4proto = xtables_parse_protocol(optarg);
+		break;
+
+	case '2': /* --vaddr */
+		ipvs_mt_parse_addr_and_mask(optarg, &data->vaddr,
+					    &data->vmask, family);
+		break;
+
+	case '3': /* --vport */
+		data->vport = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	case '4': /* --vdir */
+		xtables_param_act(XTF_NO_INVERT, "ipvs", "--vdir", invert);
+		if (strcasecmp(optarg, "ORIGINAL") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert   &= ~XT_IPVS_DIR;
+		} else if (strcasecmp(optarg, "REPLY") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert  |= XT_IPVS_DIR;
+		} else {
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vdir", optarg);
+		}
+		break;
+
+	case '5': /* --vmethod */
+		if (strcasecmp(optarg, "GATE") == 0)
+			data->fwd_method = IP_VS_CONN_F_DROUTE;
+		else if (strcasecmp(optarg, "IPIP") == 0)
+			data->fwd_method = IP_VS_CONN_F_TUNNEL;
+		else if (strcasecmp(optarg, "MASQ") == 0)
+			data->fwd_method = IP_VS_CONN_F_MASQ;
+		else
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vmethod", optarg);
+		break;
+
+	default:
+		/* Hu? How did we came here? */
+		assert(false);
+		return 0;
+	}
+
+	if (op & XT_IPVS_ONCE_MASK) {
+		if (data->invert & XT_IPVS_IPVS)
+			xtables_error(PARAMETER_PROBLEM,
+				      "! --ipvs can not be together with"
+				      " other options");
+		data->bitmask |= XT_IPVS_IPVS;
+	}
+
+	data->bitmask |= op;
+	if (invert)
+		data->invert |= op;
+	*flags |= op;
+	return 1;
+
+multiple_use:
+	xtables_error(PARAMETER_PROBLEM,
+		      "multiple use of the same ipvs option is not allowed");
+}
+
+static int ipvs_mt4_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV4);
+}
+
+static int ipvs_mt6_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV6);
+}
+
+static void ipvs_mt_check(unsigned int flags)
+{
+	if (flags == 0)
+		xtables_error(PARAMETER_PROBLEM,
+			      "ipvs: At least one option is required");
+}
+
+/* Shamelessly copied from libxt_conntrack.c */
+static void
+ipvs_mt_dump_addr(const union nf_inet_addr *addr,
+		  const union nf_inet_addr *mask,
+		  unsigned int family, bool numeric)
+{
+	char buf[BUFSIZ];
+
+	if (family == NFPROTO_IPV4) {
+		if (!numeric && addr->ip == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ipaddr_to_numeric(&addr->in));
+		else
+			strcpy(buf, xtables_ipaddr_to_anyname(&addr->in));
+		strcat(buf, xtables_ipmask_to_numeric(&mask->in));
+		printf("%s ", buf);
+	} else if (family == NFPROTO_IPV6) {
+		if (!numeric && addr->ip6[0] == 0 && addr->ip6[1] == 0 &&
+		    addr->ip6[2] == 0 && addr->ip6[3] == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ip6addr_to_numeric(&addr->in6));
+		else
+			strcpy(buf, xtables_ip6addr_to_anyname(&addr->in6));
+		strcat(buf, xtables_ip6mask_to_numeric(&mask->in6));
+		printf("%s ", buf);
+	}
+}
+
+static void ipvs_mt_dump(const void *ip, const struct xt_ipvs *data,
+			 unsigned int family, bool numeric, const char *prefix)
+{
+	if (data->bitmask == XT_IPVS_IPVS) {
+		if (data->invert & XT_IPVS_IPVS)
+			printf("! ");
+		printf("%sipvs ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_PROTO) {
+		if (data->invert & XT_IPVS_PROTO)
+			printf("! ");
+		printf("%sproto %u ", prefix, data->l4proto);
+	}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (data->invert & XT_IPVS_VADDR)
+			printf("! ");
+
+		printf("%svaddr ", prefix);
+		ipvs_mt_dump_addr(&data->vaddr, &data->vmask, family, numeric);
+	}
+
+	if (data->bitmask & XT_IPVS_VPORT) {
+		if (data->invert & XT_IPVS_VPORT)
+			printf("! ");
+
+		printf("%svport %u ", prefix, ntohs(data->vport));
+	}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		if (data->invert & XT_IPVS_DIR)
+			printf("%svdir REPLY ", prefix);
+		else
+			printf("%svdir ORIGINAL ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD) {
+		if (data->invert & XT_IPVS_METHOD)
+			printf("! ");
+
+		printf("%svmethod ", prefix);
+		switch (data->fwd_method) {
+		case IP_VS_CONN_F_DROUTE:
+			printf("GATE ");
+			break;
+		case IP_VS_CONN_F_TUNNEL:
+			printf("IPIP ");
+			break;
+		case IP_VS_CONN_F_MASQ:
+			printf("MASQ ");
+			break;
+		default:
+			/* Hu? */
+			printf("UNKNOWN ");
+			break;
+		}
+	}
+
+}
+
+static void ipvs_mt4_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, numeric, "");
+}
+
+static void ipvs_mt6_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, numeric, "");
+}
+
+static void ipvs_mt4_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, true, "--");
+}
+
+static void ipvs_mt6_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, true, "--");
+}
+
+static struct xtables_match ipvs_matches_reg[] = {
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV4,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt4_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt4_print,
+		.save          = ipvs_mt4_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+	{
+		.version       = XTABLES_VERSION,
+		.name          = "ipvs",
+		.revision      = 0,
+		.family        = NFPROTO_IPV6,
+		.size          = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.help          = ipvs_mt_help,
+		.parse         = ipvs_mt6_parse,
+		.final_check   = ipvs_mt_check,
+		.print         = ipvs_mt6_print,
+		.save          = ipvs_mt6_save,
+		.extra_opts    = ipvs_mt_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_matches(ipvs_matches_reg,
+				 ARRAY_SIZE(ipvs_matches_reg));
+}
diff --git a/extensions/libxt_ipvs.man b/extensions/libxt_ipvs.man
new file mode 100644
index 0000000..2c842d6
--- /dev/null
+++ b/extensions/libxt_ipvs.man
@@ -0,0 +1,7 @@
+ipvs tests where the packet was modified by IPVS, i.e. is the
+skb_buff->ipvs_property set.
+.TP
+[\fB!\fP] \fB--ipvs
+Does the packet have to IPVS property?
+
+TODO: Write proper documentation.
diff --git a/include/linux/netfilter/xt_ipvs.h b/include/linux/netfilter/xt_ipvs.h
new file mode 100644
index 0000000..3a70289
--- /dev/null
+++ b/include/linux/netfilter/xt_ipvs.h
@@ -0,0 +1,32 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H 1
+
+#ifndef _IP_VS_H
+#define IP_VS_CONN_F_FWD_MASK	0x0007		/* mask for the fwd methods */
+#define IP_VS_CONN_F_MASQ	0x0000		/* masquerading/NAT */
+#define IP_VS_CONN_F_LOCALNODE	0x0001		/* local node */
+#define IP_VS_CONN_F_TUNNEL	0x0002		/* tunneling */
+#define IP_VS_CONN_F_DROUTE	0x0003		/* direct routing */
+#define IP_VS_CONN_F_BYPASS	0x0004		/* cache bypass */
+#endif
+
+#define XT_IPVS_IPVS		0x01 /* this is implied by all other options */
+#define XT_IPVS_PROTO		0x02
+#define XT_IPVS_VADDR		0x04
+#define XT_IPVS_VPORT		0x08
+#define XT_IPVS_DIR		0x10
+#define XT_IPVS_METHOD		0x20
+#define XT_IPVS_MASK		(0x40 - 1)
+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS)
+
+struct xt_ipvs {
+	union nf_inet_addr	vaddr, vmask;
+	__be16			vport;
+	u_int16_t		l4proto;
+	u_int16_t		fwd_method;
+
+	u_int8_t invert;
+	u_int8_t bitmask;
+};
+
+#endif /* _XT_IPVS_H */

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:"
  2009-07-27 13:46 ` [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:" Hannes Eder
@ 2009-07-27 18:14   ` Jan Engelhardt
  2009-07-28 11:15     ` Hannes Eder
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Engelhardt @ 2009-07-27 18:14 UTC (permalink / raw)
  To: Hannes Eder; +Cc: lvs-devel, netdev, netfilter-devel, linux-kernel


On Monday 2009-07-27 15:46, Hannes Eder wrote:
>
>Now all printk messages from IPVS are prefixed with "IPVS:".
>
>+#define EnterFunction(level)						       \
>+	do {								       \
>+		if (level <= ip_vs_get_debug_level())			       \
>+			printk(KERN_DEBUG "IPVS: Enter: %s, %s line %i\n",     \
>+				__func__, __FILE__, __LINE__);		       \
>+	} while (0)
>+#define LeaveFunction(level)						       \
>+	do {								       \
>+		if (level <= ip_vs_get_debug_level())			       \
>+			printk(KERN_DEBUG "IPVS: Leave: %s, %s line %i\n",     \
>+				__func__, __FILE__, __LINE__);		       \
>+	} while (0)

I think you should rather make use of pr_fmt:

<before any #includes>
#define pr_fmt(x) "IPVS: " x

And then use pr_<level>("Elvis has left the building") in code. This
will add IPVS: automatically to all pr_* calls, alleviating the need
to manually type it into all printks.
Of course, if you only want it for the two defines here, scrap
my idea :)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 2/5] netfilter: xt_ipvs (netfilter matcher for ipvs)
  2009-07-27 13:46 ` [RFC][PATCH 2/5] netfilter: xt_ipvs (netfilter matcher for ipvs) Hannes Eder
@ 2009-07-27 18:35   ` Jan Engelhardt
  2009-07-28 14:55     ` Hannes Eder
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Engelhardt @ 2009-07-27 18:35 UTC (permalink / raw)
  To: Hannes Eder; +Cc: lvs-devel, netdev, netfilter-devel, linux-kernel


On Monday 2009-07-27 15:46, Hannes Eder wrote:
>--- /dev/null
>+++ b/include/linux/netfilter/xt_ipvs.h
>@@ -0,0 +1,32 @@
>+#ifndef _XT_IPVS_H
>+#define _XT_IPVS_H 1
>+
>+#ifndef _IP_VS_H

Do the definitions (should probably be enums) exist in
very old <linux/ip_vs.h>? Then maybe you can get rid of them
from xt_ipvs.h.

>+#define IP_VS_CONN_F_FWD_MASK	0x0007		/* mask for the fwd methods */
>+#define IP_VS_CONN_F_MASQ	0x0000		/* masquerading/NAT */
>+#define IP_VS_CONN_F_LOCALNODE	0x0001		/* local node */
>+#define IP_VS_CONN_F_TUNNEL	0x0002		/* tunneling */
>+#define IP_VS_CONN_F_DROUTE	0x0003		/* direct routing */
>+#define IP_VS_CONN_F_BYPASS	0x0004		/* cache bypass */
>+#endif
>+
>+#define XT_IPVS_IPVS		0x01 /* this is implied by all other options */
>+#define XT_IPVS_PROTO		0x02
>+#define XT_IPVS_VADDR		0x04
>+#define XT_IPVS_VPORT		0x08
>+#define XT_IPVS_DIR		0x10
>+#define XT_IPVS_METHOD		0x20
>+#define XT_IPVS_MASK		(0x40 - 1)
>+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS)
>+
>+struct xt_ipvs {
>+	union nf_inet_addr	vaddr, vmask;
>+	__be16			vport;
>+	u_int16_t		l4proto;
>+	u_int16_t		fwd_method;
>+
>+	u_int8_t invert;
>+	u_int8_t bitmask;
>+};

As per the "Writing Netfilter modules" e-book, __u16/__u8 is required.

>+config NETFILTER_XT_MATCH_IPVS
>+	tristate '"ipvs" match support'
>+	depends on NETFILTER_ADVANCED
>+	help
>+	  This option allows you to match against ipvs properties of a packet.
>+
>+          To compile it as a module, choos M here.  If unsure, say N.
>+

IMHO the "to compile it as a module, choos[e] M" is a relict from
old times and should just be dropped. These days, I stupilate,
everyone knows that M makes modules. And if it doesnot, then
it's not allowed by Kconfig :>

>+MODULE_AUTHOR("Hannes Eder <heder@google.com>");
>+MODULE_DESCRIPTION("Xtables: match ipvs");

"Match IPVS connection properties" is what you previously stated,
and which I think is good. Use it here, too.

>+bool ipvs_mt(const struct sk_buff *skb, const struct xt_match_param *par)
>+{
>+	const struct xt_ipvs *data = par->matchinfo;
>+	const u_int8_t family = par->family;
>+	struct ip_vs_iphdr iph;
>+	struct ip_vs_protocol *pp;
>+	struct ip_vs_conn *cp;
>+	int af;
>+	bool match = true;
>+
>+	if (data->bitmask == XT_IPVS_IPVS) {
>+		match = !!(skb->ipvs_property)
>+			^ !!(data->invert & XT_IPVS_IPVS);
>+		goto out;
>+	}

XT_IPVS_IPVS? What's that supposed to tell me...
Anyway, this can be made much shorter, given that all following
the "out" label is a return:

	if (data->bitmask == XT_IPVS_IPVS)
		return skb->ipvs_property ^ !!(data->invert & XT_IPVS_IPVS);

>+	/* other flags than XT_IPVS_IPVS are set */
>+	if (!skb->ipvs_property) {
>+		match = false;
>+		goto out;
>+	}

	if (!skb->ipvs_property)
		return false;

>+	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
>+
>+	if (data->bitmask & XT_IPVS_PROTO)
>+		if ((iph.protocol == data->l4proto)
>+		    ^ !(data->invert & XT_IPVS_PROTO)) {
>+			match = false;
>+			goto out;
>+		}

		if (iph.protocol == data->l4proto ^
		    !(data->invert & XT_IPVS_PROTO))
			return false;

>+	pp = ip_vs_proto_get(iph.protocol);
>+	if (unlikely(!pp)) {
>+		match = false;
>+		goto out;
>+	}

	if (unlikely(pp == NULL))
		return false;

Only after here we really need goto (to put pp).

>+	/*
>+	 * Check if the packet belongs to an existing entry
>+	 */
>+	/* TODO: why does it only match in inverse? */

FIXME: Figure it out :)

>+	cp = pp->conn_out_get(af, skb, pp, &iph, iph.len, 1 /* inverse */);
>+	if (unlikely(!cp)) {
>+		match = false;
>+		goto out;
>+	}
>+
>+	/*
>+	 * We found a connection, i.e. ct != 0, make sure to call
>+	 * __ip_vs_conn_put before returning.  In our case jump to out_put_con.
>+	 */
>+
>+	if (data->bitmask & XT_IPVS_VPORT)
>+		if ((cp->vport == data->vport)
>+		    ^ !(data->invert & XT_IPVS_VPORT)) {
>+			match = false;
>+			goto out_put_ct;
>+		}
>+
>+	if (data->bitmask & XT_IPVS_DIR) {
>+		/* TODO: implement */

		/* Yes please */

>+	}
>+
>+	if (data->bitmask & XT_IPVS_METHOD)
>+		if (((cp->flags & IP_VS_CONN_F_FWD_MASK) == data->fwd_method)
>+		    ^ !(data->invert & XT_IPVS_METHOD)) {
>+			match = false;
>+			goto out_put_ct;
>+		}
>+
>+	if (data->bitmask & XT_IPVS_VADDR) {
>+		if (af != family) {
>+			match = false;
>+			goto out_put_ct;
>+		}
>+
>+		if (ipvs_mt_addrcmp(&cp->vaddr, &data->vaddr, &data->vmask, af)
>+		    ^ !(data->invert & XT_IPVS_VADDR)) {

I think the operator (^ in this case) always goes onto the same line as the ')'
((a) ^\n(b), not ((a)\n ^(b)), though some legacy code seems to do it
random so-so.)

>+	return match;
>+}
>+EXPORT_SYMBOL(ipvs_mt);

What will be using ipvs_mt?

>+static struct xt_match xt_ipvs_mt_reg[] __read_mostly = {
>+	{
>+		.name       = "ipvs",
>+		.revision   = 0,
>+		.family     = NFPROTO_UNSPEC,
>+		.match      = ipvs_mt,
>+		.matchsize  = sizeof(struct xt_ipvs),
>+		.me         = THIS_MODULE,
>+	},
>+};

There is just one element, so no strict need for the [] array.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 5/5] libxt_ipvs: user space lib for netfilter matcher xt_ipvs
  2009-07-27 13:48 ` [RFC][PATCH 5/5] libxt_ipvs: user space lib for netfilter matcher xt_ipvs Hannes Eder
@ 2009-07-27 18:40   ` Jan Engelhardt
  2009-07-28 12:34     ` Hannes Eder
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Engelhardt @ 2009-07-27 18:40 UTC (permalink / raw)
  To: Hannes Eder; +Cc: lvs-devel, netdev, netfilter-devel, linux-kernel


On Monday 2009-07-27 15:48, Hannes Eder wrote:
>+
>+	switch (c) {
>+	case '0': /* --ipvs */
>+		/* Nothing to do here. */

		Then why add it?

>+	char buf[BUFSIZ];
>+
>+	if (family == NFPROTO_IPV4) {
>+		if (!numeric && addr->ip == 0) {
>+			printf("anywhere ");
>+			return;
>+		}
>+		if (numeric)
>+			strcpy(buf, xtables_ipaddr_to_numeric(&addr->in));
>+		else
>+			strcpy(buf, xtables_ipaddr_to_anyname(&addr->in));
>+		strcat(buf, xtables_ipmask_to_numeric(&mask->in));
>+		printf("%s ", buf);

There is no need to use the strcpy/strcat hacks. Just directly printf it.

>--- /dev/null
>+++ b/extensions/libxt_ipvs.man
>@@ -0,0 +1,7 @@
>+ipvs tests where the packet was modified by IPVS, i.e. is the
>+skb_buff->ipvs_property set.
>+.TP
>+[\fB!\fP] \fB--ipvs
>+Does the packet have to IPVS property?
>+
>+TODO: Write proper documentation.

Yes.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:"
  2009-07-27 18:14   ` Jan Engelhardt
@ 2009-07-28 11:15     ` Hannes Eder
  0 siblings, 0 replies; 12+ messages in thread
From: Hannes Eder @ 2009-07-28 11:15 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: lvs-devel, netdev, netfilter-devel, linux-kernel

On Mon, Jul 27, 2009 at 20:14, Jan Engelhardt<jengelh@medozas.de> wrote:
>
> On Monday 2009-07-27 15:46, Hannes Eder wrote:
>>
>>Now all printk messages from IPVS are prefixed with "IPVS:".
>>
>>+#define EnterFunction(level)                                                 \
>>+      do {                                                                   \
>>+              if (level <= ip_vs_get_debug_level())                          \
>>+                      printk(KERN_DEBUG "IPVS: Enter: %s, %s line %i\n",     \
>>+                              __func__, __FILE__, __LINE__);                 \
>>+      } while (0)
>>+#define LeaveFunction(level)                                                 \
>>+      do {                                                                   \
>>+              if (level <= ip_vs_get_debug_level())                          \
>>+                      printk(KERN_DEBUG "IPVS: Leave: %s, %s line %i\n",     \
>>+                              __func__, __FILE__, __LINE__);                 \
>>+      } while (0)
>
> I think you should rather make use of pr_fmt:
>
> <before any #includes>
> #define pr_fmt(x) "IPVS: " x
>
> And then use pr_<level>("Elvis has left the building") in code. This
> will add IPVS: automatically to all pr_* calls, alleviating the need
> to manually type it into all printks.

I like this idea.  I'll come up with an extra patch, it does not fit
into this series anyway.

> Of course, if you only want it for the two defines here, scrap
> my idea :)
>

Cheers,
-Hannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 5/5] libxt_ipvs: user space lib for netfilter matcher xt_ipvs
  2009-07-27 18:40   ` Jan Engelhardt
@ 2009-07-28 12:34     ` Hannes Eder
  0 siblings, 0 replies; 12+ messages in thread
From: Hannes Eder @ 2009-07-28 12:34 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: lvs-devel, netdev, netfilter-devel, linux-kernel

On Mon, Jul 27, 2009 at 20:40, Jan Engelhardt<jengelh@medozas.de> wrote:
>
> On Monday 2009-07-27 15:48, Hannes Eder wrote:
>>+
>>+      switch (c) {
>>+      case '0': /* --ipvs */
>>+              /* Nothing to do here. */
>
>                Then why add it?

In the 'default' branch is an assert(false);  Call it defensive programming.

>>+      char buf[BUFSIZ];
>>+
>>+      if (family == NFPROTO_IPV4) {
>>+              if (!numeric && addr->ip == 0) {
>>+                      printf("anywhere ");
>>+                      return;
>>+              }
>>+              if (numeric)
>>+                      strcpy(buf, xtables_ipaddr_to_numeric(&addr->in));
>>+              else
>>+                      strcpy(buf, xtables_ipaddr_to_anyname(&addr->in));
>>+              strcat(buf, xtables_ipmask_to_numeric(&mask->in));
>>+              printf("%s ", buf);
>
> There is no need to use the strcpy/strcat hacks. Just directly printf it.

As the comment says: "Shamelessly copied from libxt_conntrack.c". ;)

Furthermore I think it is good that way, because
xtables_ipaddr_to_numeric writes to a local static buffer, and
xtables_ipaddr_to_numeric might get called by
xtables_ipmask_to_numeric.

>>--- /dev/null
>>+++ b/extensions/libxt_ipvs.man
>>@@ -0,0 +1,7 @@
>>+ipvs tests where the packet was modified by IPVS, i.e. is the
>>+skb_buff->ipvs_property set.
>>+.TP
>>+[\fB!\fP] \fB--ipvs
>>+Does the packet have to IPVS property?
>>+
>>+TODO: Write proper documentation.
>
> Yes.

Sir, yes, sir ;) I am working on that.

Thanks,
-Hannes
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC][PATCH 2/5] netfilter: xt_ipvs (netfilter matcher for ipvs)
  2009-07-27 18:35   ` Jan Engelhardt
@ 2009-07-28 14:55     ` Hannes Eder
  0 siblings, 0 replies; 12+ messages in thread
From: Hannes Eder @ 2009-07-28 14:55 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: lvs-devel, netdev, netfilter-devel, linux-kernel

On Mon, Jul 27, 2009 at 20:35, Jan Engelhardt<jengelh@medozas.de> wrote:
>
> On Monday 2009-07-27 15:46, Hannes Eder wrote:
>>--- /dev/null
>>+++ b/include/linux/netfilter/xt_ipvs.h
>>@@ -0,0 +1,32 @@
>>+#ifndef _XT_IPVS_H
>>+#define _XT_IPVS_H 1
>>+
>>+#ifndef _IP_VS_H
>
> Do the definitions (should probably be enums) exist in
> very old <linux/ip_vs.h>? Then maybe you can get rid of them
> from xt_ipvs.h.

Definitions removed from xt_ipvs.h.  For xt_ipvs.c I just included
linux/ip_vs.h.  For libxt_ipvs (user space library) I added the entire
linux/ip_vs.h.  Do you think this is better?

>>+#define IP_VS_CONN_F_FWD_MASK 0x0007          /* mask for the fwd methods */
>>+#define IP_VS_CONN_F_MASQ     0x0000          /* masquerading/NAT */
>>+#define IP_VS_CONN_F_LOCALNODE        0x0001          /* local node */
>>+#define IP_VS_CONN_F_TUNNEL   0x0002          /* tunneling */
>>+#define IP_VS_CONN_F_DROUTE   0x0003          /* direct routing */
>>+#define IP_VS_CONN_F_BYPASS   0x0004          /* cache bypass */
>>+#endif
>>+
>>+#define XT_IPVS_IPVS          0x01 /* this is implied by all other options */
>>+#define XT_IPVS_PROTO         0x02
>>+#define XT_IPVS_VADDR         0x04
>>+#define XT_IPVS_VPORT         0x08
>>+#define XT_IPVS_DIR           0x10
>>+#define XT_IPVS_METHOD                0x20
>>+#define XT_IPVS_MASK          (0x40 - 1)
>>+#define XT_IPVS_ONCE_MASK     (XT_IPVS_MASK & ~XT_IPVS_IPVS)
>>+
>>+struct xt_ipvs {
>>+      union nf_inet_addr      vaddr, vmask;
>>+      __be16                  vport;
>>+      u_int16_t               l4proto;
>>+      u_int16_t               fwd_method;
>>+
>>+      u_int8_t invert;
>>+      u_int8_t bitmask;
>>+};
>
> As per the "Writing Netfilter modules" e-book, __u16/__u8 is required.

Done.

>>+config NETFILTER_XT_MATCH_IPVS
>>+      tristate '"ipvs" match support'
>>+      depends on NETFILTER_ADVANCED
>>+      help
>>+        This option allows you to match against ipvs properties of a packet.
>>+
>>+          To compile it as a module, choos M here.  If unsure, say N.
>>+
>IMHO the "to compile it as a module, choos[e] M" is a relict from
> old times and should just be dropped. These days, I stupilate,
> everyone knows that M makes modules. And if it doesnot, then
> it's not allowed by Kconfig :>

Done.

>>+MODULE_AUTHOR("Hannes Eder <heder@google.com>");
>>+MODULE_DESCRIPTION("Xtables: match ipvs");
>
> "Match IPVS connection properties" is what you previously stated,
> and which I think is good. Use it here, too.

Done.

>>+bool ipvs_mt(const struct sk_buff *skb, const struct xt_match_param *par)
>>+{
>>+      const struct xt_ipvs *data = par->matchinfo;
>>+      const u_int8_t family = par->family;
>>+      struct ip_vs_iphdr iph;
>>+      struct ip_vs_protocol *pp;
>>+      struct ip_vs_conn *cp;
>>+      int af;
>>+      bool match = true;
>>+
>>+      if (data->bitmask == XT_IPVS_IPVS) {
>>+              match = !!(skb->ipvs_property)
>>+                      ^ !!(data->invert & XT_IPVS_IPVS);
>>+              goto out;
>>+      }
>
> XT_IPVS_IPVS? What's that supposed to tell me...

Just matching aginst the skb->ipvs_property, I changed to name to
XT_IPVS_IPVS_PROPERTY.

> Anyway, this can be made much shorter, given that all following
> the "out" label is a return:

You are right.  For the moment, I'll keep it as is.  Having a single
entry/exit point makes it easier to instrument the function for
debugging.

>
>        if (data->bitmask == XT_IPVS_IPVS)
>                return skb->ipvs_property ^ !!(data->invert & XT_IPVS_IPVS);
>
>>+      /* other flags than XT_IPVS_IPVS are set */
>>+      if (!skb->ipvs_property) {
>>+              match = false;
>>+              goto out;
>>+      }
>
>        if (!skb->ipvs_property)
>                return false;
>
>>+      ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
>>+
>>+      if (data->bitmask & XT_IPVS_PROTO)
>>+              if ((iph.protocol == data->l4proto)
>>+                  ^ !(data->invert & XT_IPVS_PROTO)) {
>>+                      match = false;
>>+                      goto out;
>>+              }
>
>                if (iph.protocol == data->l4proto ^
>                    !(data->invert & XT_IPVS_PROTO))
>                        return false;
>
>>+      pp = ip_vs_proto_get(iph.protocol);
>>+      if (unlikely(!pp)) {
>>+              match = false;
>>+              goto out;
>>+      }
>
>        if (unlikely(pp == NULL))
>                return false;
>
> Only after here we really need goto (to put pp).
>
>>+      /*
>>+       * Check if the packet belongs to an existing entry
>>+       */
>>+      /* TODO: why does it only match in inverse? */
>
> FIXME: Figure it out :)

Still working on that ;)

>>+      cp = pp->conn_out_get(af, skb, pp, &iph, iph.len, 1 /* inverse */);
>>+      if (unlikely(!cp)) {
>>+              match = false;
>>+              goto out;
>>+      }
>>+
>>+      /*
>>+       * We found a connection, i.e. ct != 0, make sure to call
>>+       * __ip_vs_conn_put before returning.  In our case jump to out_put_con.
>>+       */
>>+
>>+      if (data->bitmask & XT_IPVS_VPORT)
>>+              if ((cp->vport == data->vport)
>>+                  ^ !(data->invert & XT_IPVS_VPORT)) {
>>+                      match = false;
>>+                      goto out_put_ct;
>>+              }
>>+
>>+      if (data->bitmask & XT_IPVS_DIR) {
>>+              /* TODO: implement */
>
>                /* Yes please */

Here we go:

	if (data->bitmask & XT_IPVS_DIR) {
		enum ip_conntrack_info ctinfo;
		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);

		if (ct == NULL || ct == &nf_conntrack_untracked) {
			match = false;
			goto out_put_cp;
		}

		if ((ctinfo >= IP_CT_IS_REPLY) ^
		    !!(data->invert & XT_IPVS_DIR)) {
			match = false;
			goto out_put_cp;
		}
	}

>>+      }
>>+
>>+      if (data->bitmask & XT_IPVS_METHOD)
>>+              if (((cp->flags & IP_VS_CONN_F_FWD_MASK) == data->fwd_method)
>>+                  ^ !(data->invert & XT_IPVS_METHOD)) {
>>+                      match = false;
>>+                      goto out_put_ct;
>>+              }
>>+
>>+      if (data->bitmask & XT_IPVS_VADDR) {
>>+              if (af != family) {
>>+                      match = false;
>>+                      goto out_put_ct;
>>+              }
>>+
>>+              if (ipvs_mt_addrcmp(&cp->vaddr, &data->vaddr, &data->vmask, af)
>>+                  ^ !(data->invert & XT_IPVS_VADDR)) {
>
> I think the operator (^ in this case) always goes onto the same line as the ')'
> ((a) ^\n(b), not ((a)\n ^(b)), though some legacy code seems to do it
> random so-so.)

I changed it as suggested, though I could not find anything in
Documentation/CodingStyle about that.

>>+      return match;
>>+}
>>+EXPORT_SYMBOL(ipvs_mt);
>
> What will be using ipvs_mt?

Nobody, EXPORT_SYMBOL removed.

>>+static struct xt_match xt_ipvs_mt_reg[] __read_mostly = {
>>+      {
>>+              .name       = "ipvs",
>>+              .revision   = 0,
>>+              .family     = NFPROTO_UNSPEC,
>>+              .match      = ipvs_mt,
>>+              .matchsize  = sizeof(struct xt_ipvs),
>>+              .me         = THIS_MODULE,
>>+      },
>>+};
>
> There is just one element, so no strict need for the [] array.

You are right.  Done.

Thanks for the review.

Cheers,
-Hannes
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-07-28 14:55 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-07-27 13:46 [RFC][PATCH 0/5] IPVS full NAT support + netfilter 'ipvs' match support Hannes Eder
2009-07-27 13:46 ` [RFC][PATCH 1/5] IPVS: prefix EnterFunction and LeaveFunction msg with "IPVS:" Hannes Eder
2009-07-27 18:14   ` Jan Engelhardt
2009-07-28 11:15     ` Hannes Eder
2009-07-27 13:46 ` [RFC][PATCH 2/5] netfilter: xt_ipvs (netfilter matcher for ipvs) Hannes Eder
2009-07-27 18:35   ` Jan Engelhardt
2009-07-28 14:55     ` Hannes Eder
2009-07-27 13:46 ` [RFC][PATCH 3/5] IPVS: make friends with nf_conntrack Hannes Eder
2009-07-27 13:46 ` [RFC][PATCH 4/5] IPVS: debugging output for ip_vs_update_conntrack Hannes Eder
2009-07-27 13:48 ` [RFC][PATCH 5/5] libxt_ipvs: user space lib for netfilter matcher xt_ipvs Hannes Eder
2009-07-27 18:40   ` Jan Engelhardt
2009-07-28 12:34     ` Hannes Eder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).