[PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF

* [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation
  2010-03-31 10:31 nf-next: TEE and nesting Jan Engelhardt
@ 2010-03-31 10:31 ` Jan Engelhardt
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Engelhardt @ 2010-03-31 10:31 UTC
  To: kaber; +Cc: netfilter-devel

Similar to how IPv4's ip_output.c works, have ip6_output also check
the IPSKB_REROUTED flag. It will be set from xt_TEE for cloned packets
since Xtables can currently only deal with a single packet in flight
at a time.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv6/ip6_output.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f314ba4..7e10f62 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -172,8 +172,9 @@ int ip6_output(struct sk_buff *skb)
 		return 0;
 	}
 
-	return NF_HOOK(NFPROTO_IPV6, NF_INET_POST_ROUTING, skb, NULL, dev,
-		       ip6_finish_output);
+	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, skb, NULL, dev,
+			    ip6_finish_output,
+			    !(IP6CB(skb)->flags & IPSKB_REROUTED));
 }
 
 /*
-- 
1.7.0.2



* nf-next: TEE and nesting
@ 2010-03-31 10:38 Jan Engelhardt
  2010-03-31 10:38 ` [PATCH 1/5] netfilter: ipv6: move POSTROUTING invocation before fragmentation Jan Engelhardt
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Jan Engelhardt @ 2010-03-31 10:38 UTC
  To: kaber; +Cc: netfilter-devel, netdev


> Hi,
>
> next on the calendar is the xt_TEE submission.
>
>
> The following changes since commit b44672889c11e13e4f4dc0a8ee23f0e64f1e57c6:
>   Jan Engelhardt (1):
>         netfilter: xtables: merge registration structure to NFPROTO_UNSPEC
>
> are available in the git repository at:
>
>   git://dev.medozas.de/linux master
>
> Jan Engelhardt (5):
>       netfilter: ipv6: move POSTROUTING invocation before fragmentation
>       net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation
>       netfilter: xtables: inclusion of xt_TEE
>       netfilter: xtables2: make ip_tables reentrant
>       netfilter: xt_TEE: have cloned packet travel through Xtables too

cc'd netdev due to larger changes to the IPv6 code.


* [PATCH 1/5] netfilter: ipv6: move POSTROUTING invocation before fragmentation
  2010-03-31 10:38 nf-next: TEE and nesting Jan Engelhardt
@ 2010-03-31 10:38 ` Jan Engelhardt
  2010-04-01 10:23   ` Patrick McHardy
  2010-03-31 10:38 ` [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation Jan Engelhardt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-03-31 10:38 UTC
  To: kaber; +Cc: netfilter-devel, netdev

Patrick McHardy notes: "We used to invoke IPv4 POST_ROUTING after
fragmentation as well just to defragment the packets in conntrack
immediately afterwards, but that got changed during the
netfilter-ipsec integration. Ideally IPv6 would behave like IPv4."

This patch makes it so. Sending an oversized frame (e.g. `ping6
-s64000 -c1 ::1`) will now show up in POSTROUTING as a single skb
rather than multiple ones.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv6/ip6_output.c |   49 +++++++++++++++++++++++--------------------------
 1 files changed, 23 insertions(+), 26 deletions(-)
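
The effect of the reordering can be pictured with a small userspace
model. This is only a sketch and not kernel code: every toy_* name is
made up for illustration, and the fragment count ignores per-fragment
headers, so it is a rough figure rather than what ip6_fragment would
produce.

```c
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_counts {
	unsigned int hook_calls;	/* POSTROUTING invocations */
	unsigned int frames;		/* frames handed to the device */
};

/* Pieces needed for len bytes; ignores per-fragment headers. */
static unsigned int toy_pieces(size_t len, size_t mtu)
{
	if (mtu == 0 || len <= mtu)
		return 1;
	return (unsigned int)((len + mtu - 1) / mtu);
}

/* Old order: fragment first, then run the hook on every fragment. */
static struct toy_counts toy_hook_after_frag(size_t len, size_t mtu)
{
	struct toy_counts c;

	c.frames = toy_pieces(len, mtu);
	c.hook_calls = c.frames;
	return c;
}

/* New order: run the hook once on the whole skb, then fragment. */
static struct toy_counts toy_hook_before_frag(size_t len, size_t mtu)
{
	struct toy_counts c;

	c.hook_calls = 1;
	c.frames = toy_pieces(len, mtu);
	return c;
}
```

In this model, a 64000-byte payload over an assumed MTU of 1500 gives
43 hook calls in the old order and a single one in the new order; the
number of frames sent is the same in both.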

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 4535b7a..f314ba4 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -82,22 +82,6 @@ int ip6_local_out(struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(ip6_local_out);
 
-static int ip6_output_finish(struct sk_buff *skb)
-{
-	struct dst_entry *dst = skb_dst(skb);
-
-	if (dst->hh)
-		return neigh_hh_output(dst->hh, skb);
-	else if (dst->neighbour)
-		return dst->neighbour->output(skb);
-
-	IP6_INC_STATS_BH(dev_net(dst->dev),
-			 ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
-	kfree_skb(skb);
-	return -EINVAL;
-
-}
-
 /* dev_loopback_xmit for use with netfilter. */
 static int ip6_dev_loopback_xmit(struct sk_buff *newskb)
 {
@@ -111,8 +95,7 @@ static int ip6_dev_loopback_xmit(struct sk_buff *newskb)
 	return 0;
 }
 
-
-static int ip6_output2(struct sk_buff *skb)
+static int ip6_finish_output2(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
@@ -150,8 +133,15 @@ static int ip6_output2(struct sk_buff *skb)
 				skb->len);
 	}
 
-	return NF_HOOK(NFPROTO_IPV6, NF_INET_POST_ROUTING, skb, NULL, skb->dev,
-		       ip6_output_finish);
+	if (dst->hh)
+		return neigh_hh_output(dst->hh, skb);
+	else if (dst->neighbour)
+		return dst->neighbour->output(skb);
+
+	IP6_INC_STATS_BH(dev_net(dst->dev),
+			 ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
+	kfree_skb(skb);
+	return -EINVAL;
 }
 
 static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
@@ -162,21 +152,28 @@ static inline int ip6_skb_dst_mtu(struct sk_buff *skb)
 	       skb_dst(skb)->dev->mtu : dst_mtu(skb_dst(skb));
 }
 
+static int ip6_finish_output(struct sk_buff *skb)
+{
+	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
+				dst_allfrag(skb_dst(skb)))
+		return ip6_fragment(skb, ip6_finish_output2);
+	else
+		return ip6_finish_output2(skb);
+}
+
 int ip6_output(struct sk_buff *skb)
 {
+	struct net_device *dev = skb_dst(skb)->dev;
 	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
 	if (unlikely(idev->cnf.disable_ipv6)) {
-		IP6_INC_STATS(dev_net(skb_dst(skb)->dev), idev,
+		IP6_INC_STATS(dev_net(dev), idev,
 			      IPSTATS_MIB_OUTDISCARDS);
 		kfree_skb(skb);
 		return 0;
 	}
 
-	if ((skb->len > ip6_skb_dst_mtu(skb) && !skb_is_gso(skb)) ||
-				dst_allfrag(skb_dst(skb)))
-		return ip6_fragment(skb, ip6_output2);
-	else
-		return ip6_output2(skb);
+	return NF_HOOK(NFPROTO_IPV6, NF_INET_POST_ROUTING, skb, NULL, dev,
+		       ip6_finish_output);
 }
 
 /*
-- 
1.7.0.2



* [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation
  2010-03-31 10:38 nf-next: TEE and nesting Jan Engelhardt
  2010-03-31 10:38 ` [PATCH 1/5] netfilter: ipv6: move POSTROUTING invocation before fragmentation Jan Engelhardt
@ 2010-03-31 10:38 ` Jan Engelhardt
  2010-04-01  8:34   ` David Miller
  2010-03-31 10:38 ` [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-03-31 10:38 UTC
  To: kaber; +Cc: netfilter-devel, netdev

Similar to how IPv4's ip_output.c works, have ip6_output also check
the IPSKB_REROUTED flag. It will be set from xt_TEE for cloned packets
since Xtables can currently only deal with a single packet in flight
at a time.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv6/ip6_output.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
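
For readers unfamiliar with NF_HOOK_COND, the control flow can be
sketched in userspace as follows. This is a simplified model, not the
kernel macro: the toy_* names and the flag value are hypothetical, and
the "hook" here always accepts. The point it illustrates is that a
false condition skips the hook but still calls the finish function.

```c
/* Stand-in for IPSKB_REROUTED; the value is made up for this sketch. */
#define TOY_REROUTED 0x1u

struct toy_skb {
	unsigned int flags;
	unsigned int hook_seen;	/* times the hook saw this packet */
	unsigned int sent;	/* times the finish function ran */
};

/* Toy hook: records the visit and accepts (returns 1). */
static int toy_hook(struct toy_skb *skb)
{
	skb->hook_seen++;
	return 1;
}

/* Toy counterpart of ip6_finish_output. */
static int toy_finish(struct toy_skb *skb)
{
	skb->sent++;
	return 0;
}

/* Run the hook only if cond is true; continue unless it dropped. */
static int toy_hook_cond(struct toy_skb *skb, int cond)
{
	if (cond && toy_hook(skb) != 1)
		return -1;
	return toy_finish(skb);
}

/* Toy counterpart of ip6_output after this patch. */
static int toy_output(struct toy_skb *skb)
{
	return toy_hook_cond(skb, !(skb->flags & TOY_REROUTED));
}
```

A packet with the flag clear passes through the hook once and is then
sent; a packet with the flag set is sent without the hook seeing it.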

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f314ba4..7e10f62 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -172,8 +172,9 @@ int ip6_output(struct sk_buff *skb)
 		return 0;
 	}
 
-	return NF_HOOK(NFPROTO_IPV6, NF_INET_POST_ROUTING, skb, NULL, dev,
-		       ip6_finish_output);
+	return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, skb, NULL, dev,
+			    ip6_finish_output,
+			    !(IP6CB(skb)->flags & IPSKB_REROUTED));
 }
 
 /*
-- 
1.7.0.2



* [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE
  2010-03-31 10:38 nf-next: TEE and nesting Jan Engelhardt
  2010-03-31 10:38 ` [PATCH 1/5] netfilter: ipv6: move POSTROUTING invocation before fragmentation Jan Engelhardt
  2010-03-31 10:38 ` [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation Jan Engelhardt
@ 2010-03-31 10:38 ` Jan Engelhardt
  2010-04-01 10:34   ` Patrick McHardy
  2010-03-31 10:38 ` [PATCH 4/5] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
  2010-03-31 10:38 ` [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
  4 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-03-31 10:38 UTC
  To: kaber; +Cc: netfilter-devel, netdev

xt_TEE can be used to clone and reroute a packet. This can for
example be used to copy traffic at a router for logging purposes
to another dedicated machine.

References: http://www.gossamer-threads.com/lists/iptables/devel/68781
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/Kbuild   |    1 +
 include/linux/netfilter/xt_TEE.h |    8 +
 net/ipv4/ip_output.c             |    1 +
 net/ipv6/ip6_output.c            |    1 +
 net/netfilter/Kconfig            |    7 +
 net/netfilter/Makefile           |    1 +
 net/netfilter/xt_TEE.c           |  272 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 291 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_TEE.h
 create mode 100644 net/netfilter/xt_TEE.c
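
The loop protection used by the target (tagging the clone with a
sentinel and dropping anything that already carries it) can be
sketched in plain C. This is an illustrative model under simplifying
assumptions: the toy_* names are hypothetical, the packet is a small
struct instead of an sk_buff, and the sentinel is the address of a
static object rather than a fake conntrack entry.

```c
#include <stddef.h>

enum toy_verdict { TOY_DROP, TOY_CONTINUE };

struct toy_pkt {
	const void *tag;	/* plays the role of skb->nfct */
	unsigned char ttl;
};

/* Only the address matters; it marks packets that TEE produced. */
static const char toy_tee_tag = 0;

/*
 * Clone pkt into *clone and tag the clone. A packet that already
 * carries the tag is a clone coming around again, so it is dropped
 * and *clone is left untouched.
 */
static enum toy_verdict toy_tee(const struct toy_pkt *pkt,
				struct toy_pkt *clone)
{
	if (pkt->tag == &toy_tee_tag)
		return TOY_DROP;

	*clone = *pkt;
	clone->tag = &toy_tee_tag;
	return TOY_CONTINUE;
}
```

Calling toy_tee on an untagged packet yields TOY_CONTINUE and a tagged
copy; feeding that copy back in yields TOY_DROP, which is the case the
NF_DROP branch in tee_tg4 and tee_tg6 is meant to catch.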

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index a5a63e4..48767cd 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -16,6 +16,7 @@ header-y += xt_RATEEST.h
 header-y += xt_SECMARK.h
 header-y += xt_TCPMSS.h
 header-y += xt_TCPOPTSTRIP.h
+header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
diff --git a/include/linux/netfilter/xt_TEE.h b/include/linux/netfilter/xt_TEE.h
new file mode 100644
index 0000000..83fa768
--- /dev/null
+++ b/include/linux/netfilter/xt_TEE.h
@@ -0,0 +1,8 @@
+#ifndef _XT_TEE_TARGET_H
+#define _XT_TEE_TARGET_H
+
+struct xt_tee_tginfo {
+	union nf_inet_addr gw;
+};
+
+#endif /* _XT_TEE_TARGET_H */
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f09135e..0abfdde 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -309,6 +309,7 @@ int ip_output(struct sk_buff *skb)
 			    ip_finish_output,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
+EXPORT_SYMBOL_GPL(ip_output);
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7e10f62..307d8bf 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -176,6 +176,7 @@ int ip6_output(struct sk_buff *skb)
 			    ip6_finish_output,
 			    !(IP6CB(skb)->flags & IPSKB_REROUTED));
 }
+EXPORT_SYMBOL_GPL(ip6_output);
 
 /*
  *	xmit an sk_buff (used by TCP)
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8055786..673a6c8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -502,6 +502,13 @@ config NETFILTER_XT_TARGET_RATEEST
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_TARGET_TEE
+	tristate '"TEE" - packet cloning to alternate destination'
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds a "TEE" target with which a packet can be cloned and
+	this clone be rerouted to another nexthop.
+
 config NETFILTER_XT_TARGET_TPROXY
 	tristate '"TPROXY" target support (EXPERIMENTAL)'
 	depends on EXPERIMENTAL
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index cd31afe..14e3a8f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_SECMARK) += xt_SECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TPROXY) += xt_TPROXY.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
new file mode 100644
index 0000000..96dd746
--- /dev/null
+++ b/net/netfilter/xt_TEE.c
@@ -0,0 +1,272 @@
+/*
+ *	"TEE" target extension for Xtables
+ *	Copyright © Sebastian Claßen <sebastian.classen [at] freenet de>, 2007
+ *	Jan Engelhardt <jengelh [at] medozas de>, 2007 - 2010
+ *
+ *	based on ipt_ROUTE.c from Cédric de Launois
+ *	<delaunois@info.ucl.be>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	version 2 or later, as published by the Free Software Foundation.
+ */
+#include <linux/ip.h>
+#include <linux/module.h>
+#include <linux/route.h>
+#include <linux/skbuff.h>
+#include <net/checksum.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/route.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_TEE.h>
+
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+#	define WITH_CONNTRACK 1
+#	include <net/netfilter/nf_conntrack.h>
+static struct nf_conn tee_track;
+#endif
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#	define WITH_IPV6 1
+#endif
+
+static const union nf_inet_addr tee_zero_address;
+
+/*
+ * Try to route the packet according to the routing keys specified in
+ * route_info. Keys are :
+ *  - ifindex :
+ *      0 if no oif preferred,
+ *      otherwise set to the index of the desired oif
+ *  - route_info->gateway :
+ *      0 if no gateway specified,
+ *      otherwise set to the next host to which the pkt must be routed
+ * If success, skb->dev is the output device to which the packet must
+ * be sent and skb->dst is not NULL
+ *
+ * RETURN: false - if an error occurred
+ *         true  - if the packet was successfully routed to the
+ *                 destination desired
+ */
+static bool
+tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct rtable *rt;
+	struct flowi fl;
+	int err;
+
+	memset(&fl, 0, sizeof(fl));
+	fl.iif  = skb->skb_iif;
+	fl.mark = skb->mark;
+	fl.nl_u.ip4_u.daddr = info->gw.ip;
+	fl.nl_u.ip4_u.tos   = RT_TOS(iph->tos);
+	fl.nl_u.ip4_u.scope = RT_SCOPE_UNIVERSE;
+
+	/* Trying to route the packet using the standard routing table. */
+	err = ip_route_output_key(dev_net(skb->dev), &rt, &fl);
+	if (err != 0)
+		return false;
+
+	dst_release(skb_dst(skb));
+	skb_dst_set(skb, &rt->u.dst);
+	skb->dev      = rt->u.dst.dev;
+	skb->protocol = htons(ETH_P_IP);
+	IPCB(skb)->flags |= IPSKB_REROUTED;
+	return true;
+}
+
+/*
+ * To detect and deter routed packet loopback when using the --tee option, we
+ * take a page out of the raw.patch book: on the copied skb, we set up a fake
+ * ->nfct entry, pointing to the local &tee_track. We skip routing
+ * packets when we see they already have that ->nfct.
+ */
+static unsigned int
+tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+	struct iphdr *iph;
+
+#ifdef WITH_CONNTRACK
+	if (skb->nfct == &tee_track.ct_general)
+		/*
+		 * Loopback - a packet we already routed, is to be
+		 * routed another time. Avoid that, now.
+		 */
+		return NF_DROP;
+#endif
+	/*
+	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
+	 * the original skb, which should continue on its way as if nothing has
+	 * happened. The copy should be independently delivered to the TEE
+	 * --gateway.
+	 */
+	skb = skb_copy(skb, GFP_ATOMIC);
+	if (skb == NULL)
+		return XT_CONTINUE;
+	/*
+	 * If we are in PREROUTING/INPUT, the checksum must be recalculated
+	 * since the length could have changed as a result of defragmentation.
+	 *
+	 * We also decrease the TTL to mitigate potential TEE loops
+	 * between two hosts.
+	 *
+	 * Set %IP_DF so that the original source is notified of a potentially
+	 * decreased MTU on the clone route. IPv6 does this too.
+	 */
+	iph = ip_hdr(skb);
+	iph->frag_off |= htons(IP_DF);
+	if (par->hooknum == NF_INET_PRE_ROUTING ||
+	    par->hooknum == NF_INET_LOCAL_IN)
+		--iph->ttl;
+	ip_send_check(iph);
+
+#ifdef WITH_CONNTRACK
+	nf_conntrack_put(skb->nfct);
+	skb->nfct     = &tee_track.ct_general;
+	skb->nfctinfo = IP_CT_NEW;
+	nf_conntrack_get(skb->nfct);
+#endif
+	/*
+	 * Xtables is not reentrant currently, so a choice has to be made:
+	 * 1. return absolute verdict for the original and let the cloned
+	 *    packet travel through the chains
+	 * 2. let the original continue travelling and not pass the clone
+	 *    to Xtables.
+	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
+	 * Because iph->check is already correct and we don't pass it to
+	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
+	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
+	 * invoke POSTROUTING on the cloned packet.
+	 */
+	IPCB(skb)->flags |= IPSKB_REROUTED;
+	if (tee_tg_route4(skb, info))
+		ip_output(skb);
+
+	return XT_CONTINUE;
+}
+
+#ifdef WITH_IPV6
+static bool
+tee_tg_route6(struct sk_buff *skb, const struct xt_tee_tginfo *info)
+{
+	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	struct dst_entry *dst;
+	struct flowi fl;
+
+	memset(&fl, 0, sizeof(fl));
+	fl.iif  = skb->skb_iif;
+	fl.mark = skb->mark;
+	fl.nl_u.ip6_u.daddr = info->gw.in6;
+	fl.nl_u.ip6_u.flowlabel = ((iph->flow_lbl[0] & 0xF) << 16) |
+				  (iph->flow_lbl[1] << 8) | iph->flow_lbl[2];
+
+	dst = ip6_route_output(dev_net(skb->dev), NULL, &fl);
+	if (dst == NULL)
+		return false;
+
+	dst_release(skb_dst(skb));
+	skb_dst_set(skb, dst);
+	skb->dev      = dst->dev;
+	skb->protocol = htons(ETH_P_IPV6);
+	IP6CB(skb)->flags |= IPSKB_REROUTED;
+	return true;
+}
+
+static unsigned int
+tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+
+#ifdef WITH_CONNTRACK
+	if (skb->nfct == &tee_track.ct_general)
+		return NF_DROP;
+#endif
+	if ((skb = skb_copy(skb, GFP_ATOMIC)) == NULL)
+		return XT_CONTINUE;
+
+#ifdef WITH_CONNTRACK
+	nf_conntrack_put(skb->nfct);
+	skb->nfct     = &tee_track.ct_general;
+	skb->nfctinfo = IP_CT_NEW;
+	nf_conntrack_get(skb->nfct);
+#endif
+	if (par->hooknum == NF_INET_PRE_ROUTING ||
+	    par->hooknum == NF_INET_LOCAL_IN) {
+		struct ipv6hdr *iph = ipv6_hdr(skb);
+		--iph->hop_limit;
+	}
+	IP6CB(skb)->flags |= IPSKB_REROUTED;
+	if (tee_tg_route6(skb, info))
+		ip6_output(skb);
+
+	return XT_CONTINUE;
+}
+#endif /* WITH_IPV6 */
+
+static int tee_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+
+	/* 0.0.0.0 and :: not allowed */
+	return (memcmp(&info->gw, &tee_zero_address,
+	       sizeof(tee_zero_address)) == 0) ? -EINVAL : 0;
+}
+
+static struct xt_target tee_tg_reg[] __read_mostly = {
+	{
+		.name       = "TEE",
+		.revision   = 0,
+		.family     = NFPROTO_IPV4,
+		.target     = tee_tg4,
+		.targetsize = sizeof(struct xt_tee_tginfo),
+		.checkentry = tee_tg_check,
+		.me         = THIS_MODULE,
+	},
+#ifdef WITH_IPV6
+	{
+		.name       = "TEE",
+		.revision   = 0,
+		.family     = NFPROTO_IPV6,
+		.target     = tee_tg6,
+		.targetsize = sizeof(struct xt_tee_tginfo),
+		.checkentry = tee_tg_check,
+		.me         = THIS_MODULE,
+	},
+#endif
+};
+
+static int __init tee_tg_init(void)
+{
+#ifdef WITH_CONNTRACK
+	/*
+	 * Set up fake conntrack (stolen from raw.patch):
+	 * - to never be deleted, not in any hashes
+	 */
+	atomic_set(&tee_track.ct_general.use, 1);
+
+	/* - and make it look like a confirmed connection */
+	set_bit(IPS_CONFIRMED_BIT, &tee_track.status);
+
+	/* Initialize fake conntrack so that NAT will skip it */
+	tee_track.status |= IPS_NAT_DONE_MASK;
+#endif
+	return xt_register_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
+}
+
+static void __exit tee_tg_exit(void)
+{
+	xt_unregister_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
+}
+
+module_init(tee_tg_init);
+module_exit(tee_tg_exit);
+MODULE_AUTHOR("Sebastian Claßen <sebastian.classen@freenet.ag>");
+MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>");
+MODULE_DESCRIPTION("Xtables: Reroute packet copy");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_TEE");
+MODULE_ALIAS("ip6t_TEE");
-- 
1.7.0.2



* [PATCH 4/5] netfilter: xtables2: make ip_tables reentrant
  2010-03-31 10:38 nf-next: TEE and nesting Jan Engelhardt
                   ` (2 preceding siblings ...)
  2010-03-31 10:38 ` [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
@ 2010-03-31 10:38 ` Jan Engelhardt
  2010-03-31 10:38 ` [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
  4 siblings, 0 replies; 25+ messages in thread
From: Jan Engelhardt @ 2010-03-31 10:38 UTC
  To: kaber; +Cc: netfilter-devel, netdev

Currently, the table traverser stores return addresses in the ruleset
itself (struct ip6t_entry->comefrom). This has a well-known drawback:
the jumpstack is overwritten on reentry, making it necessary for
targets to return absolute verdicts. Also, the ruleset (which might
be heavy memory-wise) needs to be replicated for each CPU that can
possibly invoke ip6t_do_table.

This patch decouples the jumpstack from struct ip6t_entry and instead
puts it into xt_table_info. Not being restricted by 'comefrom'
anymore, we can set up a stack as needed. By default, there is room
allocated for two entries into the traverser. The setting is
configurable at runtime through sysfs and will take effect when a
table is replaced by a new one.

The arp_tables traverser is not converted though, because there are
only one or two modules for it and further patches seek to collapse
the table traverser anyhow.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/x_tables.h |    7 +++
 net/ipv4/netfilter/arp_tables.c    |    6 ++-
 net/ipv4/netfilter/ip_tables.c     |   65 ++++++++++++++++--------------
 net/ipv6/netfilter/ip6_tables.c    |   56 ++++++++++---------------
 net/netfilter/x_tables.c           |   79 ++++++++++++++++++++++++++++++++++++
 5 files changed, 147 insertions(+), 66 deletions(-)
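
The idea of an explicit jump stack with a saved stack pointer can be
sketched in userspace. This is a toy model, not the traverser itself:
the toy_* names are hypothetical, rules are plain integers instead of
entry pointers, and the stack has a fixed size, whereas the patch
sizes it per table from the number of user chains and the multiplier.

```c
#define TOY_STACKSIZE 4	/* hypothetical; the patch sizes this per table */

struct toy_stack {
	int slots[TOY_STACKSIZE];	/* rules we jumped from */
	unsigned int sp;		/* next free slot */
};

/* Remember the rule a jump was made from; fail when the stack is full. */
static int toy_push(struct toy_stack *s, int rule)
{
	if (s->sp >= TOY_STACKSIZE)
		return -1;
	s->slots[s->sp++] = rule;
	return 0;
}

/* RETURN: resume after the saved rule, or at the underflow rule. */
static int toy_pop(struct toy_stack *s, int underflow)
{
	if (s->sp == 0)
		return underflow;
	return s->slots[--s->sp] + 1;
}

/*
 * A nested traversal: it pushes a frame, leaves early as if a target
 * returned an absolute verdict, and restores the stack pointer it
 * found on entry, so the outer traversal's frames stay intact.
 */
static void toy_nested_entry(struct toy_stack *s)
{
	unsigned int origptr = s->sp;

	(void)toy_push(s, 7);
	s->sp = origptr;
}
```

If the outer traversal pushes rule 3 and a target then triggers
toy_nested_entry, the outer traversal still pops back to rule 4
afterwards. With return addresses stored in the ruleset itself, the
nested run would have overwritten them.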

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 1a65d45..62cc5ca 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -401,6 +401,13 @@ struct xt_table_info {
 	unsigned int hook_entry[NF_INET_NUMHOOKS];
 	unsigned int underflow[NF_INET_NUMHOOKS];
 
+	/*
+	 * Number of user chains. Since tables cannot have loops, at most
+	 * @stacksize jumps (number of user chains) can possibly be made.
+	 */
+	unsigned int stacksize;
+	unsigned int *stackptr;
+	void ***jumpstack;
 	/* ipt_entry tables: one per CPU */
 	/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
 	void *entries[1];
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e8e363d..07a6990 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -649,6 +649,9 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			break;
 		++i;
+		if (strcmp(arpt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 	duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret);
 	if (ret != 0)
@@ -1774,8 +1777,7 @@ struct xt_table *arpt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 18c5b15..70900ec 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -321,8 +321,6 @@ ipt_do_table(struct sk_buff *skb,
 	     const struct net_device *out,
 	     struct xt_table *table)
 {
-#define tb_comefrom ((struct ipt_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	const struct iphdr *ip;
 	bool hotdrop = false;
@@ -330,7 +328,8 @@ ipt_do_table(struct sk_buff *skb,
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ipt_entry *e, *back;
+	struct ipt_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -356,19 +355,23 @@ ipt_do_table(struct sk_buff *skb,
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
+	pr_devel("Entering %s(hook %u); sp at %u (UF %p)\n",
+		 table->name, hook, origptr,
+		 get_entry(table_base, private->underflow[hook]));
 
 	do {
 		const struct ipt_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip_packet_match(ip, indev, outdev,
 		    &e->ip, mtpar.fragoff)) {
  no_match:
@@ -403,17 +406,28 @@ ipt_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0) {
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+					pr_devel("Underflow (this is normal) "
+						 "to %p\n", e);
+				} else {
+					e = jumpstack[--*stackptr];
+					pr_devel("Pulled %p out from pos %u\n",
+						 e, *stackptr);
+					e = ipt_next_entry(e);
+				}
 				continue;
 			}
 			if (table_base + v != ipt_next_entry(e) &&
 			    !(e->ip.flags & IPT_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ipt_entry *next = ipt_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
+				pr_devel("Pushed %p into pos %u\n",
+					 e, *stackptr - 1);
 			}
 
 			e = get_entry(table_base, v);
@@ -426,18 +440,7 @@ ipt_do_table(struct sk_buff *skb,
 		tgpar.targinfo = t->data;
 
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IPT_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		/* Target might have changed stuff. */
 		ip = ip_hdr(skb);
 		if (verdict == IPT_CONTINUE)
@@ -447,7 +450,9 @@ ipt_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 	xt_info_rdunlock_bh();
-
+	pr_devel("Exiting %s; resetting sp from %u to %u\n",
+		 __func__, *stackptr, origptr);
+	*stackptr = origptr;
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
 #else
@@ -455,8 +460,6 @@ ipt_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -838,6 +841,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ipt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2086,8 +2092,7 @@ struct xt_table *ipt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index f2b815e..2a2770b 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -351,15 +351,14 @@ ip6t_do_table(struct sk_buff *skb,
 	      const struct net_device *out,
 	      struct xt_table *table)
 {
-#define tb_comefrom ((struct ip6t_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	bool hotdrop = false;
 	/* Initializing verdict to NF_DROP keeps gcc happy. */
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ip6t_entry *e, *back;
+	struct ip6t_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -383,19 +382,19 @@ ip6t_do_table(struct sk_buff *skb,
 
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
-
 	do {
 		const struct ip6t_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
 		    &mtpar.thoff, &mtpar.fragoff, &hotdrop)) {
  no_match:
@@ -432,17 +431,20 @@ ip6t_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0)
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+				else
+					e = ip6t_next_entry(jumpstack[--*stackptr]);
 				continue;
 			}
 			if (table_base + v != ip6t_next_entry(e) &&
 			    !(e->ipv6.flags & IP6T_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ip6t_entry *next = ip6t_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
 			}
 
 			e = get_entry(table_base, v);
@@ -454,19 +456,7 @@ ip6t_do_table(struct sk_buff *skb,
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IP6T_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		if (verdict == IP6T_CONTINUE)
 			e = ip6t_next_entry(e);
 		else
@@ -474,10 +464,8 @@ ip6t_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 
-#ifdef CONFIG_NETFILTER_DEBUG
-	tb_comefrom = NETFILTER_LINK_POISON;
-#endif
 	xt_info_rdunlock_bh();
+	*stackptr = origptr;
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
@@ -486,8 +474,6 @@ ip6t_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -869,6 +855,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ip6t_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2120,8 +2109,7 @@ struct xt_table *ip6t_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8e23d8f..2010b56 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -62,6 +62,11 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
 	[NFPROTO_IPV6]   = "ip6",
 };
 
+/* Allow this many total (re)entries. */
+static unsigned int xt_jumpstack_multiplier = 2;
+module_param_named(jumpstack_multiplier, xt_jumpstack_multiplier,
+	uint, S_IRUGO | S_IWUSR);
+
 /* Registration hooks for targets. */
 int
 xt_register_target(struct xt_target *target)
@@ -680,6 +685,26 @@ void xt_free_table_info(struct xt_table_info *info)
 		else
 			vfree(info->entries[cpu]);
 	}
+
+	if (info->jumpstack != NULL) {
+		if (sizeof(void *) * info->stacksize > PAGE_SIZE) {
+			for_each_possible_cpu(cpu)
+				vfree(info->jumpstack[cpu]);
+		} else {
+			for_each_possible_cpu(cpu)
+				kfree(info->jumpstack[cpu]);
+		}
+	}
+
+	if (sizeof(void **) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->jumpstack);
+	else
+		kfree(info->jumpstack);
+	if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->stackptr);
+	else
+		kfree(info->stackptr);
+
 	kfree(info);
 }
 EXPORT_SYMBOL(xt_free_table_info);
@@ -724,6 +749,49 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock);
 DEFINE_PER_CPU(struct xt_info_lock, xt_info_locks);
 EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks);
 
+static int xt_jumpstack_alloc(struct xt_table_info *i)
+{
+	unsigned int size;
+	int cpu;
+
+	size = sizeof(unsigned int) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->stackptr = vmalloc(size);
+	else
+		i->stackptr = kmalloc(size, GFP_KERNEL);
+	if (i->stackptr == NULL)
+		return -ENOMEM;
+	memset(i->stackptr, 0, size);
+
+	size = sizeof(void **) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->jumpstack = vmalloc(size);
+	else
+		i->jumpstack = kmalloc(size, GFP_KERNEL);
+	if (i->jumpstack == NULL)
+		return -ENOMEM;
+	memset(i->jumpstack, 0, size);
+
+	i->stacksize *= xt_jumpstack_multiplier;
+	size = sizeof(void *) * i->stacksize;
+	for_each_possible_cpu(cpu) {
+		if (size > PAGE_SIZE)
+			i->jumpstack[cpu] = vmalloc_node(size,
+				cpu_to_node(cpu));
+		else
+			i->jumpstack[cpu] = kmalloc_node(size,
+				GFP_KERNEL, cpu_to_node(cpu));
+		if (i->jumpstack[cpu] == NULL)
+			/*
+			 * Freeing will be done later on by the callers. The
+			 * chain is: xt_replace_table -> __do_replace ->
+			 * do_replace -> xt_free_table_info.
+			 */
+			return -ENOMEM;
+	}
+
+	return 0;
+}
 
 struct xt_table_info *
 xt_replace_table(struct xt_table *table,
@@ -732,6 +800,7 @@ xt_replace_table(struct xt_table *table,
 	      int *error)
 {
 	struct xt_table_info *private;
+	int ret;
 
 	/* Do the substitution. */
 	local_bh_disable();
@@ -746,6 +815,12 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0) {
+		*error = ret;
+		return NULL;
+	}
+
 	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
 
@@ -770,6 +845,10 @@ struct xt_table *xt_register_table(struct net *net,
 	struct xt_table_info *private;
 	struct xt_table *t, *table;
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
 	/* Don't add one object to multiple lists. */
 	table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
 	if (!table) {
-- 
1.7.0.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-03-31 10:38 nf-next: TEE and nesting Jan Engelhardt
                   ` (3 preceding siblings ...)
  2010-03-31 10:38 ` [PATCH 4/5] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
@ 2010-03-31 10:38 ` Jan Engelhardt
  2010-04-01 10:37   ` Patrick McHardy
  4 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-03-31 10:38 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev

Since Xtables is now reentrant/nestable, the cloned packet can also go
through Xtables and be subject to rules itself.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv4/ip_output.c   |    1 -
 net/ipv6/ip6_output.c  |    1 -
 net/netfilter/xt_TEE.c |   18 ++----------------
 3 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0abfdde..f09135e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -309,7 +309,6 @@ int ip_output(struct sk_buff *skb)
 			    ip_finish_output,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip_output);
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 307d8bf..7e10f62 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -176,7 +176,6 @@ int ip6_output(struct sk_buff *skb)
 			    ip6_finish_output,
 			    !(IP6CB(skb)->flags & IPSKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip6_output);
 
 /*
  *	xmit an sk_buff (used by TCP)
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 96dd746..70078f1 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -130,21 +130,8 @@ tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
 	skb->nfctinfo = IP_CT_NEW;
 	nf_conntrack_get(skb->nfct);
 #endif
-	/*
-	 * Xtables is not reentrant currently, so a choice has to be made:
-	 * 1. return absolute verdict for the original and let the cloned
-	 *    packet travel through the chains
-	 * 2. let the original continue travelling and not pass the clone
-	 *    to Xtables.
-	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
-	 * Because iph->check is already correct and we don't pass it to
-	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
-	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
-	 * invoke POSTROUTING on the cloned packet.
-	 */
-	IPCB(skb)->flags |= IPSKB_REROUTED;
 	if (tee_tg_route4(skb, info))
-		ip_output(skb);
+		ip_local_out(skb);
 
 	return XT_CONTINUE;
 }
@@ -199,9 +186,8 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 		struct ipv6hdr *iph = ipv6_hdr(skb);
 		--iph->hop_limit;
 	}
-	IP6CB(skb)->flags |= IPSKB_REROUTED;
 	if (tee_tg_route6(skb, info))
-		ip6_output(skb);
+		ip6_local_out(skb);
 
 	return XT_CONTINUE;
 }
-- 
1.7.0.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation
  2010-03-31 10:38 ` [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation Jan Engelhardt
@ 2010-04-01  8:34   ` David Miller
  0 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2010-04-01  8:34 UTC (permalink / raw)
  To: jengelh; +Cc: kaber, netfilter-devel, netdev

From: Jan Engelhardt <jengelh@medozas.de>
Date: Wed, 31 Mar 2010 12:38:50 +0200

> Similar to how IPv4's ip_output.c works, have ip6_output also check
> the IPSKB_REROUTED flag. It will be set from xt_TEE for cloned packets
> since Xtables can currently only deal with a single packet in flight
> at a time.
> 
> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

I defer to ipv6 experts as to whether this will cause trouble
or not.

If they are fine with it, feel free to add my:

Acked-by: David S. Miller <davem@davemloft.net>

and this can go in via the nf tree along with the other changes in
this set.

Thanks.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/5] netfilter: ipv6: move POSTROUTING invocation before fragmentation
  2010-03-31 10:38 ` [PATCH 1/5] netfilter: ipv6: move POSTROUTING invocation before fragmentation Jan Engelhardt
@ 2010-04-01 10:23   ` Patrick McHardy
  0 siblings, 0 replies; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 10:23 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> Patrick McHardy notes: "We used to invoke IPv4 POST_ROUTING after
> fragmentation as well just to defragment the packets in conntrack
> immediately afterwards, but that got changed during the
> netfilter-ipsec integration. Ideally IPv6 would behave like IPv4."
> 
> This patch makes it so. Sending an oversized frame (e.g. `ping6
> -s64000 -c1 ::1`) will now show up in POSTROUTING as a single skb
> rather than multiple ones.

Looks good to me. I'll wait until next week in case anyone
else has comments on this patch.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE
  2010-03-31 10:38 ` [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
@ 2010-04-01 10:34   ` Patrick McHardy
  2010-04-01 11:39     ` Jan Engelhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 10:34 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> +static bool
> +tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
> +{
> +	const struct iphdr *iph = ip_hdr(skb);
> +	struct rtable *rt;
> +	struct flowi fl;
> +	int err;
> +
> +	memset(&fl, 0, sizeof(fl));
> +	fl.iif  = skb->skb_iif;

I'm not sure you really should set iif here. We usually treat locally
generated packets (tunnels, REJECT etc.) as new packets.

> +	fl.mark = skb->mark;

The same applies to mark.

> +	fl.nl_u.ip4_u.daddr = info->gw.ip;
> +	fl.nl_u.ip4_u.tos   = RT_TOS(iph->tos);
> +	fl.nl_u.ip4_u.scope = RT_SCOPE_UNIVERSE;
> +
> +	/* Trying to route the packet using the standard routing table. */
> +	err = ip_route_output_key(dev_net(skb->dev), &rt, &fl);
> +	if (err != 0)
> +		return false;
> +
> +	dst_release(skb_dst(skb));
> +	skb_dst_set(skb, &rt->u.dst);
> +	skb->dev      = rt->u.dst.dev;
> +	skb->protocol = htons(ETH_P_IP);
> +	IPCB(skb)->flags |= IPSKB_REROUTED;
> +	return true;
> +}
> +
> +/*
> + * To detect and deter routed packet loopback when using the --tee option, we
> + * take a page out of the raw.patch book: on the copied skb, we set up a fake
> + * ->nfct entry, pointing to the local &route_tee_track. We skip routing
> + * packets when we see they already have that ->nfct.

So without conntrack, people may create loops? If that's the case,
I'd suggest to simply forbid TEE'ing packets to loopback. That
doesn't seem to be very useful anyways.

> + */
> +static unsigned int
> +tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
> +{
> +	const struct xt_tee_tginfo *info = par->targinfo;
> +	struct iphdr *iph;
> +
> +#ifdef WITH_CONNTRACK
> +	if (skb->nfct == &tee_track.ct_general)
> +		/*
> +		 * Loopback - a packet we already routed, is to be
> +		 * routed another time. Avoid that, now.
> +		 */
> +		return NF_DROP;
> +#endif
> +	/*
> +	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
> +	 * the original skb, which should continue on its way as if nothing has
> +	 * happened. The copy should be independently delivered to the TEE
> +	 * --gateway.
> +	 */
> +	skb = skb_copy(skb, GFP_ATOMIC);
> +	if (skb == NULL)
> +		return XT_CONTINUE;
> +	/*
> +	 * If we are in PREROUTING/INPUT, the checksum must be recalculated
> +	 * since the length could have changed as a result of defragmentation.
> +	 *
> +	 * We also decrease the TTL to mitigate potential TEE loops
> +	 * between two hosts.
> +	 *
> +	 * Set %IP_DF so that the original source is notified of a potentially
> +	 * decreased MTU on the clone route. IPv6 does this too.
> +	 */
> +	iph = ip_hdr(skb);
> +	iph->frag_off |= htons(IP_DF);
> +	if (par->hooknum == NF_INET_PRE_ROUTING ||
> +	    par->hooknum == NF_INET_LOCAL_IN)
> +		--iph->ttl;
> +	ip_send_check(iph);

Shouldn't this only be done in PRE_ROUTING/INPUT as stated above?

> +
> +#ifdef WITH_CONNTRACK
> +	nf_conntrack_put(skb->nfct);
> +	skb->nfct     = &tee_track.ct_general;
> +	skb->nfctinfo = IP_CT_NEW;
> +	nf_conntrack_get(skb->nfct);
> +#endif
> +	/*
> +	 * Xtables is not reentrant currently, so a choice has to be made:
> +	 * 1. return absolute verdict for the original and let the cloned
> +	 *    packet travel through the chains
> +	 * 2. let the original continue travelling and not pass the clone
> +	 *    to Xtables.
> +	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
> +	 * Because iph->check is already correct and we don't pass it to
> +	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
> +	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
> +	 * invoke POSTROUTING on the cloned packet.
> +	 */
> +	IPCB(skb)->flags |= IPSKB_REROUTED;
> +	if (tee_tg_route4(skb, info))
> +		ip_output(skb);
> +
> +	return XT_CONTINUE;
> +}
> +

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-03-31 10:38 ` [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
@ 2010-04-01 10:37   ` Patrick McHardy
  2010-04-01 11:03     ` Jan Engelhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 10:37 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> Since Xtables is now reentrant/nestable, the cloned packet can also go
> through Xtables and be subject to rules itself.

That sounds dangerous if conntrack isn't used to prevent loops.
Is that really useful? For filtering, you can simply apply the
rules before deciding to TEE the packet.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 10:37   ` Patrick McHardy
@ 2010-04-01 11:03     ` Jan Engelhardt
  2010-04-01 11:09       ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-01 11:03 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

On Thursday 2010-04-01 12:37, Patrick McHardy wrote:

>Jan Engelhardt wrote:
>> Since Xtables is now reentrant/nestable, the cloned packet can also go
>> through Xtables and be subject to rules itself.
>
>That sounds dangerous if conntrack isn't used to prevent loops.

Conntrack loops are prevented by using a dummy conntrack, just as 
NOTRACK does.

>Is that really useful? For filtering, you can simply apply the
>rules before deciding to TEE the packet.

I can think of a handful of applications:
 - CLASSIFY

 - When the cloned packet gets XFRMed or tunneled, its status switches
   from "special" to "plain". Doing policy routing on it does not seem
   so far-fetched.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 11:03     ` Jan Engelhardt
@ 2010-04-01 11:09       ` Patrick McHardy
  2010-04-01 13:15         ` Jan Engelhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 11:09 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> On Thursday 2010-04-01 12:37, Patrick McHardy wrote:
> 
>> Jan Engelhardt wrote:
>>> Since Xtables is now reentrant/nestable, the cloned packet can also go
>>> through Xtables and be subject to rules itself.
>> That sounds dangerous if conntrack isn't used to prevent loops.
> 
> Conntrack loops are prevented by using a dummy conntrack, just as 
> NOTRACK does.

My question was about the case without conntrack.

>> Is that really useful? For filtering, you can simply apply the
>> rules before deciding to TEE the packet.
> 
> I can think of a handful of applications:
>  - CLASSIFY

Good point, you should probably reset a couple of skb members
after the skb_copy().

>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>    from "special" to "plain". Doing policy routing on them does not seem 
>    so far-fetched.

Fair enough, provided we can also handle loops when conntrack
isn't used.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE
  2010-04-01 10:34   ` Patrick McHardy
@ 2010-04-01 11:39     ` Jan Engelhardt
  2010-04-01 11:54       ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-01 11:39 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev


On Thursday 2010-04-01 12:34, Patrick McHardy wrote:
>> +static bool
>> +tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
>> +{
>> +	const struct iphdr *iph = ip_hdr(skb);
>> +	struct rtable *rt;
>> +	struct flowi fl;
>> +	int err;
>> +
>> +	memset(&fl, 0, sizeof(fl));
>> +	fl.iif  = skb->skb_iif;
>
>I'm not sure you really should set iif here. We usually (tunnels, REJECT
>etc) packets generated locally as new packets.
>> +	fl.mark = skb->mark;
>
>The same applies to mark.

If you use TEE in PREROUTING or INPUT, teeing acts more like FORWARD than
OUTPUT, though. All TEE does is look up a route to a new fl.dst, but it keeps
the original src address in fl.src, so if somebody has some source-based policy
routing, it could suddenly behave differently. What do you think?

>> +/*
>> + * To detect and deter routed packet loopback when using the --tee option, we
>> + * take a page out of the raw.patch book: on the copied skb, we set up a fake
>> + * ->nfct entry, pointing to the local &route_tee_track. We skip routing
>> + * packets when we see they already have that ->nfct.
>
>So without conntrack, people may create loops? If that's the case,
>I'd suggest to simply forbid TEE'ing packets to loopback. That
>doesn't seem to be very useful anyways.

>> +#ifdef WITH_CONNTRACK
>> +	if (skb->nfct == &tee_track.ct_general)
>> +		/*
>> +		 * Loopback - a packet we already routed, is to be
>> +		 * routed another time. Avoid that, now.
>> +		 */
	printk("loopback - dropped\n");
>> +		return NF_DROP;
>> +#endif

We are looking at a historic piece of code and comments, which
trace back to when xt_NOTRACK was still in POM.

{
	/* Previously seen (loopback)? Ignore. */
	if ((*pskb)->nfct != NULL)
		return IPT_CONTINUE;

	/* Attach fake conntrack entry.
	   If there is a real ct entry corresponding to this packet,
	   it'll hang around till timing out. We don't deal with it
	   for performance reasons. JK */
	(*pskb)->nfct = &ip_conntrack_untracked.infos[IP_CT_NEW];
	nf_conntrack_get((*pskb)->nfct);

	return IPT_CONTINUE;
}

Let's look at the condition "skb->nfct == &tee_track.ct_general" in detail. An
skb can only already have tee_track when it has been teed.

The teed packet however never traversed Xtables at all. Of course that changes
once the nesting patch is applied. But was someone really thinking of this, 6
years ago?

That actually made me wonder and dig in history, and it turns out that
ipt_ROUTE allowed the packet to be fed back into netif_rx (commit
bee4e80167e3d024bdb80f400f4ecc8de47cfb03 in pom-ng.git), which would
explain all the loopback stuff. Since modern xt_TEE does not do
that evil thing, the comment is a walnut-hard remainder of past times.

I shall remove it now that it has been spotted.

>> +	/*
>> +	 * If we are in PREROUTING/INPUT, the checksum must be recalculated
>> +	 * since the length could have changed as a result of defragmentation.
>> +	 *
>> +	 * We also decrease the TTL to mitigate potential TEE loops
>> +	 * between two hosts.
>> +	 *
>> +	 * Set %IP_DF so that the original source is notified of a potentially
>> +	 * decreased MTU on the clone route. IPv6 does this too.
>> +	 */
>> +	iph = ip_hdr(skb);
>> +	iph->frag_off |= htons(IP_DF);
>> +	if (par->hooknum == NF_INET_PRE_ROUTING ||
>> +	    par->hooknum == NF_INET_LOCAL_IN)
>> +		--iph->ttl;
>> +	ip_send_check(iph);
>
>Shouldn't this only be done in PRE_ROUTING/INPUT as stated above?

The csum needs to be recomputed due to the addition of the DF flag.
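
To make that concrete, here is a minimal userspace sketch. It is not the
kernel's ip_send_check(); the helper names ipv4_hdr_csum() and ipv4_set_df()
are made up for illustration. It only shows that flipping the DF bit changes
the 16-bit word at header offset 6, so the stored header checksum has to be
recomputed no matter which hook the packet was seen in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper: RFC 791 header checksum over a raw
 * IPv4 header, skipping the checksum field itself (bytes 10-11).
 * Returns the value as it would appear when read big-endian.
 */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len >= 20 && len % 2 == 0);
	for (i = 0; i < len; i += 2) {
		if (i == 10)
			continue;
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	}
	/* Fold the carries back in (one's complement addition). */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Set DF (0x4000 in the frag_off word) and store a fresh checksum. */
static void ipv4_set_df(uint8_t *hdr, size_t len)
{
	uint16_t csum;

	hdr[6] |= 0x40;
	csum = ipv4_hdr_csum(hdr, len);
	hdr[10] = csum >> 8;
	hdr[11] = csum & 0xff;
}
```

For a 20-byte header whose other fields stay the same, setting DF adds 0x4000
to the one's complement sum, so a header that checksummed to 0xf861 before
checksums to 0xb861 afterwards.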
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE
  2010-04-01 11:39     ` Jan Engelhardt
@ 2010-04-01 11:54       ` Patrick McHardy
  0 siblings, 0 replies; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 11:54 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> On Thursday 2010-04-01 12:34, Patrick McHardy wrote:
>>> +static bool
>>> +tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
>>> +{
>>> +	const struct iphdr *iph = ip_hdr(skb);
>>> +	struct rtable *rt;
>>> +	struct flowi fl;
>>> +	int err;
>>> +
>>> +	memset(&fl, 0, sizeof(fl));
>>> +	fl.iif  = skb->skb_iif;
>> I'm not sure you really should set iif here. We usually (tunnels, REJECT
>> etc) packets generated locally as new packets.
>>> +	fl.mark = skb->mark;
>> The same applies to mark.
> 
> If you use TEE in PREROUTING or INPUT, teeing acts more like FORWARD than
> OUTPUT, though. All TEE does is lookup a route to a new fl.dst, but it keeps
> the original src address in fl.src, so if somebody has some source-based policy
> routing, it could suddenly behave different. What do you think?

That might make it unnecessarily complicated to use src-based routing
when using TEE. I guess you'd usually have a host for logging or IDS
somewhere on a private network and TEE packets there. So specifying
oif and gateway seems most useful to me.

>>> +/*
>>> + * To detect and deter routed packet loopback when using the --tee option, we
>>> + * take a page out of the raw.patch book: on the copied skb, we set up a fake
>>> + * ->nfct entry, pointing to the local &route_tee_track. We skip routing
>>> + * packets when we see they already have that ->nfct.
>> So without conntrack, people may create loops? If that's the case,
>> I'd suggest to simply forbid TEE'ing packets to loopback. That
>> doesn't seem to be very useful anyways.
> 
>>> +#ifdef WITH_CONNTRACK
>>> +	if (skb->nfct == &tee_track.ct_general)
>>> +		/*
>>> +		 * Loopback - a packet we already routed, is to be
>>> +		 * routed another time. Avoid that, now.
>>> +		 */
> 	printk("loopback - dropped\n");
>>> +		return NF_DROP;
>>> +#endif
> 
> We are looking at a historic piece of code - and comments, which
> traces back to when xt_NOTRACK was still in POM.
> 
> {
> 	/* Previously seen (loopback)? Ignore. */
> 	if ((*pskb)->nfct != NULL)
> 		return IPT_CONTINUE;
> 
> 	/* Attach fake conntrack entry.
> 	   If there is a real ct entry corresponding to this packet,
> 	   it'll hang around till timing out. We don't deal with it
> 	   for performance reasons. JK */
> 	(*pskb)->nfct = &ip_conntrack_untracked.infos[IP_CT_NEW];
> 	nf_conntrack_get((*pskb)->nfct);
> 
> 	return IPT_CONTINUE;
> }
> 
> Let's look at the condition "skb->nfct == &tee_track.ct_general" in detail. An
> skb can only already have tee_track when it has been teed.
> 
> The teed packet however never traversed Xtables at all. Of course that changes
> once the nesting patch is applied. But was someone really thinking of this, 6
> years ago?
> 
> That actually made me wonder and dig in history, and it turns out that
> ipt_ROUTE allowed the packet to be fed back into netif_rx (commit
> bee4e80167e3d024bdb80f400f4ecc8de47cfb03 in pom-ng.git), which would
> explain all the loopback stuff. Since modern xt_TEE does not do
> that evil thing, the comment is a walnut-hard remainder of past times.
> 
> I shall remove it now that it has been spotted.

Yeah, but currently it does allow packets to be looped back. These
packets will also go through the netfilter hooks again.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 11:09       ` Patrick McHardy
@ 2010-04-01 13:15         ` Jan Engelhardt
  2010-04-01 13:22           ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-01 13:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

On Thursday 2010-04-01 13:09, Patrick McHardy wrote:

>> Conntrack loops are prevented by using a dummy conntrack, just as 
>> NOTRACK does.
>[...]
>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>    from "special" to "plain". Doing policy routing on them does not seem 
>>    so far-fetched.
>
>My question was about the case without conntrack.

Hm. Do you have any suggestion for countering the case where a user
does -I OUTPUT -j TEE without conntrack?

Perhaps making nesting a feature that requires conntrack, such that the 
non-CT case can't loop?

>> I can think of a handful of applications:
>>  - CLASSIFY
>
>Good point, you should probably reset a couple of skb members
>after the skb_copy().

I take it you mean

 nf_reset(skb);
 skb->mark = 0;
 skb_init_secmark(skb);

Or should we be using skb_alloc and copying the data portion over, like 
ipt_REJECT does since v2.6.24-2931-g9ba99b0?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 13:15         ` Jan Engelhardt
@ 2010-04-01 13:22           ` Patrick McHardy
  2010-04-01 13:44             ` Jan Engelhardt
  2010-04-06 16:14             ` Jan Engelhardt
  0 siblings, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 13:22 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> On Thursday 2010-04-01 13:09, Patrick McHardy wrote:
> 
>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>> NOTRACK does.
>> [...]
>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>    so far-fetched.
>> My question was about the case without conntrack.
> 
> Hm. Do you have any suggestion in countering a case whereby a user
> does -I OUTPUT -j TEE without conntrack?
> 
> Perhaps making nesting a feature that requires conntrack, such that the 
> non-CT case can't loop?

If we drop the reentrancy thing, what should work is to prevent
using loopback as output device and using something similar to
the recursion counters tunnel devices used to have.

>>> I can think of a handful of applications:
>>>  - CLASSIFY
>> Good point, you should probably reset a couple of skb members
>> after the skb_copy().
> 
> I take it you mean
> 
>  nf_reset(skb)
>  skb->mark = 0;
>  skb_init_secmark(nskb);

Yes, basically. Although I believe the selinux people would be
happier if you kept the original secmark for the copied packets :)

> Or should we be using skb_alloc and copying the data portion over, like 
> ipt_REJECT does since v2.6.24-2931-g9ba99b0?

I guess pskb_copy() would be most optimal since we can modify
the header, but the non-linear area could be shared
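
As a rough userspace analogy of that trade-off (not the kernel's pskb_copy();
struct pkt, struct frag_area, pkt_copy_head() and pkt_free() are all
hypothetical names): the header area is duplicated so the copy can be
modified freely, while the paged data is only reference-counted and shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct frag_area {		/* stands in for the non-linear (paged) data */
	unsigned int refcnt;
	unsigned char data[64];
};

struct pkt {
	unsigned char *head;	/* linear header area, private per packet */
	size_t head_len;
	struct frag_area *frags;	/* may be shared with other packets */
};

/* Copy the header area, share the non-linear area by taking a reference. */
static struct pkt *pkt_copy_head(const struct pkt *orig)
{
	struct pkt *n;

	assert(orig != NULL && orig->head_len > 0);
	n = malloc(sizeof(*n));
	if (n == NULL)
		return NULL;
	n->head = malloc(orig->head_len);
	if (n->head == NULL) {
		free(n);
		return NULL;
	}
	memcpy(n->head, orig->head, orig->head_len);
	n->head_len = orig->head_len;
	n->frags = orig->frags;
	if (n->frags != NULL)
		++n->frags->refcnt;
	return n;
}

/* Drop the private header and one reference on the shared area. */
static void pkt_free(struct pkt *p)
{
	if (p->frags != NULL && --p->frags->refcnt == 0)
		free(p->frags);
	free(p->head);
	free(p);
}
```

With this layout, a TTL decrement or DF change on the copy touches only its
own header bytes; the original's header stays as it was.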

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 13:22           ` Patrick McHardy
@ 2010-04-01 13:44             ` Jan Engelhardt
  2010-04-01 13:48               ` Patrick McHardy
  2010-04-06 16:14             ` Jan Engelhardt
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-01 13:44 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev


On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>> NOTRACK does.
>>> [...]
>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>    so far-fetched.
>>> My question was about the case without conntrack.
>> 
>> Hm. Do you have any suggestion in countering a case whereby a user
>> does -I OUTPUT -j TEE without conntrack?
>> 
>> Perhaps making nesting a feature that requires conntrack, such that the 
>> non-CT case can't loop?
>
>If we drop the reentrancy thing, what should work is to prevent
>using loopback as output device and using something similar to
>the recursion counters tunnel devices used to have.

Nah. I'm going to pick a bit from struct sk_buff to indicate the
packet was teed so as to avoid that loop.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 13:44             ` Jan Engelhardt
@ 2010-04-01 13:48               ` Patrick McHardy
  2010-04-01 13:59                 ` Jan Engelhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 13:48 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>> NOTRACK does.
>>>> [...]
>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>    so far-fetched.
>>>> My question was about the case without conntrack.
>>> Hm. Do you have any suggestion in countering a case whereby a user
>>> does -I OUTPUT -j TEE without conntrack?
>>>
>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>> non-CT case can't loop?
>> If we drop the reentrancy thing, what should work is to prevent
>> using loopback as output device and using something similar to
>> the recursion counters tunnel devices used to have.
> 
> Nah. I'm going to pick a bit from struct skbuff to indicate the
> packet was teed so as to avoid that loop.

That's a bad idea, we shouldn't be adding new skb members for something
as peripheral as this module.

What's wrong with adding a reentrancy counter?
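
A reentrancy counter of this kind could look roughly like the userspace
sketch below. Everything in it is hypothetical: tee_depth stands in for a
per-CPU counter, TEE_MAX_NESTING for whatever limit is chosen, and
ruleset_traverse() for a ruleset in which -j TEE matches every packet,
clones included. Without the depth check, the two functions would recurse
forever.

```c
#include <assert.h>
#include <stdbool.h>

#define TEE_MAX_NESTING 1	/* hypothetical limit, not a kernel constant */

static unsigned int tee_depth;	/* stands in for a per-CPU counter */
static unsigned int tee_clones;	/* number of clones actually sent */

static void ruleset_traverse(void);

/* Returns true if a clone was emitted, false if nesting was refused. */
static bool tee_target(void)
{
	if (tee_depth >= TEE_MAX_NESTING)
		return false;	/* already inside TEE on this CPU */

	++tee_depth;
	++tee_clones;
	ruleset_traverse();	/* the clone goes through the rules too */
	assert(tee_depth > 0);
	--tee_depth;
	return true;
}

/* Worst case: "-j TEE" matches every packet, clones included. */
static void ruleset_traverse(void)
{
	(void)tee_target();
}
```

Each original packet then yields exactly one clone, and the counter is back
at zero once the traversal returns. No conntrack and no extra skb member is
involved.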

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 13:48               ` Patrick McHardy
@ 2010-04-01 13:59                 ` Jan Engelhardt
  2010-04-01 14:03                   ` Patrick McHardy
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-01 13:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev


On Thursday 2010-04-01 15:48, Patrick McHardy wrote:
>Jan Engelhardt wrote:
>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>>> NOTRACK does.
>>>>> [...]
>>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>>    so far-fetched.
>>>>> My question was about the case without conntrack.
>>>> Hm. Do you have any suggestion in countering a case whereby a user
>>>> does -I OUTPUT -j TEE without conntrack?
>>>>
>>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>>> non-CT case can't loop?
>>> If we drop the reentrancy thing, what should work is to prevent
>>> using loopback as output device and using something similar to
>>> the recursion counters tunnel devices used to have.
>> 
>> Nah. I'm going to pick a bit from struct skbuff to indicate the
>> packet was teed so as to avoid that loop.
>
>That's a bad idea, we shouldn't be adding new skb members for something
>as peripheral as this module.

I would have done this, which does not add a member:

	IP6CB(skb)->flags |= IPSKB_CLONED;

>What's wrong with adding a reentrancy counter?

Sounds like a plan.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 13:59                 ` Jan Engelhardt
@ 2010-04-01 14:03                   ` Patrick McHardy
  2010-04-02 18:15                     ` Jan Engelhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2010-04-01 14:03 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:48, Patrick McHardy wrote:
>> Jan Engelhardt wrote:
>>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>>>>> Conntrack loops are prevented by using a dummy conntrack, just as 
>>>>>>> NOTRACK does.
>>>>>> [...]
>>>>>>>  - When the cloned packets gets XFRMed or tunneled, its status switches 
>>>>>>>    from "special" to "plain". Doing policy routing on them does not seem 
>>>>>>>    so far-fetched.
>>>>>> My question was about the case without conntrack.
>>>>> Hm. Do you have any suggestion in countering a case whereby a user
>>>>> does -I OUTPUT -j TEE without conntrack?
>>>>>
>>>>> Perhaps making nesting a feature that requires conntrack, such that the 
>>>>> non-CT case can't loop?
>>>> If we drop the reentrancy thing, what should work is to prevent
>>>> using loopback as output device and using something similar to
>>>> the recursion counters tunnel devices used to have.
>>> Nah. I'm going to pick a bit from struct skbuff to indicate the
>>> packet was teed so as to avoid that loop.
>> That's a bad idea, we shouldn't be adding new skb members for something
>> as peripheral as this module.
> 
> I would have done this, which does not add a member:
> 
> 	IP6CB(skb)->flags |= IPSKB_CLONED;

This doesn't work: the CB is not preserved across layers
(which matters, for instance, if you allow loopback destinations).
It's also not preserved for clones.

>> What's wrong with adding a reentrancy counter?
> 
> Sounds like a plan.




* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 14:03                   ` Patrick McHardy
@ 2010-04-02 18:15                     ` Jan Engelhardt
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-02 18:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev


On Thursday 2010-04-01 16:03, Patrick McHardy wrote:
>>>>[detecting teed packets getting teed again by means of
>>>> iptables -A OUTPUT -j TEE]
>>>
>>> What's wrong with adding a reentrancy counter?
>> 
>> Sounds like a plan.

Should we be using a percpu variable, or is a simplistic
array ok too?

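static bool tee_active[NR_CPUS];

target(...)
{
	unsigned int cpu = smp_processor_id();

	/* A teed packet is re-entering the ruleset on this CPU. */
	if (tee_active[cpu])
		return XT_CONTINUE;
	...
	if (tee_tg4_route(...)) {
		tee_active[cpu] = true;
		ip_local_out(skb);
		tee_active[cpu] = false;
	}
	return XT_CONTINUE;
}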
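
For illustration only, here is a user-space sketch of the reentrancy
guard proposed above. It is not kernel code and not what was eventually
merged. It assumes one thread stands in for one CPU, so a C11
_Thread_local flag plays the role of the per-CPU entry; struct packet,
enum verdict, local_out() and clones_sent are hypothetical names made up
for this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types; none of this is kernel API. */
struct packet {
	int id;
};

enum verdict { VERDICT_CONTINUE };

/* One flag per thread models one flag per CPU. */
static _Thread_local bool tee_active;

/* Counts how many clones were actually sent out. */
static int clones_sent;

static enum verdict tee_target(const struct packet *pkt);

/*
 * Models an OUTPUT chain holding a single "-j TEE" rule: every packet
 * sent locally, clones included, hits the target again.
 */
static void local_out(const struct packet *pkt)
{
	tee_target(pkt);
}

static enum verdict tee_target(const struct packet *pkt)
{
	struct packet clone;

	/* Already inside TEE on this thread: do not clone the clone. */
	if (tee_active)
		return VERDICT_CONTINUE;

	clone = *pkt;
	tee_active = true;
	++clones_sent;
	local_out(&clone);	/* re-enters tee_target(), which bails out */
	tee_active = false;
	return VERDICT_CONTINUE;
}
```

Each top-level call to tee_target() sends exactly one clone; the nested
call made on behalf of the clone sees the flag set and returns without
cloning again, which is what breaks the loop.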




* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-01 13:22           ` Patrick McHardy
  2010-04-01 13:44             ` Jan Engelhardt
@ 2010-04-06 16:14             ` Jan Engelhardt
  2010-04-06 16:37               ` Patrick McHardy
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-06 16:14 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev


On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>> Or should we be using skb_alloc and copying the data portion over, like 
>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>
>I guess pskb_copy() would be most optimal since we can modify
>the header, but the non-linear area could be shared

Trying to improve my understanding: when doing skb_pull,
does the skb->head that is relevant for pskb_copy move?


* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-06 16:14             ` Jan Engelhardt
@ 2010-04-06 16:37               ` Patrick McHardy
  2010-04-07 13:26                 ` Jan Engelhardt
  0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2010-04-06 16:37 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, netdev

Jan Engelhardt wrote:
> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>> Or should we be using skb_alloc and copying the data portion over, like 
>>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>> I guess pskb_copy() would be most optimal since we can modify
>> the header, but the non-linear area could be shared
> 
> Trying to improve my understanding: when doing skb_pull,
> does the skb->head that is relevant for pskb_copy move?

skb_pull() only changes skb->data.
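
As a rough illustration of that statement, the toy model below tracks
only the two pointers in question. struct toy_skb, toy_pull() and
toy_headroom() are hypothetical user-space names, not the kernel's
struct sk_buff or its helpers; the sketch only mirrors the documented
behaviour that pulling advances data and shrinks len while head stays
put.

```c
#include <stddef.h>

/* Toy buffer: only the two pointers under discussion are modeled. */
struct toy_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the current layer's payload */
	size_t len;		/* bytes from data to the end of the packet */
};

/*
 * Toy counterpart of skb_pull(): strip n bytes from the front.
 * Returns the new data pointer, or NULL if n exceeds len.
 */
static unsigned char *toy_pull(struct toy_skb *skb, size_t n)
{
	if (n > skb->len)
		return NULL;
	skb->len -= n;
	skb->data += n;
	return skb->data;
}

/* Headroom is whatever lies between head and data. */
static size_t toy_headroom(const struct toy_skb *skb)
{
	return (size_t)(skb->data - skb->head);
}
```

After pulling a 20-byte header from a fresh buffer, head is unchanged
and the pulled bytes are still in the buffer, now counted as headroom.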


* Re: [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-06 16:37               ` Patrick McHardy
@ 2010-04-07 13:26                 ` Jan Engelhardt
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Engelhardt @ 2010-04-07 13:26 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev


On Tuesday 2010-04-06 18:37, Patrick McHardy wrote:
>Jan Engelhardt wrote:
>> On Thursday 2010-04-01 15:22, Patrick McHardy wrote:
>>>> Or should we be using skb_alloc and copying the data portion over, like 
>>>> ipt_REJECT does since v2.6.24-2931-g9ba99b0?
>>> I guess pskb_copy() would be most optimal since we can modify
>>> the header, but the non-linear area could be shared
>> 
>> Trying to improve my understanding: when doing skb_pull,
>> does the skb->head that is relevant for pskb_copy move?
>
>skb_pull() only changes skb->data.

But how does it interact with, say, xt_TCPMSS, which modifies not
only the L3 header but also the L4 header?


end of thread, other threads:[~2010-04-07 13:26 UTC | newest]

Thread overview: 25+ messages
2010-03-31 10:38 nf-next: TEE and nesting Jan Engelhardt
2010-03-31 10:38 ` [PATCH 1/5] netfilter: ipv6: move POSTROUTING invocation before fragmentation Jan Engelhardt
2010-04-01 10:23   ` Patrick McHardy
2010-03-31 10:38 ` [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation Jan Engelhardt
2010-04-01  8:34   ` David Miller
2010-03-31 10:38 ` [PATCH 3/5] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
2010-04-01 10:34   ` Patrick McHardy
2010-04-01 11:39     ` Jan Engelhardt
2010-04-01 11:54       ` Patrick McHardy
2010-03-31 10:38 ` [PATCH 4/5] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
2010-03-31 10:38 ` [PATCH 5/5] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
2010-04-01 10:37   ` Patrick McHardy
2010-04-01 11:03     ` Jan Engelhardt
2010-04-01 11:09       ` Patrick McHardy
2010-04-01 13:15         ` Jan Engelhardt
2010-04-01 13:22           ` Patrick McHardy
2010-04-01 13:44             ` Jan Engelhardt
2010-04-01 13:48               ` Patrick McHardy
2010-04-01 13:59                 ` Jan Engelhardt
2010-04-01 14:03                   ` Patrick McHardy
2010-04-02 18:15                     ` Jan Engelhardt
2010-04-06 16:14             ` Jan Engelhardt
2010-04-06 16:37               ` Patrick McHardy
2010-04-07 13:26                 ` Jan Engelhardt
  -- strict thread matches above, loose matches on Subject: below --
2010-03-31 10:31 nf-next: TEE and nesting Jan Engelhardt
2010-03-31 10:31 ` [PATCH 2/5] net: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation Jan Engelhardt
