nf-next: TEE only

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* nf-next: TEE only
@ 2010-04-13 23:21 Jan Engelhardt
  2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel


Hi,


in this round:
- use IP6SKB_REROUTED in v6 code
- pick_net function: use skb->dev or skb->dst->dev when available
  (or completely fall back to init_net in case there's something
  going on)

----

The following changes since commit 9c6eb28aca52d562f3ffbaebaa56385df9972a43:
  Jan Engelhardt (1):
        netfilter: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation

are available in the git repository at:

  git://dev.medozas.de/linux master

Jan Engelhardt (4):
      netfilter: xtables: inclusion of xt_TEE
      netfilter: xtables2: make ip_tables reentrant
      netfilter: xt_TEE: have cloned packet travel through Xtables too
      netfilter: xtables: remove old comments about reentrancy

 include/linux/netfilter/Kbuild     |    1 +
 include/linux/netfilter/x_tables.h |    7 +
 include/linux/netfilter/xt_TEE.h   |    8 ++
 net/ipv4/netfilter/arp_tables.c    |    6 +-
 net/ipv4/netfilter/ip_tables.c     |   67 ++++++-----
 net/ipv4/netfilter/ipt_REJECT.c    |    3 -
 net/ipv6/netfilter/ip6_tables.c    |   58 ++++------
 net/ipv6/netfilter/ip6t_REJECT.c   |    3 -
 net/netfilter/Kconfig              |    7 +
 net/netfilter/Makefile             |    1 +
 net/netfilter/x_tables.c           |   77 ++++++++++++
 net/netfilter/xt_TEE.c             |  228 ++++++++++++++++++++++++++++++++++++
 12 files changed, 390 insertions(+), 76 deletions(-)
 create mode 100644 include/linux/netfilter/xt_TEE.h
 create mode 100644 net/netfilter/xt_TEE.c

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
@ 2010-04-13 23:21 ` Jan Engelhardt
  2010-04-14  5:52   ` Eric Dumazet
  2010-04-13 23:21 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

xt_TEE can be used to clone and reroute a packet. This can for
example be used to copy traffic at a router for logging purposes
to another dedicated machine.

References: http://www.gossamer-threads.com/lists/iptables/devel/68781
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/Kbuild   |    1 +
 include/linux/netfilter/xt_TEE.h |    8 ++
 net/ipv4/ip_output.c             |    1 +
 net/ipv6/ip6_output.c            |    1 +
 net/netfilter/Kconfig            |    7 +
 net/netfilter/Makefile           |    1 +
 net/netfilter/xt_TEE.c           |  232 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_TEE.h
 create mode 100644 net/netfilter/xt_TEE.c

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index a5a63e4..48767cd 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -16,6 +16,7 @@ header-y += xt_RATEEST.h
 header-y += xt_SECMARK.h
 header-y += xt_TCPMSS.h
 header-y += xt_TCPOPTSTRIP.h
+header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
diff --git a/include/linux/netfilter/xt_TEE.h b/include/linux/netfilter/xt_TEE.h
new file mode 100644
index 0000000..83fa768
--- /dev/null
+++ b/include/linux/netfilter/xt_TEE.h
@@ -0,0 +1,8 @@
+#ifndef _XT_TEE_TARGET_H
+#define _XT_TEE_TARGET_H
+
+struct xt_tee_tginfo {
+	union nf_inet_addr gw;
+};
+
+#endif /* _XT_TEE_TARGET_H */
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f09135e..0abfdde 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -309,6 +309,7 @@ int ip_output(struct sk_buff *skb)
 			    ip_finish_output,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
+EXPORT_SYMBOL_GPL(ip_output);
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c10a38a..d09be7f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -176,6 +176,7 @@ int ip6_output(struct sk_buff *skb)
 			    ip6_finish_output,
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
+EXPORT_SYMBOL_GPL(ip6_output);
 
 /*
  *	xmit an sk_buff (used by TCP)
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8055786..673a6c8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -502,6 +502,13 @@ config NETFILTER_XT_TARGET_RATEEST
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_TARGET_TEE
+	tristate '"TEE" - packet cloning to alternate destiantion'
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds a "TEE" target with which a packet can be cloned and
+	this clone be rerouted to another nexthop.
+
 config NETFILTER_XT_TARGET_TPROXY
 	tristate '"TPROXY" target support (EXPERIMENTAL)'
 	depends on EXPERIMENTAL
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index cd31afe..14e3a8f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_SECMARK) += xt_SECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TPROXY) += xt_TPROXY.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
new file mode 100644
index 0000000..96e3785
--- /dev/null
+++ b/net/netfilter/xt_TEE.c
@@ -0,0 +1,232 @@
+/*
+ *	"TEE" target extension for Xtables
+ *	Copyright Â© Sebastian ClaÃŸen, 2007
+ *	Jan Engelhardt, 2007-2010
+ *
+ *	based on ipt_ROUTE.c from CÃ©dric de Launois
+ *	<delaunois@info.ucl.be>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	version 2 or later, as published by the Free Software Foundation.
+ */
+#include <linux/ip.h>
+#include <linux/module.h>
+#include <linux/route.h>
+#include <linux/skbuff.h>
+#include <net/checksum.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/route.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_TEE.h>
+
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+#	define WITH_CONNTRACK 1
+#	include <net/netfilter/nf_conntrack.h>
+#endif
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#	define WITH_IPV6 1
+#endif
+
+static const union nf_inet_addr tee_zero_address;
+
+static struct net *pick_net(struct sk_buff *skb)
+{
+	const struct net_device *dev;
+	const struct dst_entry *dst;
+
+	if (skb->dev != NULL)
+		return dev_net(skb->dev);
+	dst = skb_dst(skb);
+	if (dst != NULL && dst->dev != NULL)
+		return dev_net(dst->dev);
+	return &init_net;
+}
+
+static bool
+tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct rtable *rt;
+	struct flowi fl;
+
+	memset(&fl, 0, sizeof(fl));
+	fl.nl_u.ip4_u.daddr = info->gw.ip;
+	fl.nl_u.ip4_u.tos   = RT_TOS(iph->tos);
+	fl.nl_u.ip4_u.scope = RT_SCOPE_UNIVERSE;
+
+	if (ip_route_output_key(pick_net(skb), &rt, &fl) != 0) {
+		kfree_skb(skb);
+		return false;
+	}
+	dst_release(skb_dst(skb));
+	skb_dst_set(skb, &rt->u.dst);
+	skb->dev      = rt->u.dst.dev;
+	skb->protocol = htons(ETH_P_IP);
+	return true;
+}
+
+static unsigned int
+tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+	struct iphdr *iph;
+
+	/*
+	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
+	 * the original skb, which should continue on its way as if nothing has
+	 * happened. The copy should be independently delivered to the TEE
+	 * --gateway.
+	 */
+	skb = skb_copy(skb, GFP_ATOMIC);
+	if (skb == NULL)
+		return XT_CONTINUE;
+
+#ifdef WITH_CONNTRACK
+	/* Avoid counting cloned packets towards the original connection. */
+	nf_conntrack_put(skb->nfct);
+	skb->nfct     = &nf_conntrack_untracked.ct_general;
+	skb->nfctinfo = IP_CT_NEW;
+	nf_conntrack_get(skb->nfct);
+#endif
+	/*
+	 * If we are in PREROUTING/INPUT, the checksum must be recalculated
+	 * since the length could have changed as a result of defragmentation.
+	 *
+	 * We also decrease the TTL to mitigate potential TEE loops
+	 * between two hosts.
+	 *
+	 * Set %IP_DF so that the original source is notified of a potentially
+	 * decreased MTU on the clone route. IPv6 does this too.
+	 */
+	iph = ip_hdr(skb);
+	iph->frag_off |= htons(IP_DF);
+	if (par->hooknum == NF_INET_PRE_ROUTING ||
+	    par->hooknum == NF_INET_LOCAL_IN)
+		--iph->ttl;
+	ip_send_check(iph);
+
+	/*
+	 * Xtables is not reentrant currently, so a choice has to be made:
+	 * 1. return absolute verdict for the original and let the cloned
+	 *    packet travel through the chains
+	 * 2. let the original continue travelling and not pass the clone
+	 *    to Xtables.
+	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
+	 * Because iph->check is already correct and we don't pass it to
+	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
+	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
+	 * invoke POSTROUTING on the cloned packet.
+	 */
+	IPCB(skb)->flags |= IPSKB_REROUTED;
+	if (tee_tg_route4(skb, info))
+		ip_output(skb);
+
+	return XT_CONTINUE;
+}
+
+#ifdef WITH_IPV6
+static bool
+tee_tg_route6(struct sk_buff *skb, const struct xt_tee_tginfo *info)
+{
+	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	struct dst_entry *dst;
+	struct flowi fl;
+
+	memset(&fl, 0, sizeof(fl));
+	fl.nl_u.ip6_u.daddr = info->gw.in6;
+	fl.nl_u.ip6_u.flowlabel = ((iph->flow_lbl[0] & 0xF) << 16) |
+				  (iph->flow_lbl[1] << 8) | iph->flow_lbl[2];
+
+	dst = ip6_route_output(pick_net(skb), NULL, &fl);
+	if (dst == NULL) {
+		kfree_skb(skb);
+		return false;
+	}
+	dst_release(skb_dst(skb));
+	skb_dst_set(skb, dst);
+	skb->dev      = dst->dev;
+	skb->protocol = htons(ETH_P_IPV6);
+	return true;
+}
+
+static unsigned int
+tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+
+	if ((skb = skb_copy(skb, GFP_ATOMIC)) == NULL)
+		return XT_CONTINUE;
+
+#ifdef WITH_CONNTRACK
+	nf_conntrack_put(skb->nfct);
+	skb->nfct     = &nf_conntrack_untracked.ct_general;
+	skb->nfctinfo = IP_CT_NEW;
+	nf_conntrack_get(skb->nfct);
+#endif
+	if (par->hooknum == NF_INET_PRE_ROUTING ||
+	    par->hooknum == NF_INET_LOCAL_IN) {
+		struct ipv6hdr *iph = ipv6_hdr(skb);
+		--iph->hop_limit;
+	}
+	IP6CB(skb)->flags |= IP6SKB_REROUTED;
+	if (tee_tg_route6(skb, info))
+		ip6_output(skb);
+
+	return XT_CONTINUE;
+}
+#endif /* WITH_IPV6 */
+
+static int tee_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+
+	/* 0.0.0.0 and :: not allowed */
+	return (memcmp(&info->gw, &tee_zero_address,
+	       sizeof(tee_zero_address)) == 0) ? -EINVAL : 0;
+}
+
+static struct xt_target tee_tg_reg[] __read_mostly = {
+	{
+		.name       = "TEE",
+		.revision   = 0,
+		.family     = NFPROTO_IPV4,
+		.target     = tee_tg4,
+		.targetsize = sizeof(struct xt_tee_tginfo),
+		.checkentry = tee_tg_check,
+		.me         = THIS_MODULE,
+	},
+#ifdef WITH_IPV6
+	{
+		.name       = "TEE",
+		.revision   = 0,
+		.family     = NFPROTO_IPV6,
+		.target     = tee_tg6,
+		.targetsize = sizeof(struct xt_tee_tginfo),
+		.checkentry = tee_tg_check,
+		.me         = THIS_MODULE,
+	},
+#endif
+};
+
+static int __init tee_tg_init(void)
+{
+	return xt_register_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
+}
+
+static void __exit tee_tg_exit(void)
+{
+	xt_unregister_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
+}
+
+module_init(tee_tg_init);
+module_exit(tee_tg_exit);
+MODULE_AUTHOR("Sebastian ClaÃŸen <sebastian.classen@freenet.ag>");
+MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>");
+MODULE_DESCRIPTION("Xtables: Reroute packet copy");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_TEE");
+MODULE_ALIAS("ip6t_TEE");
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
@ 2010-04-14  5:52   ` Eric Dumazet
  2010-04-14 10:00     ` Jan Engelhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2010-04-14  5:52 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: kaber, netfilter-devel

Le mercredi 14 avril 2010 à 01:21 +0200, Jan Engelhardt a écrit :
> xt_TEE can be used to clone and reroute a packet. This can for
> example be used to copy traffic at a router for logging purposes
> to another dedicated machine.
> 
> References: http://www.gossamer-threads.com/lists/iptables/devel/68781
> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
> ---

Lovely :)


> +
> +static const union nf_inet_addr tee_zero_address;
> +
> +static struct net *pick_net(struct sk_buff *skb)
> +{

#ifdef CONFIG_NET_NS

> +	const struct net_device *dev;
> +	const struct dst_entry *dst;
> +
> +	if (skb->dev != NULL)
> +		return dev_net(skb->dev);
> +	dst = skb_dst(skb);
> +	if (dst != NULL && dst->dev != NULL)
> +		return dev_net(dst->dev);

#endif /* CONFIG_NET_NS */

> +	return &init_net;
> +}
> +
> +
> +static unsigned int
> +tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
> +{
> +	const struct xt_tee_tginfo *info = par->targinfo;
> +	struct iphdr *iph;
> +
> +	/*
> +	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
> +	 * the original skb, which should continue on its way as if nothing has
> +	 * happened. The copy should be independently delivered to the TEE
> +	 * --gateway.
> +	 */
> +	skb = skb_copy(skb, GFP_ATOMIC);
> +	if (skb == NULL)
> +		return XT_CONTINUE;
> +
> +#ifdef WITH_CONNTRACK
> +	/* Avoid counting cloned packets towards the original connection. */
> +	nf_conntrack_put(skb->nfct);
> +	skb->nfct     = &nf_conntrack_untracked.ct_general;
> +	skb->nfctinfo = IP_CT_NEW;
> +	nf_conntrack_get(skb->nfct);

This atomic increment on a global variable worries me... Would it be
possible to avoid it (and the associated decrement and test if null)

I would like to use this TEE facility but with xxx kpps for instance ;)

> +#endif


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-14  5:52   ` Eric Dumazet
@ 2010-04-14 10:00     ` Jan Engelhardt
  2010-04-14 10:56       ` Patrick McHardy
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-14 10:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kaber, netfilter-devel


On Wednesday 2010-04-14 07:52, Eric Dumazet wrote:
>> +#ifdef WITH_CONNTRACK
>> +	/* Avoid counting cloned packets towards the original connection. */
>> +	nf_conntrack_put(skb->nfct);
>> +	skb->nfct     = &nf_conntrack_untracked.ct_general;
>> +	skb->nfctinfo = IP_CT_NEW;
>> +	nf_conntrack_get(skb->nfct);
>
>This atomic increment on a global variable worries me... Would it be
>possible to avoid it (and the associated decrement and test if null)
>
>I would like to use this TEE facility but with xxx kpps for instance ;)

Correctness before speed. You're free to send patches on top.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-14 10:00     ` Jan Engelhardt
@ 2010-04-14 10:56       ` Patrick McHardy
  0 siblings, 0 replies; 22+ messages in thread
From: Patrick McHardy @ 2010-04-14 10:56 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Eric Dumazet, netfilter-devel

Jan Engelhardt wrote:
> On Wednesday 2010-04-14 07:52, Eric Dumazet wrote:
>>> +#ifdef WITH_CONNTRACK
>>> +	/* Avoid counting cloned packets towards the original connection. */
>>> +	nf_conntrack_put(skb->nfct);
>>> +	skb->nfct     = &nf_conntrack_untracked.ct_general;
>>> +	skb->nfctinfo = IP_CT_NEW;
>>> +	nf_conntrack_get(skb->nfct);
>> This atomic increment on a global variable worries me... Would it be
>> possible to avoid it (and the associated decrement and test if null)
>>
>> I would like to use this TEE facility but with xxx kpps for instance ;)
> 
> Correctness before speed. You're free to send patches on top.
> 

Pablo suggest to encode "untracked" in one of the nfctinfo bits a
while ago, which would avoid all atomic operations. Anyways, this
can be done later.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
  2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
@ 2010-04-13 23:21 ` Jan Engelhardt
  2010-04-13 23:21 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Currently, the table traverser stores return addresses in the ruleset
itself (struct ip6t_entry->comefrom). This has a well-known drawback:
the jumpstack is overwritten on reentry, making it necessary for
targets to return absolute verdicts. Also, the ruleset (which might
be heavy memory-wise) needs to be replicated for each CPU that can
possibly invoke ip6t_do_table.

This patch decouples the jumpstack from struct ip6t_entry and instead
puts it into xt_table_info. Not being restricted by 'comefrom'
anymore, we can set up a stack as needed. By default, there is room
allocated for two entries into the traverser. The setting is
configurable at runtime through sysfs and will take effect when a
table is replaced by a new one.

arp_tables is not touched though, because there is just one/two
modules and further patches seek to collapse the table traverser
anyhow.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/x_tables.h |    7 +++
 net/ipv4/netfilter/arp_tables.c    |    6 ++-
 net/ipv4/netfilter/ip_tables.c     |   65 ++++++++++++++++--------------
 net/ipv6/netfilter/ip6_tables.c    |   56 ++++++++++----------------
 net/netfilter/x_tables.c           |   77 ++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 66 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 26ced0c..50c8672 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -401,6 +401,13 @@ struct xt_table_info {
 	unsigned int hook_entry[NF_INET_NUMHOOKS];
 	unsigned int underflow[NF_INET_NUMHOOKS];
 
+	/*
+	 * Number of user chains. Since tables cannot have loops, at most
+	 * @stacksize jumps (number of user chains) can possibly be made.
+	 */
+	unsigned int stacksize;
+	unsigned int *stackptr;
+	void ***jumpstack;
 	/* ipt_entry tables: one per CPU */
 	/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
 	void *entries[1];
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e8e363d..07a6990 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -649,6 +649,9 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			break;
 		++i;
+		if (strcmp(arpt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 	duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret);
 	if (ret != 0)
@@ -1774,8 +1777,7 @@ struct xt_table *arpt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 18c5b15..70900ec 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -321,8 +321,6 @@ ipt_do_table(struct sk_buff *skb,
 	     const struct net_device *out,
 	     struct xt_table *table)
 {
-#define tb_comefrom ((struct ipt_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	const struct iphdr *ip;
 	bool hotdrop = false;
@@ -330,7 +328,8 @@ ipt_do_table(struct sk_buff *skb,
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ipt_entry *e, *back;
+	struct ipt_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -356,19 +355,23 @@ ipt_do_table(struct sk_buff *skb,
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
+	pr_devel("Entering %s(hook %u); sp at %u (UF %p)\n",
+		 table->name, hook, origptr,
+		 get_entry(table_base, private->underflow[hook]));
 
 	do {
 		const struct ipt_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip_packet_match(ip, indev, outdev,
 		    &e->ip, mtpar.fragoff)) {
  no_match:
@@ -403,17 +406,28 @@ ipt_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0) {
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+					pr_devel("Underflow (this is normal) "
+						 "to %p\n", e);
+				} else {
+					e = jumpstack[--*stackptr];
+					pr_devel("Pulled %p out from pos %u\n",
+						 e, *stackptr);
+					e = ipt_next_entry(e);
+				}
 				continue;
 			}
 			if (table_base + v != ipt_next_entry(e) &&
 			    !(e->ip.flags & IPT_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ipt_entry *next = ipt_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
+				pr_devel("Pushed %p into pos %u\n",
+					 e, *stackptr - 1);
 			}
 
 			e = get_entry(table_base, v);
@@ -426,18 +440,7 @@ ipt_do_table(struct sk_buff *skb,
 		tgpar.targinfo = t->data;
 
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IPT_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		/* Target might have changed stuff. */
 		ip = ip_hdr(skb);
 		if (verdict == IPT_CONTINUE)
@@ -447,7 +450,9 @@ ipt_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 	xt_info_rdunlock_bh();
-
+	pr_devel("Exiting %s; resetting sp from %u to %u\n",
+		 __func__, *stackptr, origptr);
+	*stackptr = origptr;
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
 #else
@@ -455,8 +460,6 @@ ipt_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -838,6 +841,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ipt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2086,8 +2092,7 @@ struct xt_table *ipt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index f2b815e..2a2770b 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -351,15 +351,14 @@ ip6t_do_table(struct sk_buff *skb,
 	      const struct net_device *out,
 	      struct xt_table *table)
 {
-#define tb_comefrom ((struct ip6t_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	bool hotdrop = false;
 	/* Initializing verdict to NF_DROP keeps gcc happy. */
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ip6t_entry *e, *back;
+	struct ip6t_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -383,19 +382,19 @@ ip6t_do_table(struct sk_buff *skb,
 
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
-
 	do {
 		const struct ip6t_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
 		    &mtpar.thoff, &mtpar.fragoff, &hotdrop)) {
  no_match:
@@ -432,17 +431,20 @@ ip6t_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0)
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+				else
+					e = ip6t_next_entry(jumpstack[--*stackptr]);
 				continue;
 			}
 			if (table_base + v != ip6t_next_entry(e) &&
 			    !(e->ipv6.flags & IP6T_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ip6t_entry *next = ip6t_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
 			}
 
 			e = get_entry(table_base, v);
@@ -454,19 +456,7 @@ ip6t_do_table(struct sk_buff *skb,
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IP6T_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		if (verdict == IP6T_CONTINUE)
 			e = ip6t_next_entry(e);
 		else
@@ -474,10 +464,8 @@ ip6t_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 
-#ifdef CONFIG_NETFILTER_DEBUG
-	tb_comefrom = NETFILTER_LINK_POISON;
-#endif
 	xt_info_rdunlock_bh();
+	*stackptr = origptr;
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
@@ -486,8 +474,6 @@ ip6t_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -869,6 +855,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ip6t_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2120,8 +2109,7 @@ struct xt_table *ip6t_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8e23d8f..edde5c6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
 	[NFPROTO_IPV6]   = "ip6",
 };
 
+/* Allow this many total (re)entries. */
+static const unsigned int xt_jumpstack_multiplier = 2;
+
 /* Registration hooks for targets. */
 int
 xt_register_target(struct xt_target *target)
@@ -680,6 +683,26 @@ void xt_free_table_info(struct xt_table_info *info)
 		else
 			vfree(info->entries[cpu]);
 	}
+
+	if (info->jumpstack != NULL) {
+		if (sizeof(void *) * info->stacksize > PAGE_SIZE) {
+			for_each_possible_cpu(cpu)
+				vfree(info->jumpstack[cpu]);
+		} else {
+			for_each_possible_cpu(cpu)
+				kfree(info->jumpstack[cpu]);
+		}
+	}
+
+	if (sizeof(void **) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->jumpstack);
+	else
+		kfree(info->jumpstack);
+	if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->stackptr);
+	else
+		kfree(info->stackptr);
+
 	kfree(info);
 }
 EXPORT_SYMBOL(xt_free_table_info);
@@ -724,6 +747,49 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock);
 DEFINE_PER_CPU(struct xt_info_lock, xt_info_locks);
 EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks);
 
+static int xt_jumpstack_alloc(struct xt_table_info *i)
+{
+	unsigned int size;
+	int cpu;
+
+	size = sizeof(unsigned int) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->stackptr = vmalloc(size);
+	else
+		i->stackptr = kmalloc(size, GFP_KERNEL);
+	if (i->stackptr == NULL)
+		return -ENOMEM;
+	memset(i->stackptr, 0, size);
+
+	size = sizeof(void **) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->jumpstack = vmalloc(size);
+	else
+		i->jumpstack = kmalloc(size, GFP_KERNEL);
+	if (i->jumpstack == NULL)
+		return -ENOMEM;
+	memset(i->jumpstack, 0, size);
+
+	i->stacksize *= xt_jumpstack_multiplier;
+	size = sizeof(void *) * i->stacksize;
+	for_each_possible_cpu(cpu) {
+		if (size > PAGE_SIZE)
+			i->jumpstack[cpu] = vmalloc_node(size,
+				cpu_to_node(cpu));
+		else
+			i->jumpstack[cpu] = kmalloc_node(size,
+				GFP_KERNEL, cpu_to_node(cpu));
+		if (i->jumpstack[cpu] == NULL)
+			/*
+			 * Freeing will be done later on by the callers. The
+			 * chain is: xt_replace_table -> __do_replace ->
+			 * do_replace -> xt_free_table_info.
+			 */
+			return -ENOMEM;
+	}
+
+	return 0;
+}
 
 struct xt_table_info *
 xt_replace_table(struct xt_table *table,
@@ -732,6 +798,7 @@ xt_replace_table(struct xt_table *table,
 	      int *error)
 {
 	struct xt_table_info *private;
+	int ret;
 
 	/* Do the substitution. */
 	local_bh_disable();
@@ -746,6 +813,12 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0) {
+		*error = ret;
+		return NULL;
+	}
+
 	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
 
@@ -770,6 +843,10 @@ struct xt_table *xt_register_table(struct net *net,
 	struct xt_table_info *private;
 	struct xt_table *t, *table;
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
 	/* Don't add one object to multiple lists. */
 	table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
 	if (!table) {
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
  2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
  2010-04-13 23:21 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
@ 2010-04-13 23:21 ` Jan Engelhardt
  2010-04-13 23:21 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt
  2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy
  4 siblings, 0 replies; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Since Xtables is now reentrant/nestable, the cloned packet can also go
through Xtables and be subject to rules itself.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv4/ip_output.c   |    1 -
 net/ipv6/ip6_output.c  |    1 -
 net/netfilter/xt_TEE.c |   38 +++++++++++++++++---------------------
 3 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0abfdde..f09135e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -309,7 +309,6 @@ int ip_output(struct sk_buff *skb)
 			    ip_finish_output,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip_output);
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d09be7f..c10a38a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -176,7 +176,6 @@ int ip6_output(struct sk_buff *skb)
 			    ip6_finish_output,
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip6_output);
 
 /*
  *	xmit an sk_buff (used by TCP)
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 96e3785..f193133 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -12,6 +12,7 @@
  */
 #include <linux/ip.h>
 #include <linux/module.h>
+#include <linux/percpu.h>
 #include <linux/route.h>
 #include <linux/skbuff.h>
 #include <net/checksum.h>
@@ -32,6 +33,7 @@
 #endif
 
 static const union nf_inet_addr tee_zero_address;
+static DEFINE_PER_CPU(bool, tee_active);
 
 static struct net *pick_net(struct sk_buff *skb)
 {
@@ -75,13 +77,15 @@ tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
 	const struct xt_tee_tginfo *info = par->targinfo;
 	struct iphdr *iph;
 
+	if (percpu_read(tee_active))
+		return XT_CONTINUE;
 	/*
 	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
 	 * the original skb, which should continue on its way as if nothing has
 	 * happened. The copy should be independently delivered to the TEE
 	 * --gateway.
 	 */
-	skb = skb_copy(skb, GFP_ATOMIC);
+	skb = pskb_copy(skb, GFP_ATOMIC);
 	if (skb == NULL)
 		return XT_CONTINUE;
 
@@ -109,22 +113,11 @@ tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
 		--iph->ttl;
 	ip_send_check(iph);
 
-	/*
-	 * Xtables is not reentrant currently, so a choice has to be made:
-	 * 1. return absolute verdict for the original and let the cloned
-	 *    packet travel through the chains
-	 * 2. let the original continue travelling and not pass the clone
-	 *    to Xtables.
-	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
-	 * Because iph->check is already correct and we don't pass it to
-	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
-	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
-	 * invoke POSTROUTING on the cloned packet.
-	 */
-	IPCB(skb)->flags |= IPSKB_REROUTED;
-	if (tee_tg_route4(skb, info))
-		ip_output(skb);
-
+	if (tee_tg_route4(skb, info)) {
+		percpu_write(tee_active, true);
+		ip_local_out(skb);
+		percpu_write(tee_active, false);
+	}
 	return XT_CONTINUE;
 }
 
@@ -158,6 +151,8 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 {
 	const struct xt_tee_tginfo *info = par->targinfo;
 
+	if (percpu_read(tee_active))
+		return XT_CONTINUE;
 	if ((skb = skb_copy(skb, GFP_ATOMIC)) == NULL)
 		return XT_CONTINUE;
 
@@ -172,10 +167,11 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 		struct ipv6hdr *iph = ipv6_hdr(skb);
 		--iph->hop_limit;
 	}
-	IP6CB(skb)->flags |= IP6SKB_REROUTED;
-	if (tee_tg_route6(skb, info))
-		ip6_output(skb);
-
+	if (tee_tg_route6(skb, info)) {
+		percpu_write(tee_active, true);
+		ip6_local_out(skb);
+		percpu_write(tee_active, false);
+	}
 	return XT_CONTINUE;
 }
 #endif /* WITH_IPV6 */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy
  2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
                   ` (2 preceding siblings ...)
  2010-04-13 23:21 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
@ 2010-04-13 23:21 ` Jan Engelhardt
  2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy
  4 siblings, 0 replies; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv4/netfilter/ip_tables.c   |    2 --
 net/ipv4/netfilter/ipt_REJECT.c  |    3 ---
 net/ipv6/netfilter/ip6_tables.c  |    2 --
 net/ipv6/netfilter/ip6t_REJECT.c |    3 ---
 4 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 70900ec..bb5e0d9 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -434,8 +434,6 @@ ipt_do_table(struct sk_buff *skb,
 			continue;
 		}
 
-		/* Targets which reenter must return
-		   abs. verdicts */
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c
index b026014..038fa0b 100644
--- a/net/ipv4/netfilter/ipt_REJECT.c
+++ b/net/ipv4/netfilter/ipt_REJECT.c
@@ -139,9 +139,6 @@ reject_tg(struct sk_buff *skb, const struct xt_target_param *par)
 {
 	const struct ipt_reject_info *reject = par->targinfo;
 
-	/* WARNING: This code causes reentry within iptables.
-	   This means that the iptables jump stack is now crap.  We
-	   must return an absolute verdict. --RR */
 	switch (reject->with) {
 	case IPT_ICMP_NET_UNREACHABLE:
 		send_unreach(skb, ICMP_NET_UNREACH);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 2a2770b..7afa117 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -451,8 +451,6 @@ ip6t_do_table(struct sk_buff *skb,
 			continue;
 		}
 
-		/* Targets which reenter must return
-		   abs. verdicts */
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
diff --git a/net/ipv6/netfilter/ip6t_REJECT.c b/net/ipv6/netfilter/ip6t_REJECT.c
index 55b9b2d..dad9762 100644
--- a/net/ipv6/netfilter/ip6t_REJECT.c
+++ b/net/ipv6/netfilter/ip6t_REJECT.c
@@ -179,9 +179,6 @@ reject_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 	struct net *net = dev_net((par->in != NULL) ? par->in : par->out);
 
 	pr_debug("%s: medium point\n", __func__);
-	/* WARNING: This code causes reentry within ip6tables.
-	   This means that the ip6tables jump stack is now crap.  We
-	   must return an absolute verdict. --RR */
 	switch (reject->with) {
 	case IP6T_ICMP6_NO_ROUTE:
 		send_unreach(net, skb, ICMPV6_NOROUTE, par->hooknum);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: nf-next: TEE only
  2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
                   ` (3 preceding siblings ...)
  2010-04-13 23:21 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt
@ 2010-04-14 10:57 ` Patrick McHardy
  2010-04-14 11:15   ` Jan Engelhardt
  4 siblings, 1 reply; 22+ messages in thread
From: Patrick McHardy @ 2010-04-14 10:57 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> in this round:
> - use IP6SKB_REROUTED in v6 code
> - pick_net function: use skb->dev or skb->dst->dev when available
>   (or completely fall back to init_net in case there's something
>   going on)

So what about oif routing which I asked for two times?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: nf-next: TEE only
  2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy
@ 2010-04-14 11:15   ` Jan Engelhardt
  2010-04-14 11:24     ` Patrick McHardy
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-14 11:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

On Wednesday 2010-04-14 12:57, Patrick McHardy wrote:

>Jan Engelhardt wrote:
>> in this round:
>> - use IP6SKB_REROUTED in v6 code
>> - pick_net function: use skb->dev or skb->dst->dev when available
>>   (or completely fall back to init_net in case there's something
>>   going on)
>
>So what about oif routing which I asked for two times?

Guess it must have fallen off somewhere between the resends. We can 
still add it as a patch on top.

>I guess you'd usually have a host for logging or IDS somewhere on a
>private network and TEE packets there. So specifying oif and gateway
>seems most useful to me.

The oif is already determined by the route to the gateway(logging
host). I'd also fear that people abuse TEE as a ROUTE replacement
when they see an --oif.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: nf-next: TEE only
  2010-04-14 11:15   ` Jan Engelhardt
@ 2010-04-14 11:24     ` Patrick McHardy
  2010-04-14 11:32       ` Jan Engelhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick McHardy @ 2010-04-14 11:24 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> On Wednesday 2010-04-14 12:57, Patrick McHardy wrote:
> 
>> Jan Engelhardt wrote:
>>> in this round:
>>> - use IP6SKB_REROUTED in v6 code
>>> - pick_net function: use skb->dev or skb->dst->dev when available
>>>   (or completely fall back to init_net in case there's something
>>>   going on)
>> So what about oif routing which I asked for two times?
> 
> Guess it must have fallen off somewhere between the resends. We can 
> still add it as a patch on top.

Please add it before I apply it. Should be a fairly trivial change.

>> I guess you'd usually have a host for logging or IDS somewhere on a
>> private network and TEE packets there. So specifying oif and gateway
>> seems most useful to me.
> 
> The oif is already determined by the route to the gateway(logging
> host). I'd also fear that people abuse TEE as a ROUTE replacement
> when they see an --oif.

That's something different. The oif forces use of a specific output
device, independant of the routing tables. F.i.:

# ip l l dummy0
4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 96:7f:5e:d2:d6:c9 brd ff:ff:ff:ff:ff:ff

# ip a s dummy0
4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    li# ip a s dummy0
4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 96:7f:5e:d2:d6:c9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::947f:5eff:fed2:d6c9/64 scope link
       valid_lft forever preferred_lft forever

# ip r | grep dummy0
#

# ping 10.0.0.1 -I dummy0
IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 1, length 64
IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 2, length 64

This is quite useful since your logging host doesn't have to be
reachable through normal routing.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: nf-next: TEE only
  2010-04-14 11:24     ` Patrick McHardy
@ 2010-04-14 11:32       ` Jan Engelhardt
  2010-04-15 11:41         ` Jan Engelhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-14 11:32 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

On Wednesday 2010-04-14 13:24, Patrick McHardy wrote:
>>> So what about oif routing which I asked for two times?
>> 
>> Guess it must have fallen off somewhere between the resends. We can 
>> still add it as a patch on top.
>
>Please add it before I apply it. Should be a fairly trivial change.
>
>>> I guess you'd usually have a host for logging or IDS somewhere on a
>>> private network and TEE packets there. So specifying oif and gateway
>>> seems most useful to me.
>> 
>> The oif is already determined by the route to the gateway(logging
>> host). I'd also fear that people abuse TEE as a ROUTE replacement
>> when they see an --oif.
>
>That's something different. The oif forces use of a specific output
>device, independant of the routing tables. F.i.:

You should be able to use a specific output device by use of a routing 
table, and selecting that table with fwmark.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: nf-next: TEE only
  2010-04-14 11:32       ` Jan Engelhardt
@ 2010-04-15 11:41         ` Jan Engelhardt
  2010-04-15 11:56           ` Patrick McHardy
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-15 11:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

On Wednesday 2010-04-14 13:32, Jan Engelhardt wrote:
>On Wednesday 2010-04-14 13:24, Patrick McHardy wrote:
>>>> So what about oif routing which I asked for two times?
>>> 
>>> Guess it must have fallen off somewhere between the resends. We can 
>>> still add it as a patch on top.
>>
>>Please add it before I apply it. Should be a fairly trivial change.
>>
>>>> I guess you'd usually have a host for logging or IDS somewhere on a
>>>> private network and TEE packets there. So specifying oif and gateway
>>>> seems most useful to me.
>>> 
>>> The oif is already determined by the route to the gateway(logging
>>> host). I'd also fear that people abuse TEE as a ROUTE replacement
>>> when they see an --oif.
>>
>>That's something different. The oif forces use of a specific output
>>device, independant of the routing tables. F.i.:
>
>You should be able to use a specific output device by use of a routing 
>table, and selecting that table with fwmark.

>This is quite useful since your logging host doesn't have to be
>reachable through normal routing.
>4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
>    link/ether 96:7f:5e:d2:d6:c9 brd ff:ff:ff:ff:ff:ff
>    inet6 fe80::947f:5eff:fed2:d6c9/64 scope link
>       valid_lft forever preferred_lft forever
>
># ping 10.0.0.1 -I dummy0
>IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
>    192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 1, length 64
>IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84)
>    192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 2, length 64

I don't see why one would want the log server to be unreachable.

What _does_ look reasonable however is an encapsulation device for transporting
teed packets across multiple real hops:

14: gre1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN
    link/gre 5.6.7.8 peer 1.2.3.4

and this case can be solved with standard(/fwmarking) policy routing
('default via gre1').

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: nf-next: TEE only
  2010-04-15 11:41         ` Jan Engelhardt
@ 2010-04-15 11:56           ` Patrick McHardy
  0 siblings, 0 replies; 22+ messages in thread
From: Patrick McHardy @ 2010-04-15 11:56 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> On Wednesday 2010-04-14 13:32, Jan Engelhardt wrote:
> I don't see why one would want the log server to be unreachable.

I don't want that, in fact I don't have use for TEE personally.

> What _does_ look reasonable however is an encapsulation device for transporting
> teed packets across multiple real hops:
> 
> 14: gre1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN
>     link/gre 5.6.7.8 peer 1.2.3.4
> 
> and this case can be solved with standard(/fwmarking) policy routing
> ('default via gre1').

Sure. And it can be solved using standard routing by specifying
the oif. I'm not interested in debatting this endlessly, this is
a standard routing key and there's no reason at all to not support
it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* nf-next: TEE 20100215
@ 2010-04-15 23:25 Jan Engelhardt
  2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel


Hi,


xt_TEE now has routing by oif. Please review. Does it look right to you?


The following changes since commit 9c6eb28aca52d562f3ffbaebaa56385df9972a43:
  Jan Engelhardt (1):
        netfilter: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation

are available in the git repository at:

  git://dev.medozas.de/linux master

Jan Engelhardt (4):
      netfilter: xtables: inclusion of xt_TEE
      netfilter: xtables2: make ip_tables reentrant
      netfilter: xt_TEE: have cloned packet travel through Xtables too
      netfilter: xtables: remove old comments about reentrancy

 include/linux/netfilter/Kbuild     |    1 +
 include/linux/netfilter/x_tables.h |    7 +
 include/linux/netfilter/xt_TEE.h   |    9 ++
 net/ipv4/netfilter/arp_tables.c    |    6 +-
 net/ipv4/netfilter/ip_tables.c     |   67 +++++-----
 net/ipv4/netfilter/ipt_REJECT.c    |    3 -
 net/ipv6/netfilter/ip6_tables.c    |   58 +++-----
 net/ipv6/netfilter/ip6t_REJECT.c   |    3 -
 net/netfilter/Kconfig              |    7 +
 net/netfilter/Makefile             |    1 +
 net/netfilter/x_tables.c           |   77 +++++++++++
 net/netfilter/xt_TEE.c             |  252 ++++++++++++++++++++++++++++++++++++
 12 files changed, 415 insertions(+), 76 deletions(-)
 create mode 100644 include/linux/netfilter/xt_TEE.h
 create mode 100644 net/netfilter/xt_TEE.c

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt
@ 2010-04-15 23:25 ` Jan Engelhardt
  2010-04-19 12:22   ` Patrick McHardy
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Currently, the table traverser stores return addresses in the ruleset
itself (struct ip6t_entry->comefrom). This has a well-known drawback:
the jumpstack is overwritten on reentry, making it necessary for
targets to return absolute verdicts. Also, the ruleset (which might
be heavy memory-wise) needs to be replicated for each CPU that can
possibly invoke ip6t_do_table.

This patch decouples the jumpstack from struct ip6t_entry and instead
puts it into xt_table_info. Not being restricted by 'comefrom'
anymore, we can set up a stack as needed. By default, there is room
allocated for two entries into the traverser. The setting is
configurable at runtime through sysfs and will take effect when a
table is replaced by a new one.

arp_tables is not touched though, because there is just one/two
modules and further patches seek to collapse the table traverser
anyhow.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/x_tables.h |    7 +++
 net/ipv4/netfilter/arp_tables.c    |    6 ++-
 net/ipv4/netfilter/ip_tables.c     |   65 ++++++++++++++++--------------
 net/ipv6/netfilter/ip6_tables.c    |   56 ++++++++++----------------
 net/netfilter/x_tables.c           |   77 ++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 66 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 26ced0c..50c8672 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -401,6 +401,13 @@ struct xt_table_info {
 	unsigned int hook_entry[NF_INET_NUMHOOKS];
 	unsigned int underflow[NF_INET_NUMHOOKS];
 
+	/*
+	 * Number of user chains. Since tables cannot have loops, at most
+	 * @stacksize jumps (number of user chains) can possibly be made.
+	 */
+	unsigned int stacksize;
+	unsigned int *stackptr;
+	void ***jumpstack;
 	/* ipt_entry tables: one per CPU */
 	/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
 	void *entries[1];
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e8e363d..07a6990 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -649,6 +649,9 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			break;
 		++i;
+		if (strcmp(arpt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 	duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret);
 	if (ret != 0)
@@ -1774,8 +1777,7 @@ struct xt_table *arpt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 18c5b15..70900ec 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -321,8 +321,6 @@ ipt_do_table(struct sk_buff *skb,
 	     const struct net_device *out,
 	     struct xt_table *table)
 {
-#define tb_comefrom ((struct ipt_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	const struct iphdr *ip;
 	bool hotdrop = false;
@@ -330,7 +328,8 @@ ipt_do_table(struct sk_buff *skb,
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ipt_entry *e, *back;
+	struct ipt_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -356,19 +355,23 @@ ipt_do_table(struct sk_buff *skb,
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
+	pr_devel("Entering %s(hook %u); sp at %u (UF %p)\n",
+		 table->name, hook, origptr,
+		 get_entry(table_base, private->underflow[hook]));
 
 	do {
 		const struct ipt_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip_packet_match(ip, indev, outdev,
 		    &e->ip, mtpar.fragoff)) {
  no_match:
@@ -403,17 +406,28 @@ ipt_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0) {
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+					pr_devel("Underflow (this is normal) "
+						 "to %p\n", e);
+				} else {
+					e = jumpstack[--*stackptr];
+					pr_devel("Pulled %p out from pos %u\n",
+						 e, *stackptr);
+					e = ipt_next_entry(e);
+				}
 				continue;
 			}
 			if (table_base + v != ipt_next_entry(e) &&
 			    !(e->ip.flags & IPT_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ipt_entry *next = ipt_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
+				pr_devel("Pushed %p into pos %u\n",
+					 e, *stackptr - 1);
 			}
 
 			e = get_entry(table_base, v);
@@ -426,18 +440,7 @@ ipt_do_table(struct sk_buff *skb,
 		tgpar.targinfo = t->data;
 
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IPT_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		/* Target might have changed stuff. */
 		ip = ip_hdr(skb);
 		if (verdict == IPT_CONTINUE)
@@ -447,7 +450,9 @@ ipt_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 	xt_info_rdunlock_bh();
-
+	pr_devel("Exiting %s; resetting sp from %u to %u\n",
+		 __func__, *stackptr, origptr);
+	*stackptr = origptr;
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
 #else
@@ -455,8 +460,6 @@ ipt_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -838,6 +841,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ipt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2086,8 +2092,7 @@ struct xt_table *ipt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index f2b815e..2a2770b 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -351,15 +351,14 @@ ip6t_do_table(struct sk_buff *skb,
 	      const struct net_device *out,
 	      struct xt_table *table)
 {
-#define tb_comefrom ((struct ip6t_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	bool hotdrop = false;
 	/* Initializing verdict to NF_DROP keeps gcc happy. */
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ip6t_entry *e, *back;
+	struct ip6t_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -383,19 +382,19 @@ ip6t_do_table(struct sk_buff *skb,
 
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
-
 	do {
 		const struct ip6t_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
 		    &mtpar.thoff, &mtpar.fragoff, &hotdrop)) {
  no_match:
@@ -432,17 +431,20 @@ ip6t_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0)
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+				else
+					e = ip6t_next_entry(jumpstack[--*stackptr]);
 				continue;
 			}
 			if (table_base + v != ip6t_next_entry(e) &&
 			    !(e->ipv6.flags & IP6T_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ip6t_entry *next = ip6t_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
 			}
 
 			e = get_entry(table_base, v);
@@ -454,19 +456,7 @@ ip6t_do_table(struct sk_buff *skb,
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IP6T_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		if (verdict == IP6T_CONTINUE)
 			e = ip6t_next_entry(e);
 		else
@@ -474,10 +464,8 @@ ip6t_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 
-#ifdef CONFIG_NETFILTER_DEBUG
-	tb_comefrom = NETFILTER_LINK_POISON;
-#endif
 	xt_info_rdunlock_bh();
+	*stackptr = origptr;
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
@@ -486,8 +474,6 @@ ip6t_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -869,6 +855,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ip6t_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2120,8 +2109,7 @@ struct xt_table *ip6t_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8e23d8f..edde5c6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
 	[NFPROTO_IPV6]   = "ip6",
 };
 
+/* Allow this many total (re)entries. */
+static const unsigned int xt_jumpstack_multiplier = 2;
+
 /* Registration hooks for targets. */
 int
 xt_register_target(struct xt_target *target)
@@ -680,6 +683,26 @@ void xt_free_table_info(struct xt_table_info *info)
 		else
 			vfree(info->entries[cpu]);
 	}
+
+	if (info->jumpstack != NULL) {
+		if (sizeof(void *) * info->stacksize > PAGE_SIZE) {
+			for_each_possible_cpu(cpu)
+				vfree(info->jumpstack[cpu]);
+		} else {
+			for_each_possible_cpu(cpu)
+				kfree(info->jumpstack[cpu]);
+		}
+	}
+
+	if (sizeof(void **) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->jumpstack);
+	else
+		kfree(info->jumpstack);
+	if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->stackptr);
+	else
+		kfree(info->stackptr);
+
 	kfree(info);
 }
 EXPORT_SYMBOL(xt_free_table_info);
@@ -724,6 +747,49 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock);
 DEFINE_PER_CPU(struct xt_info_lock, xt_info_locks);
 EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks);
 
+static int xt_jumpstack_alloc(struct xt_table_info *i)
+{
+	unsigned int size;
+	int cpu;
+
+	size = sizeof(unsigned int) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->stackptr = vmalloc(size);
+	else
+		i->stackptr = kmalloc(size, GFP_KERNEL);
+	if (i->stackptr == NULL)
+		return -ENOMEM;
+	memset(i->stackptr, 0, size);
+
+	size = sizeof(void **) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->jumpstack = vmalloc(size);
+	else
+		i->jumpstack = kmalloc(size, GFP_KERNEL);
+	if (i->jumpstack == NULL)
+		return -ENOMEM;
+	memset(i->jumpstack, 0, size);
+
+	i->stacksize *= xt_jumpstack_multiplier;
+	size = sizeof(void *) * i->stacksize;
+	for_each_possible_cpu(cpu) {
+		if (size > PAGE_SIZE)
+			i->jumpstack[cpu] = vmalloc_node(size,
+				cpu_to_node(cpu));
+		else
+			i->jumpstack[cpu] = kmalloc_node(size,
+				GFP_KERNEL, cpu_to_node(cpu));
+		if (i->jumpstack[cpu] == NULL)
+			/*
+			 * Freeing will be done later on by the callers. The
+			 * chain is: xt_replace_table -> __do_replace ->
+			 * do_replace -> xt_free_table_info.
+			 */
+			return -ENOMEM;
+	}
+
+	return 0;
+}
 
 struct xt_table_info *
 xt_replace_table(struct xt_table *table,
@@ -732,6 +798,7 @@ xt_replace_table(struct xt_table *table,
 	      int *error)
 {
 	struct xt_table_info *private;
+	int ret;
 
 	/* Do the substitution. */
 	local_bh_disable();
@@ -746,6 +813,12 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0) {
+		*error = ret;
+		return NULL;
+	}
+
 	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
 
@@ -770,6 +843,10 @@ struct xt_table *xt_register_table(struct net *net,
 	struct xt_table_info *private;
 	struct xt_table *t, *table;
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
 	/* Don't add one object to multiple lists. */
 	table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
 	if (!table) {
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
@ 2010-04-19 12:22   ` Patrick McHardy
  2010-04-19 12:54     ` Jan Engelhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick McHardy @ 2010-04-19 12:22 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> Currently, the table traverser stores return addresses in the ruleset
> itself (struct ip6t_entry->comefrom). This has a well-known drawback:
> the jumpstack is overwritten on reentry, making it necessary for
> targets to return absolute verdicts. Also, the ruleset (which might
> be heavy memory-wise) needs to be replicated for each CPU that can
> possibly invoke ip6t_do_table.
> 
> This patch decouples the jumpstack from struct ip6t_entry and instead
> puts it into xt_table_info. Not being restricted by 'comefrom'
> anymore, we can set up a stack as needed. By default, there is room
> allocated for two entries into the traverser. The setting is
> configurable at runtime through sysfs and will take effect when a
> table is replaced by a new one.

The changelog is not up to date anymore, but ...

> 
> diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
> index 26ced0c..50c8672 100644
> --- a/include/linux/netfilter/x_tables.h
> +++ b/include/linux/netfilter/x_tables.h
> @@ -401,6 +401,13 @@ struct xt_table_info {
>  	unsigned int hook_entry[NF_INET_NUMHOOKS];
>  	unsigned int underflow[NF_INET_NUMHOOKS];
>  
> +	/*
> +	 * Number of user chains. Since tables cannot have loops, at most
> +	 * @stacksize jumps (number of user chains) can possibly be made.
> +	 */
> +	unsigned int stacksize;
> +	unsigned int *stackptr;
> +	void ***jumpstack;
...
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index 8e23d8f..edde5c6 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
>  	[NFPROTO_IPV6]   = "ip6",
>  };
>  
> +/* Allow this many total (re)entries. */
> +static const unsigned int xt_jumpstack_multiplier = 2;
> +

Why aren't you using a define instead of saving the stack size
in the table info?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-19 12:22   ` Patrick McHardy
@ 2010-04-19 12:54     ` Jan Engelhardt
  2010-04-19 14:06       ` Patrick McHardy
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-19 12:54 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel


On Monday 2010-04-19 14:22, Patrick McHardy wrote:
>Jan Engelhardt wrote:
>> This patch decouples the jumpstack from struct ip6t_entry and instead
>> puts it into xt_table_info. Not being restricted by 'comefrom'
>> anymore, we can set up a stack as needed. By default, there is room
>> allocated for two entries into the traverser. The setting is
>> configurable at runtime through sysfs and will take effect when a
>> table is replaced by a new one.
>
>The changelog is not up to date anymore, but ...

Oops ;-)

>> --- a/include/linux/netfilter/x_tables.h
>> +++ b/include/linux/netfilter/x_tables.h
>> @@ -401,6 +401,13 @@ struct xt_table_info {
>>  	unsigned int hook_entry[NF_INET_NUMHOOKS];
>>  	unsigned int underflow[NF_INET_NUMHOOKS];
>>  
>> +	/*
>> +	 * Number of user chains. Since tables cannot have loops, at most
>> +	 * @stacksize jumps (number of user chains) can possibly be made.
>> +	 */
>> +	unsigned int stacksize;
>> +	unsigned int *stackptr;
>> +	void ***jumpstack;
>...
>> --- a/net/netfilter/x_tables.c
>> +++ b/net/netfilter/x_tables.c
>> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
>>  	[NFPROTO_IPV6]   = "ip6",
>>  };
>>  
>> +/* Allow this many total (re)entries. */
>> +static const unsigned int xt_jumpstack_multiplier = 2;
>> +
>
>Why aren't you using a define instead of saving the stack size
>in the table info?

I don't see how a define does any good here. Since you were quoting
the multiplier line, I guess you could be confusing the multiplier
with stored stacksize. FTR, the definition is:

table->stacksize := number_of_user_chains(#UC) * multiplier;

Since #UC is variable, so is stacksize, and so stacksize cannot
be replaced by a constant.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-19 12:54     ` Jan Engelhardt
@ 2010-04-19 14:06       ` Patrick McHardy
  2010-04-20 13:18         ` Patrick McHardy
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick McHardy @ 2010-04-19 14:06 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> On Monday 2010-04-19 14:22, Patrick McHardy wrote:
>>> --- a/include/linux/netfilter/x_tables.h
>>> +++ b/include/linux/netfilter/x_tables.h
>>> @@ -401,6 +401,13 @@ struct xt_table_info {
>>>  	unsigned int hook_entry[NF_INET_NUMHOOKS];
>>>  	unsigned int underflow[NF_INET_NUMHOOKS];
>>>  
>>> +	/*
>>> +	 * Number of user chains. Since tables cannot have loops, at most
>>> +	 * @stacksize jumps (number of user chains) can possibly be made.
>>> +	 */
>>> +	unsigned int stacksize;
>>> +	unsigned int *stackptr;
>>> +	void ***jumpstack;
>> ...
>>> --- a/net/netfilter/x_tables.c
>>> +++ b/net/netfilter/x_tables.c
>>> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
>>>  	[NFPROTO_IPV6]   = "ip6",
>>>  };
>>>  
>>> +/* Allow this many total (re)entries. */
>>> +static const unsigned int xt_jumpstack_multiplier = 2;
>>> +
>> Why aren't you using a define instead of saving the stack size
>> in the table info?
> 
> I don't see how a define does any good here. Since you were quoting
> the multiplier line, I guess you could be confusing the multiplier
> with stored stacksize. FTR, the definition is:
> 
> table->stacksize := number_of_user_chains(#UC) * multiplier;
> 
> Since #UC is variable, so is stacksize, and so stacksize cannot
> be replaced by a constant.

Right, thanks for the explanation. Applied.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-19 14:06       ` Patrick McHardy
@ 2010-04-20 13:18         ` Patrick McHardy
  2010-04-20 13:21           ` Patrick McHardy
  2010-04-20 18:26           ` Jan Engelhardt
  0 siblings, 2 replies; 22+ messages in thread
From: Patrick McHardy @ 2010-04-20 13:18 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Patrick McHardy wrote:
> Jan Engelhardt wrote:
>>>> +/* Allow this many total (re)entries. */
>>>> +static const unsigned int xt_jumpstack_multiplier = 2;
>>>> +
>>> Why aren't you using a define instead of saving the stack size
>>> in the table info?
>> I don't see how a define does any good here. Since you were quoting
>> the multiplier line, I guess you could be confusing the multiplier
>> with stored stacksize. FTR, the definition is:
>>
>> table->stacksize := number_of_user_chains(#UC) * multiplier;
>>
>> Since #UC is variable, so is stacksize, and so stacksize cannot
>> be replaced by a constant.
> 
> Right, thanks for the explanation. Applied.

I just noticed a problem with this patch:

[  428.295752] BUG: sleeping function called from invalid context at
mm/slub.c:1705
[  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
[  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
[  428.295776] Call Trace:
[  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
[  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
[  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]
[  428.295839]  [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables]
[  428.295851]  [<f865abe1>] ? try_module_get+0x82/0x9b [x_tables]
[  428.295864]  [<f865b4c0>] xt_replace_table+0x3c/0x5f [x_tables]
[  428.295876]  [<f86b5dc3>] do_ipt_set_ctl+0x182/0x3d5 [ip_tables]
[  428.295922]  [<c037388f>] nf_sockopt+0x167/0x17c
[  428.295931]  [<c03738d8>] nf_setsockopt+0x1a/0x1f
[  428.295940]  [<c037dda4>] ip_setsockopt+0x60/0x84
[  428.295951]  [<c039260a>] raw_setsockopt+0x1f/0x62
[  428.295960]  [<c034d909>] sock_common_setsockopt+0x18/0x1d
[  428.295968]  [<c034bfb9>] sys_setsockopt+0x5e/0x79
[  428.295977]  [<c034d0a0>] sys_socketcall+0x12d/0x190
[  428.295987]  [<c0102a57>] sysenter_do_call+0x12/0x26

You probably shouldn't be allocating the jumpstack while BHs are
disabled.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-20 13:18         ` Patrick McHardy
@ 2010-04-20 13:21           ` Patrick McHardy
  2010-04-20 18:26           ` Jan Engelhardt
  1 sibling, 0 replies; 22+ messages in thread
From: Patrick McHardy @ 2010-04-20 13:21 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Patrick McHardy wrote:
> Patrick McHardy wrote:
>> Jan Engelhardt wrote:
>>>>> +/* Allow this many total (re)entries. */
>>>>> +static const unsigned int xt_jumpstack_multiplier = 2;
>>>>> +
>>>> Why aren't you using a define instead of saving the stack size
>>>> in the table info?
>>> I don't see how a define does any good here. Since you were quoting
>>> the multiplier line, I guess you could be confusing the multiplier
>>> with stored stacksize. FTR, the definition is:
>>>
>>> table->stacksize := number_of_user_chains(#UC) * multiplier;
>>>
>>> Since #UC is variable, so is stacksize, and so stacksize cannot
>>> be replaced by a constant.
>> Right, thanks for the explanation. Applied.
> 
> I just noticed a problem with this patch:
> 
> [  428.295752] BUG: sleeping function called from invalid context at
> mm/slub.c:1705
> [  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
> [  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
> [  428.295776] Call Trace:
> [  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
> [  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
> [  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]
> [  428.295839]  [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables]
> [  428.295851]  [<f865abe1>] ? try_module_get+0x82/0x9b [x_tables]
> [  428.295864]  [<f865b4c0>] xt_replace_table+0x3c/0x5f [x_tables]
> [  428.295876]  [<f86b5dc3>] do_ipt_set_ctl+0x182/0x3d5 [ip_tables]
> [  428.295922]  [<c037388f>] nf_sockopt+0x167/0x17c
> [  428.295931]  [<c03738d8>] nf_setsockopt+0x1a/0x1f
> [  428.295940]  [<c037dda4>] ip_setsockopt+0x60/0x84
> [  428.295951]  [<c039260a>] raw_setsockopt+0x1f/0x62
> [  428.295960]  [<c034d909>] sock_common_setsockopt+0x18/0x1d
> [  428.295968]  [<c034bfb9>] sys_setsockopt+0x5e/0x79
> [  428.295977]  [<c034d0a0>] sys_socketcall+0x12d/0x190
> [  428.295987]  [<c0102a57>] sysenter_do_call+0x12/0x26
> 
> You probably shouldn't be allocating the jumpstack while BHs are
> disabled.

I pushed the entire patchset out, please send a fix on top.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-20 13:18         ` Patrick McHardy
  2010-04-20 13:21           ` Patrick McHardy
@ 2010-04-20 18:26           ` Jan Engelhardt
  2010-04-21 12:47             ` Patrick McHardy
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Engelhardt @ 2010-04-20 18:26 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel


On Tuesday 2010-04-20 15:18, Patrick McHardy wrote:
>
>I just noticed a problem with this patch:
>
>[  428.295752] BUG: sleeping function called from invalid context at
>mm/slub.c:1705
>[  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
>[  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
>[  428.295776] Call Trace:
>[  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
>[  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
>[  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]
>[  428.295839]  [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables]
>
>You probably shouldn't be allocating the jumpstack while BHs are
>disabled.


The following patch should address this. It's easily movable
as it only does work on newinfo itself.



The following changes since commit 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350:
  Bart De Schuymer (1):
        netfilter: bridge-netfilter: fix refragmenting IP traffic encapsulated in PPPoE traffic

are available in the git repository at:

  git://dev.medozas.de/linux master

Jan Engelhardt (1):
      netfilter: x_tables: move sleeping allocation outside BH-disabled region

 net/netfilter/x_tables.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

parent 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350 (v2.6.34-rc3-1335-g6c79bf0)
commit 3109d6b7342a5c8bf23697a86be12e733472d820
Author: Jan Engelhardt <jengelh@medozas.de>
Date:   Tue Apr 20 20:11:15 2010 +0200

netfilter: x_tables: move sleeping allocation outside BH-disabled region

The jumpstack allocation needs to be moved out of the critical region.
Corrects this notice:

BUG: sleeping function called from invalid context at mm/slub.c:1705
[  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
[  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
[  428.295776] Call Trace:
[  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
[  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
[  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/netfilter/x_tables.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 3ae3234..445de70 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -801,6 +801,12 @@ xt_replace_table(struct xt_table *table,
 	struct xt_table_info *private;
 	int ret;
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0) {
+		*error = ret;
+		return NULL;
+	}
+
 	/* Do the substitution. */
 	local_bh_disable();
 	private = table->private;
@@ -814,12 +820,6 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
-	ret = xt_jumpstack_alloc(newinfo);
-	if (ret < 0) {
-		*error = ret;
-		return NULL;
-	}
-
 	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
 
-- 
# Created with git-export-patch

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-20 18:26           ` Jan Engelhardt
@ 2010-04-21 12:47             ` Patrick McHardy
  0 siblings, 0 replies; 22+ messages in thread
From: Patrick McHardy @ 2010-04-21 12:47 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> netfilter: x_tables: move sleeping allocation outside BH-disabled region
> 
> The jumpstack allocation needs to be moved out of the critical region.
> Corrects this notice:
> 
> BUG: sleeping function called from invalid context at mm/slub.c:1705
> [  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
> [  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
> [  428.295776] Call Trace:
> [  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
> [  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
> [  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]

Applied, thanks Jan.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-04-21 12:47 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
2010-04-14  5:52   ` Eric Dumazet
2010-04-14 10:00     ` Jan Engelhardt
2010-04-14 10:56       ` Patrick McHardy
2010-04-13 23:21 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
2010-04-13 23:21 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
2010-04-13 23:21 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt
2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy
2010-04-14 11:15   ` Jan Engelhardt
2010-04-14 11:24     ` Patrick McHardy
2010-04-14 11:32       ` Jan Engelhardt
2010-04-15 11:41         ` Jan Engelhardt
2010-04-15 11:56           ` Patrick McHardy
  -- strict thread matches above, loose matches on Subject: below --
2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt
2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
2010-04-19 12:22   ` Patrick McHardy
2010-04-19 12:54     ` Jan Engelhardt
2010-04-19 14:06       ` Patrick McHardy
2010-04-20 13:18         ` Patrick McHardy
2010-04-20 13:21           ` Patrick McHardy
2010-04-20 18:26           ` Jan Engelhardt
2010-04-21 12:47             ` Patrick McHardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).