* nf-next: TEE only @ 2010-04-13 23:21 Jan Engelhardt 2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt ` (4 more replies) 0 siblings, 5 replies; 22+ messages in thread From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw) To: kaber; +Cc: netfilter-devel Hi, in this round: - use IP6SKB_REROUTED in v6 code - pick_net function: use skb->dev or skb->dst->dev when available (or completely fall back to init_net in case there's something going on) ---- The following changes since commit 9c6eb28aca52d562f3ffbaebaa56385df9972a43: Jan Engelhardt (1): netfilter: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation are available in the git repository at: git://dev.medozas.de/linux master Jan Engelhardt (4): netfilter: xtables: inclusion of xt_TEE netfilter: xtables2: make ip_tables reentrant netfilter: xt_TEE: have cloned packet travel through Xtables too netfilter: xtables: remove old comments about reentrancy include/linux/netfilter/Kbuild | 1 + include/linux/netfilter/x_tables.h | 7 + include/linux/netfilter/xt_TEE.h | 8 ++ net/ipv4/netfilter/arp_tables.c | 6 +- net/ipv4/netfilter/ip_tables.c | 67 ++++++----- net/ipv4/netfilter/ipt_REJECT.c | 3 - net/ipv6/netfilter/ip6_tables.c | 58 ++++------ net/ipv6/netfilter/ip6t_REJECT.c | 3 - net/netfilter/Kconfig | 7 + net/netfilter/Makefile | 1 + net/netfilter/x_tables.c | 77 ++++++++++++ net/netfilter/xt_TEE.c | 228 ++++++++++++++++++++++++++++++++++++ 12 files changed, 390 insertions(+), 76 deletions(-) create mode 100644 include/linux/netfilter/xt_TEE.h create mode 100644 net/netfilter/xt_TEE.c ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE 2010-04-13 23:21 nf-next: TEE only Jan Engelhardt @ 2010-04-13 23:21 ` Jan Engelhardt 2010-04-14 5:52 ` Eric Dumazet 2010-04-13 23:21 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt ` (3 subsequent siblings) 4 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw) To: kaber; +Cc: netfilter-devel xt_TEE can be used to clone and reroute a packet. This can for example be used to copy traffic at a router for logging purposes to another dedicated machine. References: http://www.gossamer-threads.com/lists/iptables/devel/68781 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> --- include/linux/netfilter/Kbuild | 1 + include/linux/netfilter/xt_TEE.h | 8 ++ net/ipv4/ip_output.c | 1 + net/ipv6/ip6_output.c | 1 + net/netfilter/Kconfig | 7 + net/netfilter/Makefile | 1 + net/netfilter/xt_TEE.c | 232 ++++++++++++++++++++++++++++++++++++++ 7 files changed, 251 insertions(+), 0 deletions(-) create mode 100644 include/linux/netfilter/xt_TEE.h create mode 100644 net/netfilter/xt_TEE.c diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild index a5a63e4..48767cd 100644 --- a/include/linux/netfilter/Kbuild +++ b/include/linux/netfilter/Kbuild @@ -16,6 +16,7 @@ header-y += xt_RATEEST.h header-y += xt_SECMARK.h header-y += xt_TCPMSS.h header-y += xt_TCPOPTSTRIP.h +header-y += xt_TEE.h header-y += xt_TPROXY.h header-y += xt_comment.h header-y += xt_connbytes.h diff --git a/include/linux/netfilter/xt_TEE.h b/include/linux/netfilter/xt_TEE.h new file mode 100644 index 0000000..83fa768 --- /dev/null +++ b/include/linux/netfilter/xt_TEE.h @@ -0,0 +1,8 @@ +#ifndef _XT_TEE_TARGET_H +#define _XT_TEE_TARGET_H + +struct xt_tee_tginfo { + union nf_inet_addr gw; +}; + +#endif /* _XT_TEE_TARGET_H */ diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index f09135e..0abfdde 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -309,6 +309,7 @@ int ip_output(struct sk_buff *skb) ip_finish_output, !(IPCB(skb)->flags & IPSKB_REROUTED)); } +EXPORT_SYMBOL_GPL(ip_output); int ip_queue_xmit(struct sk_buff *skb, int ipfragok) { diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index c10a38a..d09be7f 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -176,6 +176,7 @@ int ip6_output(struct sk_buff *skb) ip6_finish_output, !(IP6CB(skb)->flags & IP6SKB_REROUTED)); } +EXPORT_SYMBOL_GPL(ip6_output); /* * xmit an sk_buff (used by TCP) diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 8055786..673a6c8 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -502,6 +502,13 @@ config NETFILTER_XT_TARGET_RATEEST To compile it as a module, choose M here. If unsure, say N. +config NETFILTER_XT_TARGET_TEE + tristate '"TEE" - packet cloning to alternate destiantion' + depends on NETFILTER_ADVANCED + ---help--- + This option adds a "TEE" target with which a packet can be cloned and + this clone be rerouted to another nexthop. + config NETFILTER_XT_TARGET_TPROXY tristate '"TPROXY" target support (EXPERIMENTAL)' depends on EXPERIMENTAL diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index cd31afe..14e3a8f 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_SECMARK) += xt_SECMARK.o obj-$(CONFIG_NETFILTER_XT_TARGET_TPROXY) += xt_TPROXY.o obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o +obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o # matches diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c new file mode 100644 index 0000000..96e3785 --- /dev/null +++ b/net/netfilter/xt_TEE.c @@ -0,0 +1,232 @@ +/* + * "TEE" target extension for Xtables + * Copyright © Sebastian ClaÃen, 2007 + * Jan Engelhardt, 2007-2010 + * + * based on ipt_ROUTE.c from Cédric de Launois + * <delaunois@info.ucl.be> + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * version 2 or later, as published by the Free Software Foundation. + */ +#include <linux/ip.h> +#include <linux/module.h> +#include <linux/route.h> +#include <linux/skbuff.h> +#include <net/checksum.h> +#include <net/icmp.h> +#include <net/ip.h> +#include <net/ipv6.h> +#include <net/ip6_route.h> +#include <net/route.h> +#include <linux/netfilter/x_tables.h> +#include <linux/netfilter/xt_TEE.h> + +#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) +# define WITH_CONNTRACK 1 +# include <net/netfilter/nf_conntrack.h> +#endif +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) +# define WITH_IPV6 1 +#endif + +static const union nf_inet_addr tee_zero_address; + +static struct net *pick_net(struct sk_buff *skb) +{ + const struct net_device *dev; + const struct dst_entry *dst; + + if (skb->dev != NULL) + return dev_net(skb->dev); + dst = skb_dst(skb); + if (dst != NULL && dst->dev != NULL) + return dev_net(dst->dev); + return &init_net; +} + +static bool +tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info) +{ + const struct iphdr *iph = ip_hdr(skb); + struct rtable *rt; + struct flowi fl; + + memset(&fl, 0, sizeof(fl)); + fl.nl_u.ip4_u.daddr = info->gw.ip; + fl.nl_u.ip4_u.tos = RT_TOS(iph->tos); + fl.nl_u.ip4_u.scope = RT_SCOPE_UNIVERSE; + + if (ip_route_output_key(pick_net(skb), &rt, &fl) != 0) { + kfree_skb(skb); + return false; + } + dst_release(skb_dst(skb)); + skb_dst_set(skb, &rt->u.dst); + skb->dev = rt->u.dst.dev; + skb->protocol = htons(ETH_P_IP); + return true; +} + +static unsigned int +tee_tg4(struct sk_buff *skb, const struct xt_target_param *par) +{ + const struct xt_tee_tginfo *info = par->targinfo; + struct iphdr *iph; + + /* + * Copy the skb, and route the copy. Will later return %XT_CONTINUE for + * the original skb, which should continue on its way as if nothing has + * happened. The copy should be independently delivered to the TEE + * --gateway. + */ + skb = skb_copy(skb, GFP_ATOMIC); + if (skb == NULL) + return XT_CONTINUE; + +#ifdef WITH_CONNTRACK + /* Avoid counting cloned packets towards the original connection. */ + nf_conntrack_put(skb->nfct); + skb->nfct = &nf_conntrack_untracked.ct_general; + skb->nfctinfo = IP_CT_NEW; + nf_conntrack_get(skb->nfct); +#endif + /* + * If we are in PREROUTING/INPUT, the checksum must be recalculated + * since the length could have changed as a result of defragmentation. + * + * We also decrease the TTL to mitigate potential TEE loops + * between two hosts. + * + * Set %IP_DF so that the original source is notified of a potentially + * decreased MTU on the clone route. IPv6 does this too. + */ + iph = ip_hdr(skb); + iph->frag_off |= htons(IP_DF); + if (par->hooknum == NF_INET_PRE_ROUTING || + par->hooknum == NF_INET_LOCAL_IN) + --iph->ttl; + ip_send_check(iph); + + /* + * Xtables is not reentrant currently, so a choice has to be made: + * 1. return absolute verdict for the original and let the cloned + * packet travel through the chains + * 2. let the original continue travelling and not pass the clone + * to Xtables. + * #2 is chosen. Normally, we would use ip_local_out for the clone. + * Because iph->check is already correct and we don't pass it to + * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can + * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not + * invoke POSTROUTING on the cloned packet. + */ + IPCB(skb)->flags |= IPSKB_REROUTED; + if (tee_tg_route4(skb, info)) + ip_output(skb); + + return XT_CONTINUE; +} + +#ifdef WITH_IPV6 +static bool +tee_tg_route6(struct sk_buff *skb, const struct xt_tee_tginfo *info) +{ + const struct ipv6hdr *iph = ipv6_hdr(skb); + struct dst_entry *dst; + struct flowi fl; + + memset(&fl, 0, sizeof(fl)); + fl.nl_u.ip6_u.daddr = info->gw.in6; + fl.nl_u.ip6_u.flowlabel = ((iph->flow_lbl[0] & 0xF) << 16) | + (iph->flow_lbl[1] << 8) | iph->flow_lbl[2]; + + dst = ip6_route_output(pick_net(skb), NULL, &fl); + if (dst == NULL) { + kfree_skb(skb); + return false; + } + dst_release(skb_dst(skb)); + skb_dst_set(skb, dst); + skb->dev = dst->dev; + skb->protocol = htons(ETH_P_IPV6); + return true; +} + +static unsigned int +tee_tg6(struct sk_buff *skb, const struct xt_target_param *par) +{ + const struct xt_tee_tginfo *info = par->targinfo; + + if ((skb = skb_copy(skb, GFP_ATOMIC)) == NULL) + return XT_CONTINUE; + +#ifdef WITH_CONNTRACK + nf_conntrack_put(skb->nfct); + skb->nfct = &nf_conntrack_untracked.ct_general; + skb->nfctinfo = IP_CT_NEW; + nf_conntrack_get(skb->nfct); +#endif + if (par->hooknum == NF_INET_PRE_ROUTING || + par->hooknum == NF_INET_LOCAL_IN) { + struct ipv6hdr *iph = ipv6_hdr(skb); + --iph->hop_limit; + } + IP6CB(skb)->flags |= IP6SKB_REROUTED; + if (tee_tg_route6(skb, info)) + ip6_output(skb); + + return XT_CONTINUE; +} +#endif /* WITH_IPV6 */ + +static int tee_tg_check(const struct xt_tgchk_param *par) +{ + const struct xt_tee_tginfo *info = par->targinfo; + + /* 0.0.0.0 and :: not allowed */ + return (memcmp(&info->gw, &tee_zero_address, + sizeof(tee_zero_address)) == 0) ? -EINVAL : 0; +} + +static struct xt_target tee_tg_reg[] __read_mostly = { + { + .name = "TEE", + .revision = 0, + .family = NFPROTO_IPV4, + .target = tee_tg4, + .targetsize = sizeof(struct xt_tee_tginfo), + .checkentry = tee_tg_check, + .me = THIS_MODULE, + }, +#ifdef WITH_IPV6 + { + .name = "TEE", + .revision = 0, + .family = NFPROTO_IPV6, + .target = tee_tg6, + .targetsize = sizeof(struct xt_tee_tginfo), + .checkentry = tee_tg_check, + .me = THIS_MODULE, + }, +#endif +}; + +static int __init tee_tg_init(void) +{ + return xt_register_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg)); +} + +static void __exit tee_tg_exit(void) +{ + xt_unregister_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg)); +} + +module_init(tee_tg_init); +module_exit(tee_tg_exit); +MODULE_AUTHOR("Sebastian ClaÃen <sebastian.classen@freenet.ag>"); +MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>"); +MODULE_DESCRIPTION("Xtables: Reroute packet copy"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS("ipt_TEE"); +MODULE_ALIAS("ip6t_TEE"); -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE 2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt @ 2010-04-14 5:52 ` Eric Dumazet 2010-04-14 10:00 ` Jan Engelhardt 0 siblings, 1 reply; 22+ messages in thread From: Eric Dumazet @ 2010-04-14 5:52 UTC (permalink / raw) To: Jan Engelhardt; +Cc: kaber, netfilter-devel Le mercredi 14 avril 2010 à 01:21 +0200, Jan Engelhardt a écrit : > xt_TEE can be used to clone and reroute a packet. This can for > example be used to copy traffic at a router for logging purposes > to another dedicated machine. > > References: http://www.gossamer-threads.com/lists/iptables/devel/68781 > Signed-off-by: Jan Engelhardt <jengelh@medozas.de> > --- Lovely :) > + > +static const union nf_inet_addr tee_zero_address; > + > +static struct net *pick_net(struct sk_buff *skb) > +{ #ifdef CONFIG_NET_NS > + const struct net_device *dev; > + const struct dst_entry *dst; > + > + if (skb->dev != NULL) > + return dev_net(skb->dev); > + dst = skb_dst(skb); > + if (dst != NULL && dst->dev != NULL) > + return dev_net(dst->dev); #endif /* CONFIG_NET_NS */ > + return &init_net; > +} > + > + > +static unsigned int > +tee_tg4(struct sk_buff *skb, const struct xt_target_param *par) > +{ > + const struct xt_tee_tginfo *info = par->targinfo; > + struct iphdr *iph; > + > + /* > + * Copy the skb, and route the copy. Will later return %XT_CONTINUE for > + * the original skb, which should continue on its way as if nothing has > + * happened. The copy should be independently delivered to the TEE > + * --gateway. > + */ > + skb = skb_copy(skb, GFP_ATOMIC); > + if (skb == NULL) > + return XT_CONTINUE; > + > +#ifdef WITH_CONNTRACK > + /* Avoid counting cloned packets towards the original connection. */ > + nf_conntrack_put(skb->nfct); > + skb->nfct = &nf_conntrack_untracked.ct_general; > + skb->nfctinfo = IP_CT_NEW; > + nf_conntrack_get(skb->nfct); This atomic increment on a global variable worries me... Would it be possible to avoid it (and the associated decrement and test if null) I would like to use this TEE facility but with xxx kpps for instance ;) > +#endif -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE 2010-04-14 5:52 ` Eric Dumazet @ 2010-04-14 10:00 ` Jan Engelhardt 2010-04-14 10:56 ` Patrick McHardy 0 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-14 10:00 UTC (permalink / raw) To: Eric Dumazet; +Cc: kaber, netfilter-devel On Wednesday 2010-04-14 07:52, Eric Dumazet wrote: >> +#ifdef WITH_CONNTRACK >> + /* Avoid counting cloned packets towards the original connection. */ >> + nf_conntrack_put(skb->nfct); >> + skb->nfct = &nf_conntrack_untracked.ct_general; >> + skb->nfctinfo = IP_CT_NEW; >> + nf_conntrack_get(skb->nfct); > >This atomic increment on a global variable worries me... Would it be >possible to avoid it (and the associated decrement and test if null) > >I would like to use this TEE facility but with xxx kpps for instance ;) Correctness before speed. You're free to send patches on top. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE 2010-04-14 10:00 ` Jan Engelhardt @ 2010-04-14 10:56 ` Patrick McHardy 0 siblings, 0 replies; 22+ messages in thread From: Patrick McHardy @ 2010-04-14 10:56 UTC (permalink / raw) To: Jan Engelhardt; +Cc: Eric Dumazet, netfilter-devel Jan Engelhardt wrote: > On Wednesday 2010-04-14 07:52, Eric Dumazet wrote: >>> +#ifdef WITH_CONNTRACK >>> + /* Avoid counting cloned packets towards the original connection. */ >>> + nf_conntrack_put(skb->nfct); >>> + skb->nfct = &nf_conntrack_untracked.ct_general; >>> + skb->nfctinfo = IP_CT_NEW; >>> + nf_conntrack_get(skb->nfct); >> This atomic increment on a global variable worries me... Would it be >> possible to avoid it (and the associated decrement and test if null) >> >> I would like to use this TEE facility but with xxx kpps for instance ;) > > Correctness before speed. You're free to send patches on top. > Pablo suggest to encode "untracked" in one of the nfctinfo bits a while ago, which would avoid all atomic operations. Anyways, this can be done later. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-13 23:21 nf-next: TEE only Jan Engelhardt 2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt @ 2010-04-13 23:21 ` Jan Engelhardt 2010-04-13 23:21 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt ` (2 subsequent siblings) 4 siblings, 0 replies; 22+ messages in thread From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw) To: kaber; +Cc: netfilter-devel Currently, the table traverser stores return addresses in the ruleset itself (struct ip6t_entry->comefrom). This has a well-known drawback: the jumpstack is overwritten on reentry, making it necessary for targets to return absolute verdicts. Also, the ruleset (which might be heavy memory-wise) needs to be replicated for each CPU that can possibly invoke ip6t_do_table. This patch decouples the jumpstack from struct ip6t_entry and instead puts it into xt_table_info. Not being restricted by 'comefrom' anymore, we can set up a stack as needed. By default, there is room allocated for two entries into the traverser. The setting is configurable at runtime through sysfs and will take effect when a table is replaced by a new one. arp_tables is not touched though, because there is just one/two modules and further patches seek to collapse the table traverser anyhow. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> --- include/linux/netfilter/x_tables.h | 7 +++ net/ipv4/netfilter/arp_tables.c | 6 ++- net/ipv4/netfilter/ip_tables.c | 65 ++++++++++++++++-------------- net/ipv6/netfilter/ip6_tables.c | 56 ++++++++++---------------- net/netfilter/x_tables.c | 77 ++++++++++++++++++++++++++++++++++++ 5 files changed, 145 insertions(+), 66 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 26ced0c..50c8672 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -401,6 +401,13 @@ struct xt_table_info { unsigned int hook_entry[NF_INET_NUMHOOKS]; unsigned int underflow[NF_INET_NUMHOOKS]; + /* + * Number of user chains. Since tables cannot have loops, at most + * @stacksize jumps (number of user chains) can possibly be made. + */ + unsigned int stacksize; + unsigned int *stackptr; + void ***jumpstack; /* ipt_entry tables: one per CPU */ /* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */ void *entries[1]; diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index e8e363d..07a6990 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -649,6 +649,9 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0, if (ret != 0) break; ++i; + if (strcmp(arpt_get_target(iter)->u.user.name, + XT_ERROR_TARGET) == 0) + ++newinfo->stacksize; } duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret); if (ret != 0) @@ -1774,8 +1777,7 @@ struct xt_table *arpt_register_table(struct net *net, { int ret; struct xt_table_info *newinfo; - struct xt_table_info bootstrap - = { 0, 0, 0, { 0 }, { 0 }, { } }; + struct xt_table_info bootstrap = {0}; void *loc_cpu_entry; struct xt_table *new_table; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 18c5b15..70900ec 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -321,8 +321,6 @@ ipt_do_table(struct sk_buff *skb, const struct net_device *out, struct xt_table *table) { -#define tb_comefrom ((struct ipt_entry *)table_base)->comefrom - static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long)))); const struct iphdr *ip; bool hotdrop = false; @@ -330,7 +328,8 @@ ipt_do_table(struct sk_buff *skb, unsigned int verdict = NF_DROP; const char *indev, *outdev; const void *table_base; - struct ipt_entry *e, *back; + struct ipt_entry *e, **jumpstack; + unsigned int *stackptr, origptr, cpu; const struct xt_table_info *private; struct xt_match_param mtpar; struct xt_target_param tgpar; @@ -356,19 +355,23 @@ ipt_do_table(struct sk_buff *skb, IP_NF_ASSERT(table->valid_hooks & (1 << hook)); xt_info_rdlock_bh(); private = table->private; - table_base = private->entries[smp_processor_id()]; + cpu = smp_processor_id(); + table_base = private->entries[cpu]; + jumpstack = (struct ipt_entry **)private->jumpstack[cpu]; + stackptr = &private->stackptr[cpu]; + origptr = *stackptr; e = get_entry(table_base, private->hook_entry[hook]); - /* For return from builtin chain */ - back = get_entry(table_base, private->underflow[hook]); + pr_devel("Entering %s(hook %u); sp at %u (UF %p)\n", + table->name, hook, origptr, + get_entry(table_base, private->underflow[hook])); do { const struct ipt_entry_target *t; const struct xt_entry_match *ematch; IP_NF_ASSERT(e); - IP_NF_ASSERT(back); if (!ip_packet_match(ip, indev, outdev, &e->ip, mtpar.fragoff)) { no_match: @@ -403,17 +406,28 @@ ipt_do_table(struct sk_buff *skb, verdict = (unsigned)(-v) - 1; break; } - e = back; - back = get_entry(table_base, back->comefrom); + if (*stackptr == 0) { + e = get_entry(table_base, + private->underflow[hook]); + pr_devel("Underflow (this is normal) " + "to %p\n", e); + } else { + e = jumpstack[--*stackptr]; + pr_devel("Pulled %p out from pos %u\n", + e, *stackptr); + e = ipt_next_entry(e); + } continue; } if (table_base + v != ipt_next_entry(e) && !(e->ip.flags & IPT_F_GOTO)) { - /* Save old back ptr in next entry */ - struct ipt_entry *next = ipt_next_entry(e); - next->comefrom = (void *)back - table_base; - /* set back pointer to next entry */ - back = next; + if (*stackptr >= private->stacksize) { + verdict = NF_DROP; + break; + } + jumpstack[(*stackptr)++] = e; + pr_devel("Pushed %p into pos %u\n", + e, *stackptr - 1); } e = get_entry(table_base, v); @@ -426,18 +440,7 @@ ipt_do_table(struct sk_buff *skb, tgpar.targinfo = t->data; -#ifdef CONFIG_NETFILTER_DEBUG - tb_comefrom = 0xeeeeeeec; -#endif verdict = t->u.kernel.target->target(skb, &tgpar); -#ifdef CONFIG_NETFILTER_DEBUG - if (tb_comefrom != 0xeeeeeeec && verdict == IPT_CONTINUE) { - printk("Target %s reentered!\n", - t->u.kernel.target->name); - verdict = NF_DROP; - } - tb_comefrom = 0x57acc001; -#endif /* Target might have changed stuff. */ ip = ip_hdr(skb); if (verdict == IPT_CONTINUE) @@ -447,7 +450,9 @@ ipt_do_table(struct sk_buff *skb, break; } while (!hotdrop); xt_info_rdunlock_bh(); - + pr_devel("Exiting %s; resetting sp from %u to %u\n", + __func__, *stackptr, origptr); + *stackptr = origptr; #ifdef DEBUG_ALLOW_ALL return NF_ACCEPT; #else @@ -455,8 +460,6 @@ ipt_do_table(struct sk_buff *skb, return NF_DROP; else return verdict; #endif - -#undef tb_comefrom } /* Figures out from what hook each rule can be called: returns 0 if @@ -838,6 +841,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0, if (ret != 0) return ret; ++i; + if (strcmp(ipt_get_target(iter)->u.user.name, + XT_ERROR_TARGET) == 0) + ++newinfo->stacksize; } if (i != repl->num_entries) { @@ -2086,8 +2092,7 @@ struct xt_table *ipt_register_table(struct net *net, { int ret; struct xt_table_info *newinfo; - struct xt_table_info bootstrap - = { 0, 0, 0, { 0 }, { 0 }, { } }; + struct xt_table_info bootstrap = {0}; void *loc_cpu_entry; struct xt_table *new_table; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index f2b815e..2a2770b 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -351,15 +351,14 @@ ip6t_do_table(struct sk_buff *skb, const struct net_device *out, struct xt_table *table) { -#define tb_comefrom ((struct ip6t_entry *)table_base)->comefrom - static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long)))); bool hotdrop = false; /* Initializing verdict to NF_DROP keeps gcc happy. */ unsigned int verdict = NF_DROP; const char *indev, *outdev; const void *table_base; - struct ip6t_entry *e, *back; + struct ip6t_entry *e, **jumpstack; + unsigned int *stackptr, origptr, cpu; const struct xt_table_info *private; struct xt_match_param mtpar; struct xt_target_param tgpar; @@ -383,19 +382,19 @@ ip6t_do_table(struct sk_buff *skb, xt_info_rdlock_bh(); private = table->private; - table_base = private->entries[smp_processor_id()]; + cpu = smp_processor_id(); + table_base = private->entries[cpu]; + jumpstack = (struct ip6t_entry **)private->jumpstack[cpu]; + stackptr = &private->stackptr[cpu]; + origptr = *stackptr; e = get_entry(table_base, private->hook_entry[hook]); - /* For return from builtin chain */ - back = get_entry(table_base, private->underflow[hook]); - do { const struct ip6t_entry_target *t; const struct xt_entry_match *ematch; IP_NF_ASSERT(e); - IP_NF_ASSERT(back); if (!ip6_packet_match(skb, indev, outdev, &e->ipv6, &mtpar.thoff, &mtpar.fragoff, &hotdrop)) { no_match: @@ -432,17 +431,20 @@ ip6t_do_table(struct sk_buff *skb, verdict = (unsigned)(-v) - 1; break; } - e = back; - back = get_entry(table_base, back->comefrom); + if (*stackptr == 0) + e = get_entry(table_base, + private->underflow[hook]); + else + e = ip6t_next_entry(jumpstack[--*stackptr]); continue; } if (table_base + v != ip6t_next_entry(e) && !(e->ipv6.flags & IP6T_F_GOTO)) { - /* Save old back ptr in next entry */ - struct ip6t_entry *next = ip6t_next_entry(e); - next->comefrom = (void *)back - table_base; - /* set back pointer to next entry */ - back = next; + if (*stackptr >= private->stacksize) { + verdict = NF_DROP; + break; + } + jumpstack[(*stackptr)++] = e; } e = get_entry(table_base, v); @@ -454,19 +456,7 @@ ip6t_do_table(struct sk_buff *skb, tgpar.target = t->u.kernel.target; tgpar.targinfo = t->data; -#ifdef CONFIG_NETFILTER_DEBUG - tb_comefrom = 0xeeeeeeec; -#endif verdict = t->u.kernel.target->target(skb, &tgpar); - -#ifdef CONFIG_NETFILTER_DEBUG - if (tb_comefrom != 0xeeeeeeec && verdict == IP6T_CONTINUE) { - printk("Target %s reentered!\n", - t->u.kernel.target->name); - verdict = NF_DROP; - } - tb_comefrom = 0x57acc001; -#endif if (verdict == IP6T_CONTINUE) e = ip6t_next_entry(e); else @@ -474,10 +464,8 @@ ip6t_do_table(struct sk_buff *skb, break; } while (!hotdrop); -#ifdef CONFIG_NETFILTER_DEBUG - tb_comefrom = NETFILTER_LINK_POISON; -#endif xt_info_rdunlock_bh(); + *stackptr = origptr; #ifdef DEBUG_ALLOW_ALL return NF_ACCEPT; @@ -486,8 +474,6 @@ ip6t_do_table(struct sk_buff *skb, return NF_DROP; else return verdict; #endif - -#undef tb_comefrom } /* Figures out from what hook each rule can be called: returns 0 if @@ -869,6 +855,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0, if (ret != 0) return ret; ++i; + if (strcmp(ip6t_get_target(iter)->u.user.name, + XT_ERROR_TARGET) == 0) + ++newinfo->stacksize; } if (i != repl->num_entries) { @@ -2120,8 +2109,7 @@ struct xt_table *ip6t_register_table(struct net *net, { int ret; struct xt_table_info *newinfo; - struct xt_table_info bootstrap - = { 0, 0, 0, { 0 }, { 0 }, { } }; + struct xt_table_info bootstrap = {0}; void *loc_cpu_entry; struct xt_table *new_table; diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 8e23d8f..edde5c6 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = { [NFPROTO_IPV6] = "ip6", }; +/* Allow this many total (re)entries. */ +static const unsigned int xt_jumpstack_multiplier = 2; + /* Registration hooks for targets. */ int xt_register_target(struct xt_target *target) @@ -680,6 +683,26 @@ void xt_free_table_info(struct xt_table_info *info) else vfree(info->entries[cpu]); } + + if (info->jumpstack != NULL) { + if (sizeof(void *) * info->stacksize > PAGE_SIZE) { + for_each_possible_cpu(cpu) + vfree(info->jumpstack[cpu]); + } else { + for_each_possible_cpu(cpu) + kfree(info->jumpstack[cpu]); + } + } + + if (sizeof(void **) * nr_cpu_ids > PAGE_SIZE) + vfree(info->jumpstack); + else + kfree(info->jumpstack); + if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE) + vfree(info->stackptr); + else + kfree(info->stackptr); + kfree(info); } EXPORT_SYMBOL(xt_free_table_info); @@ -724,6 +747,49 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock); DEFINE_PER_CPU(struct xt_info_lock, xt_info_locks); EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks); +static int xt_jumpstack_alloc(struct xt_table_info *i) +{ + unsigned int size; + int cpu; + + size = sizeof(unsigned int) * nr_cpu_ids; + if (size > PAGE_SIZE) + i->stackptr = vmalloc(size); + else + i->stackptr = kmalloc(size, GFP_KERNEL); + if (i->stackptr == NULL) + return -ENOMEM; + memset(i->stackptr, 0, size); + + size = sizeof(void **) * nr_cpu_ids; + if (size > PAGE_SIZE) + i->jumpstack = vmalloc(size); + else + i->jumpstack = kmalloc(size, GFP_KERNEL); + if (i->jumpstack == NULL) + return -ENOMEM; + memset(i->jumpstack, 0, size); + + i->stacksize *= xt_jumpstack_multiplier; + size = sizeof(void *) * i->stacksize; + for_each_possible_cpu(cpu) { + if (size > PAGE_SIZE) + i->jumpstack[cpu] = vmalloc_node(size, + cpu_to_node(cpu)); + else + i->jumpstack[cpu] = kmalloc_node(size, + GFP_KERNEL, cpu_to_node(cpu)); + if (i->jumpstack[cpu] == NULL) + /* + * Freeing will be done later on by the callers. The + * chain is: xt_replace_table -> __do_replace -> + * do_replace -> xt_free_table_info. + */ + return -ENOMEM; + } + + return 0; +} struct xt_table_info * xt_replace_table(struct xt_table *table, @@ -732,6 +798,7 @@ xt_replace_table(struct xt_table *table, int *error) { struct xt_table_info *private; + int ret; /* Do the substitution. */ local_bh_disable(); @@ -746,6 +813,12 @@ xt_replace_table(struct xt_table *table, return NULL; } + ret = xt_jumpstack_alloc(newinfo); + if (ret < 0) { + *error = ret; + return NULL; + } + table->private = newinfo; newinfo->initial_entries = private->initial_entries; @@ -770,6 +843,10 @@ struct xt_table *xt_register_table(struct net *net, struct xt_table_info *private; struct xt_table *t, *table; + ret = xt_jumpstack_alloc(newinfo); + if (ret < 0) + return ERR_PTR(ret); + /* Don't add one object to multiple lists. */ table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL); if (!table) { -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too 2010-04-13 23:21 nf-next: TEE only Jan Engelhardt 2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt 2010-04-13 23:21 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt @ 2010-04-13 23:21 ` Jan Engelhardt 2010-04-13 23:21 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt 2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy 4 siblings, 0 replies; 22+ messages in thread From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw) To: kaber; +Cc: netfilter-devel Since Xtables is now reentrant/nestable, the cloned packet can also go through Xtables and be subject to rules itself. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> --- net/ipv4/ip_output.c | 1 - net/ipv6/ip6_output.c | 1 - net/netfilter/xt_TEE.c | 38 +++++++++++++++++--------------------- 3 files changed, 17 insertions(+), 23 deletions(-) diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 0abfdde..f09135e 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -309,7 +309,6 @@ int ip_output(struct sk_buff *skb) ip_finish_output, !(IPCB(skb)->flags & IPSKB_REROUTED)); } -EXPORT_SYMBOL_GPL(ip_output); int ip_queue_xmit(struct sk_buff *skb, int ipfragok) { diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index d09be7f..c10a38a 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -176,7 +176,6 @@ int ip6_output(struct sk_buff *skb) ip6_finish_output, !(IP6CB(skb)->flags & IP6SKB_REROUTED)); } -EXPORT_SYMBOL_GPL(ip6_output); /* * xmit an sk_buff (used by TCP) diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c index 96e3785..f193133 100644 --- a/net/netfilter/xt_TEE.c +++ b/net/netfilter/xt_TEE.c @@ -12,6 +12,7 @@ */ #include <linux/ip.h> #include <linux/module.h> +#include <linux/percpu.h> #include <linux/route.h> #include <linux/skbuff.h> #include <net/checksum.h> @@ -32,6 +33,7 @@ #endif static const union nf_inet_addr tee_zero_address; +static DEFINE_PER_CPU(bool, tee_active); static struct net *pick_net(struct sk_buff *skb) { @@ -75,13 +77,15 @@ tee_tg4(struct sk_buff *skb, const struct xt_target_param *par) const struct xt_tee_tginfo *info = par->targinfo; struct iphdr *iph; + if (percpu_read(tee_active)) + return XT_CONTINUE; /* * Copy the skb, and route the copy. Will later return %XT_CONTINUE for * the original skb, which should continue on its way as if nothing has * happened. The copy should be independently delivered to the TEE * --gateway. */ - skb = skb_copy(skb, GFP_ATOMIC); + skb = pskb_copy(skb, GFP_ATOMIC); if (skb == NULL) return XT_CONTINUE; @@ -109,22 +113,11 @@ tee_tg4(struct sk_buff *skb, const struct xt_target_param *par) --iph->ttl; ip_send_check(iph); - /* - * Xtables is not reentrant currently, so a choice has to be made: - * 1. return absolute verdict for the original and let the cloned - * packet travel through the chains - * 2. let the original continue travelling and not pass the clone - * to Xtables. - * #2 is chosen. Normally, we would use ip_local_out for the clone. - * Because iph->check is already correct and we don't pass it to - * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can - * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not - * invoke POSTROUTING on the cloned packet. - */ - IPCB(skb)->flags |= IPSKB_REROUTED; - if (tee_tg_route4(skb, info)) - ip_output(skb); - + if (tee_tg_route4(skb, info)) { + percpu_write(tee_active, true); + ip_local_out(skb); + percpu_write(tee_active, false); + } return XT_CONTINUE; } @@ -158,6 +151,8 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par) { const struct xt_tee_tginfo *info = par->targinfo; + if (percpu_read(tee_active)) + return XT_CONTINUE; if ((skb = skb_copy(skb, GFP_ATOMIC)) == NULL) return XT_CONTINUE; @@ -172,10 +167,11 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par) struct ipv6hdr *iph = ipv6_hdr(skb); --iph->hop_limit; } - IP6CB(skb)->flags |= IP6SKB_REROUTED; - if (tee_tg_route6(skb, info)) - ip6_output(skb); - + if (tee_tg_route6(skb, info)) { + percpu_write(tee_active, true); + ip6_local_out(skb); + percpu_write(tee_active, false); + } return XT_CONTINUE; } #endif /* WITH_IPV6 */ -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy 2010-04-13 23:21 nf-next: TEE only Jan Engelhardt ` (2 preceding siblings ...) 2010-04-13 23:21 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt @ 2010-04-13 23:21 ` Jan Engelhardt 2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy 4 siblings, 0 replies; 22+ messages in thread From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw) To: kaber; +Cc: netfilter-devel Signed-off-by: Jan Engelhardt <jengelh@medozas.de> --- net/ipv4/netfilter/ip_tables.c | 2 -- net/ipv4/netfilter/ipt_REJECT.c | 3 --- net/ipv6/netfilter/ip6_tables.c | 2 -- net/ipv6/netfilter/ip6t_REJECT.c | 3 --- 4 files changed, 0 insertions(+), 10 deletions(-) diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 70900ec..bb5e0d9 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -434,8 +434,6 @@ ipt_do_table(struct sk_buff *skb, continue; } - /* Targets which reenter must return - abs. verdicts */ tgpar.target = t->u.kernel.target; tgpar.targinfo = t->data; diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c index b026014..038fa0b 100644 --- a/net/ipv4/netfilter/ipt_REJECT.c +++ b/net/ipv4/netfilter/ipt_REJECT.c @@ -139,9 +139,6 @@ reject_tg(struct sk_buff *skb, const struct xt_target_param *par) { const struct ipt_reject_info *reject = par->targinfo; - /* WARNING: This code causes reentry within iptables. - This means that the iptables jump stack is now crap. We - must return an absolute verdict. --RR */ switch (reject->with) { case IPT_ICMP_NET_UNREACHABLE: send_unreach(skb, ICMP_NET_UNREACH); diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 2a2770b..7afa117 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -451,8 +451,6 @@ ip6t_do_table(struct sk_buff *skb, continue; } - /* Targets which reenter must return - abs. verdicts */ tgpar.target = t->u.kernel.target; tgpar.targinfo = t->data; diff --git a/net/ipv6/netfilter/ip6t_REJECT.c b/net/ipv6/netfilter/ip6t_REJECT.c index 55b9b2d..dad9762 100644 --- a/net/ipv6/netfilter/ip6t_REJECT.c +++ b/net/ipv6/netfilter/ip6t_REJECT.c @@ -179,9 +179,6 @@ reject_tg6(struct sk_buff *skb, const struct xt_target_param *par) struct net *net = dev_net((par->in != NULL) ? par->in : par->out); pr_debug("%s: medium point\n", __func__); - /* WARNING: This code causes reentry within ip6tables. - This means that the ip6tables jump stack is now crap. We - must return an absolute verdict. --RR */ switch (reject->with) { case IP6T_ICMP6_NO_ROUTE: send_unreach(net, skb, ICMPV6_NOROUTE, par->hooknum); -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: nf-next: TEE only 2010-04-13 23:21 nf-next: TEE only Jan Engelhardt ` (3 preceding siblings ...) 2010-04-13 23:21 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt @ 2010-04-14 10:57 ` Patrick McHardy 2010-04-14 11:15 ` Jan Engelhardt 4 siblings, 1 reply; 22+ messages in thread From: Patrick McHardy @ 2010-04-14 10:57 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Jan Engelhardt wrote: > in this round: > - use IP6SKB_REROUTED in v6 code > - pick_net function: use skb->dev or skb->dst->dev when available > (or completely fall back to init_net in case there's something > going on) So what about oif routing which I asked for two times? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: nf-next: TEE only 2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy @ 2010-04-14 11:15 ` Jan Engelhardt 2010-04-14 11:24 ` Patrick McHardy 0 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-14 11:15 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel On Wednesday 2010-04-14 12:57, Patrick McHardy wrote: >Jan Engelhardt wrote: >> in this round: >> - use IP6SKB_REROUTED in v6 code >> - pick_net function: use skb->dev or skb->dst->dev when available >> (or completely fall back to init_net in case there's something >> going on) > >So what about oif routing which I asked for two times? Guess it must have fallen off somewhere between the resends. We can still add it as a patch on top. >I guess you'd usually have a host for logging or IDS somewhere on a >private network and TEE packets there. So specifying oif and gateway >seems most useful to me. The oif is already determined by the route to the gateway(logging host). I'd also fear that people abuse TEE as a ROUTE replacement when they see an --oif. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: nf-next: TEE only 2010-04-14 11:15 ` Jan Engelhardt @ 2010-04-14 11:24 ` Patrick McHardy 2010-04-14 11:32 ` Jan Engelhardt 0 siblings, 1 reply; 22+ messages in thread From: Patrick McHardy @ 2010-04-14 11:24 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Jan Engelhardt wrote: > On Wednesday 2010-04-14 12:57, Patrick McHardy wrote: > >> Jan Engelhardt wrote: >>> in this round: >>> - use IP6SKB_REROUTED in v6 code >>> - pick_net function: use skb->dev or skb->dst->dev when available >>> (or completely fall back to init_net in case there's something >>> going on) >> So what about oif routing which I asked for two times? > > Guess it must have fallen off somewhere between the resends. We can > still add it as a patch on top. Please add it before I apply it. Should be a fairly trivial change. >> I guess you'd usually have a host for logging or IDS somewhere on a >> private network and TEE packets there. So specifying oif and gateway >> seems most useful to me. > > The oif is already determined by the route to the gateway(logging > host). I'd also fear that people abuse TEE as a ROUTE replacement > when they see an --oif. That's something different. The oif forces use of a specific output device, independant of the routing tables. F.i.: # ip l l dummy0 4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN link/ether 96:7f:5e:d2:d6:c9 brd ff:ff:ff:ff:ff:ff # ip a s dummy0 4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN li# ip a s dummy0 4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN link/ether 96:7f:5e:d2:d6:c9 brd ff:ff:ff:ff:ff:ff inet6 fe80::947f:5eff:fed2:d6c9/64 scope link valid_lft forever preferred_lft forever # ip r | grep dummy0 # # ping 10.0.0.1 -I dummy0 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 1, length 64 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 2, length 64 This is quite useful since your logging host doesn't have to be reachable through normal routing. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: nf-next: TEE only 2010-04-14 11:24 ` Patrick McHardy @ 2010-04-14 11:32 ` Jan Engelhardt 2010-04-15 11:41 ` Jan Engelhardt 0 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-14 11:32 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel On Wednesday 2010-04-14 13:24, Patrick McHardy wrote: >>> So what about oif routing which I asked for two times? >> >> Guess it must have fallen off somewhere between the resends. We can >> still add it as a patch on top. > >Please add it before I apply it. Should be a fairly trivial change. > >>> I guess you'd usually have a host for logging or IDS somewhere on a >>> private network and TEE packets there. So specifying oif and gateway >>> seems most useful to me. >> >> The oif is already determined by the route to the gateway(logging >> host). I'd also fear that people abuse TEE as a ROUTE replacement >> when they see an --oif. > >That's something different. The oif forces use of a specific output >device, independant of the routing tables. F.i.: You should be able to use a specific output device by use of a routing table, and selecting that table with fwmark. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: nf-next: TEE only 2010-04-14 11:32 ` Jan Engelhardt @ 2010-04-15 11:41 ` Jan Engelhardt 2010-04-15 11:56 ` Patrick McHardy 0 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-15 11:41 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel On Wednesday 2010-04-14 13:32, Jan Engelhardt wrote: >On Wednesday 2010-04-14 13:24, Patrick McHardy wrote: >>>> So what about oif routing which I asked for two times? >>> >>> Guess it must have fallen off somewhere between the resends. We can >>> still add it as a patch on top. >> >>Please add it before I apply it. Should be a fairly trivial change. >> >>>> I guess you'd usually have a host for logging or IDS somewhere on a >>>> private network and TEE packets there. So specifying oif and gateway >>>> seems most useful to me. >>> >>> The oif is already determined by the route to the gateway(logging >>> host). I'd also fear that people abuse TEE as a ROUTE replacement >>> when they see an --oif. >> >>That's something different. The oif forces use of a specific output >>device, independant of the routing tables. F.i.: > >You should be able to use a specific output device by use of a routing >table, and selecting that table with fwmark. >This is quite useful since your logging host doesn't have to be >reachable through normal routing. >4: dummy0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN > link/ether 96:7f:5e:d2:d6:c9 brd ff:ff:ff:ff:ff:ff > inet6 fe80::947f:5eff:fed2:d6c9/64 scope link > valid_lft forever preferred_lft forever > ># ping 10.0.0.1 -I dummy0 >IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84) > 192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 1, length 64 >IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 84) > 192.168.0.100 > 10.0.0.1: ICMP echo request, id 25874, seq 2, length 64 I don't see why one would want the log server to be unreachable. What _does_ look reasonable however is an encapsulation device for transporting teed packets across multiple real hops: 14: gre1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN link/gre 5.6.7.8 peer 1.2.3.4 and this case can be solved with standard(/fwmarking) policy routing ('default via gre1'). ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: nf-next: TEE only 2010-04-15 11:41 ` Jan Engelhardt @ 2010-04-15 11:56 ` Patrick McHardy 0 siblings, 0 replies; 22+ messages in thread From: Patrick McHardy @ 2010-04-15 11:56 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Jan Engelhardt wrote: > On Wednesday 2010-04-14 13:32, Jan Engelhardt wrote: > I don't see why one would want the log server to be unreachable. I don't want that, in fact I don't have use for TEE personally. > What _does_ look reasonable however is an encapsulation device for transporting > teed packets across multiple real hops: > > 14: gre1: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1476 qdisc noqueue state UNKNOWN > link/gre 5.6.7.8 peer 1.2.3.4 > > and this case can be solved with standard(/fwmarking) policy routing > ('default via gre1'). Sure. And it can be solved using standard routing by specifying the oif. I'm not interested in debatting this endlessly, this is a standard routing key and there's no reason at all to not support it. ^ permalink raw reply [flat|nested] 22+ messages in thread
* nf-next: TEE 20100215 @ 2010-04-15 23:25 Jan Engelhardt 2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt 0 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw) To: kaber; +Cc: netfilter-devel Hi, xt_TEE now has routing by oif. Please review. Does it look right to you? The following changes since commit 9c6eb28aca52d562f3ffbaebaa56385df9972a43: Jan Engelhardt (1): netfilter: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation are available in the git repository at: git://dev.medozas.de/linux master Jan Engelhardt (4): netfilter: xtables: inclusion of xt_TEE netfilter: xtables2: make ip_tables reentrant netfilter: xt_TEE: have cloned packet travel through Xtables too netfilter: xtables: remove old comments about reentrancy include/linux/netfilter/Kbuild | 1 + include/linux/netfilter/x_tables.h | 7 + include/linux/netfilter/xt_TEE.h | 9 ++ net/ipv4/netfilter/arp_tables.c | 6 +- net/ipv4/netfilter/ip_tables.c | 67 +++++----- net/ipv4/netfilter/ipt_REJECT.c | 3 - net/ipv6/netfilter/ip6_tables.c | 58 +++----- net/ipv6/netfilter/ip6t_REJECT.c | 3 - net/netfilter/Kconfig | 7 + net/netfilter/Makefile | 1 + net/netfilter/x_tables.c | 77 +++++++++++ net/netfilter/xt_TEE.c | 252 ++++++++++++++++++++++++++++++++++++ 12 files changed, 415 insertions(+), 76 deletions(-) create mode 100644 include/linux/netfilter/xt_TEE.h create mode 100644 net/netfilter/xt_TEE.c ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt @ 2010-04-15 23:25 ` Jan Engelhardt 2010-04-19 12:22 ` Patrick McHardy 0 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw) To: kaber; +Cc: netfilter-devel Currently, the table traverser stores return addresses in the ruleset itself (struct ip6t_entry->comefrom). This has a well-known drawback: the jumpstack is overwritten on reentry, making it necessary for targets to return absolute verdicts. Also, the ruleset (which might be heavy memory-wise) needs to be replicated for each CPU that can possibly invoke ip6t_do_table. This patch decouples the jumpstack from struct ip6t_entry and instead puts it into xt_table_info. Not being restricted by 'comefrom' anymore, we can set up a stack as needed. By default, there is room allocated for two entries into the traverser. The setting is configurable at runtime through sysfs and will take effect when a table is replaced by a new one. arp_tables is not touched though, because there is just one/two modules and further patches seek to collapse the table traverser anyhow. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> --- include/linux/netfilter/x_tables.h | 7 +++ net/ipv4/netfilter/arp_tables.c | 6 ++- net/ipv4/netfilter/ip_tables.c | 65 ++++++++++++++++-------------- net/ipv6/netfilter/ip6_tables.c | 56 ++++++++++---------------- net/netfilter/x_tables.c | 77 ++++++++++++++++++++++++++++++++++++ 5 files changed, 145 insertions(+), 66 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 26ced0c..50c8672 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -401,6 +401,13 @@ struct xt_table_info { unsigned int hook_entry[NF_INET_NUMHOOKS]; unsigned int underflow[NF_INET_NUMHOOKS]; + /* + * Number of user chains. Since tables cannot have loops, at most + * @stacksize jumps (number of user chains) can possibly be made. + */ + unsigned int stacksize; + unsigned int *stackptr; + void ***jumpstack; /* ipt_entry tables: one per CPU */ /* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */ void *entries[1]; diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index e8e363d..07a6990 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -649,6 +649,9 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0, if (ret != 0) break; ++i; + if (strcmp(arpt_get_target(iter)->u.user.name, + XT_ERROR_TARGET) == 0) + ++newinfo->stacksize; } duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret); if (ret != 0) @@ -1774,8 +1777,7 @@ struct xt_table *arpt_register_table(struct net *net, { int ret; struct xt_table_info *newinfo; - struct xt_table_info bootstrap - = { 0, 0, 0, { 0 }, { 0 }, { } }; + struct xt_table_info bootstrap = {0}; void *loc_cpu_entry; struct xt_table *new_table; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index 18c5b15..70900ec 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -321,8 +321,6 @@ ipt_do_table(struct sk_buff *skb, const struct net_device *out, struct xt_table *table) { -#define tb_comefrom ((struct ipt_entry *)table_base)->comefrom - static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long)))); const struct iphdr *ip; bool hotdrop = false; @@ -330,7 +328,8 @@ ipt_do_table(struct sk_buff *skb, unsigned int verdict = NF_DROP; const char *indev, *outdev; const void *table_base; - struct ipt_entry *e, *back; + struct ipt_entry *e, **jumpstack; + unsigned int *stackptr, origptr, cpu; const struct xt_table_info *private; struct xt_match_param mtpar; struct xt_target_param tgpar; @@ -356,19 +355,23 @@ ipt_do_table(struct sk_buff *skb, IP_NF_ASSERT(table->valid_hooks & (1 << hook)); xt_info_rdlock_bh(); private = table->private; - table_base = private->entries[smp_processor_id()]; + cpu = smp_processor_id(); + table_base = private->entries[cpu]; + jumpstack = (struct ipt_entry **)private->jumpstack[cpu]; + stackptr = &private->stackptr[cpu]; + origptr = *stackptr; e = get_entry(table_base, private->hook_entry[hook]); - /* For return from builtin chain */ - back = get_entry(table_base, private->underflow[hook]); + pr_devel("Entering %s(hook %u); sp at %u (UF %p)\n", + table->name, hook, origptr, + get_entry(table_base, private->underflow[hook])); do { const struct ipt_entry_target *t; const struct xt_entry_match *ematch; IP_NF_ASSERT(e); - IP_NF_ASSERT(back); if (!ip_packet_match(ip, indev, outdev, &e->ip, mtpar.fragoff)) { no_match: @@ -403,17 +406,28 @@ ipt_do_table(struct sk_buff *skb, verdict = (unsigned)(-v) - 1; break; } - e = back; - back = get_entry(table_base, back->comefrom); + if (*stackptr == 0) { + e = get_entry(table_base, + private->underflow[hook]); + pr_devel("Underflow (this is normal) " + "to %p\n", e); + } else { + e = jumpstack[--*stackptr]; + pr_devel("Pulled %p out from pos %u\n", + e, *stackptr); + e = ipt_next_entry(e); + } continue; } if (table_base + v != ipt_next_entry(e) && !(e->ip.flags & IPT_F_GOTO)) { - /* Save old back ptr in next entry */ - struct ipt_entry *next = ipt_next_entry(e); - next->comefrom = (void *)back - table_base; - /* set back pointer to next entry */ - back = next; + if (*stackptr >= private->stacksize) { + verdict = NF_DROP; + break; + } + jumpstack[(*stackptr)++] = e; + pr_devel("Pushed %p into pos %u\n", + e, *stackptr - 1); } e = get_entry(table_base, v); @@ -426,18 +440,7 @@ ipt_do_table(struct sk_buff *skb, tgpar.targinfo = t->data; -#ifdef CONFIG_NETFILTER_DEBUG - tb_comefrom = 0xeeeeeeec; -#endif verdict = t->u.kernel.target->target(skb, &tgpar); -#ifdef CONFIG_NETFILTER_DEBUG - if (tb_comefrom != 0xeeeeeeec && verdict == IPT_CONTINUE) { - printk("Target %s reentered!\n", - t->u.kernel.target->name); - verdict = NF_DROP; - } - tb_comefrom = 0x57acc001; -#endif /* Target might have changed stuff. */ ip = ip_hdr(skb); if (verdict == IPT_CONTINUE) @@ -447,7 +450,9 @@ ipt_do_table(struct sk_buff *skb, break; } while (!hotdrop); xt_info_rdunlock_bh(); - + pr_devel("Exiting %s; resetting sp from %u to %u\n", + __func__, *stackptr, origptr); + *stackptr = origptr; #ifdef DEBUG_ALLOW_ALL return NF_ACCEPT; #else @@ -455,8 +460,6 @@ ipt_do_table(struct sk_buff *skb, return NF_DROP; else return verdict; #endif - -#undef tb_comefrom } /* Figures out from what hook each rule can be called: returns 0 if @@ -838,6 +841,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0, if (ret != 0) return ret; ++i; + if (strcmp(ipt_get_target(iter)->u.user.name, + XT_ERROR_TARGET) == 0) + ++newinfo->stacksize; } if (i != repl->num_entries) { @@ -2086,8 +2092,7 @@ struct xt_table *ipt_register_table(struct net *net, { int ret; struct xt_table_info *newinfo; - struct xt_table_info bootstrap - = { 0, 0, 0, { 0 }, { 0 }, { } }; + struct xt_table_info bootstrap = {0}; void *loc_cpu_entry; struct xt_table *new_table; diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index f2b815e..2a2770b 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -351,15 +351,14 @@ ip6t_do_table(struct sk_buff *skb, const struct net_device *out, struct xt_table *table) { -#define tb_comefrom ((struct ip6t_entry *)table_base)->comefrom - static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long)))); bool hotdrop = false; /* Initializing verdict to NF_DROP keeps gcc happy. */ unsigned int verdict = NF_DROP; const char *indev, *outdev; const void *table_base; - struct ip6t_entry *e, *back; + struct ip6t_entry *e, **jumpstack; + unsigned int *stackptr, origptr, cpu; const struct xt_table_info *private; struct xt_match_param mtpar; struct xt_target_param tgpar; @@ -383,19 +382,19 @@ ip6t_do_table(struct sk_buff *skb, xt_info_rdlock_bh(); private = table->private; - table_base = private->entries[smp_processor_id()]; + cpu = smp_processor_id(); + table_base = private->entries[cpu]; + jumpstack = (struct ip6t_entry **)private->jumpstack[cpu]; + stackptr = &private->stackptr[cpu]; + origptr = *stackptr; e = get_entry(table_base, private->hook_entry[hook]); - /* For return from builtin chain */ - back = get_entry(table_base, private->underflow[hook]); - do { const struct ip6t_entry_target *t; const struct xt_entry_match *ematch; IP_NF_ASSERT(e); - IP_NF_ASSERT(back); if (!ip6_packet_match(skb, indev, outdev, &e->ipv6, &mtpar.thoff, &mtpar.fragoff, &hotdrop)) { no_match: @@ -432,17 +431,20 @@ ip6t_do_table(struct sk_buff *skb, verdict = (unsigned)(-v) - 1; break; } - e = back; - back = get_entry(table_base, back->comefrom); + if (*stackptr == 0) + e = get_entry(table_base, + private->underflow[hook]); + else + e = ip6t_next_entry(jumpstack[--*stackptr]); continue; } if (table_base + v != ip6t_next_entry(e) && !(e->ipv6.flags & IP6T_F_GOTO)) { - /* Save old back ptr in next entry */ - struct ip6t_entry *next = ip6t_next_entry(e); - next->comefrom = (void *)back - table_base; - /* set back pointer to next entry */ - back = next; + if (*stackptr >= private->stacksize) { + verdict = NF_DROP; + break; + } + jumpstack[(*stackptr)++] = e; } e = get_entry(table_base, v); @@ -454,19 +456,7 @@ ip6t_do_table(struct sk_buff *skb, tgpar.target = t->u.kernel.target; tgpar.targinfo = t->data; -#ifdef CONFIG_NETFILTER_DEBUG - tb_comefrom = 0xeeeeeeec; -#endif verdict = t->u.kernel.target->target(skb, &tgpar); - -#ifdef CONFIG_NETFILTER_DEBUG - if (tb_comefrom != 0xeeeeeeec && verdict == IP6T_CONTINUE) { - printk("Target %s reentered!\n", - t->u.kernel.target->name); - verdict = NF_DROP; - } - tb_comefrom = 0x57acc001; -#endif if (verdict == IP6T_CONTINUE) e = ip6t_next_entry(e); else @@ -474,10 +464,8 @@ ip6t_do_table(struct sk_buff *skb, break; } while (!hotdrop); -#ifdef CONFIG_NETFILTER_DEBUG - tb_comefrom = NETFILTER_LINK_POISON; -#endif xt_info_rdunlock_bh(); + *stackptr = origptr; #ifdef DEBUG_ALLOW_ALL return NF_ACCEPT; @@ -486,8 +474,6 @@ ip6t_do_table(struct sk_buff *skb, return NF_DROP; else return verdict; #endif - -#undef tb_comefrom } /* Figures out from what hook each rule can be called: returns 0 if @@ -869,6 +855,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0, if (ret != 0) return ret; ++i; + if (strcmp(ip6t_get_target(iter)->u.user.name, + XT_ERROR_TARGET) == 0) + ++newinfo->stacksize; } if (i != repl->num_entries) { @@ -2120,8 +2109,7 @@ struct xt_table *ip6t_register_table(struct net *net, { int ret; struct xt_table_info *newinfo; - struct xt_table_info bootstrap - = { 0, 0, 0, { 0 }, { 0 }, { } }; + struct xt_table_info bootstrap = {0}; void *loc_cpu_entry; struct xt_table *new_table; diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 8e23d8f..edde5c6 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = { [NFPROTO_IPV6] = "ip6", }; +/* Allow this many total (re)entries. */ +static const unsigned int xt_jumpstack_multiplier = 2; + /* Registration hooks for targets. */ int xt_register_target(struct xt_target *target) @@ -680,6 +683,26 @@ void xt_free_table_info(struct xt_table_info *info) else vfree(info->entries[cpu]); } + + if (info->jumpstack != NULL) { + if (sizeof(void *) * info->stacksize > PAGE_SIZE) { + for_each_possible_cpu(cpu) + vfree(info->jumpstack[cpu]); + } else { + for_each_possible_cpu(cpu) + kfree(info->jumpstack[cpu]); + } + } + + if (sizeof(void **) * nr_cpu_ids > PAGE_SIZE) + vfree(info->jumpstack); + else + kfree(info->jumpstack); + if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE) + vfree(info->stackptr); + else + kfree(info->stackptr); + kfree(info); } EXPORT_SYMBOL(xt_free_table_info); @@ -724,6 +747,49 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock); DEFINE_PER_CPU(struct xt_info_lock, xt_info_locks); EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks); +static int xt_jumpstack_alloc(struct xt_table_info *i) +{ + unsigned int size; + int cpu; + + size = sizeof(unsigned int) * nr_cpu_ids; + if (size > PAGE_SIZE) + i->stackptr = vmalloc(size); + else + i->stackptr = kmalloc(size, GFP_KERNEL); + if (i->stackptr == NULL) + return -ENOMEM; + memset(i->stackptr, 0, size); + + size = sizeof(void **) * nr_cpu_ids; + if (size > PAGE_SIZE) + i->jumpstack = vmalloc(size); + else + i->jumpstack = kmalloc(size, GFP_KERNEL); + if (i->jumpstack == NULL) + return -ENOMEM; + memset(i->jumpstack, 0, size); + + i->stacksize *= xt_jumpstack_multiplier; + size = sizeof(void *) * i->stacksize; + for_each_possible_cpu(cpu) { + if (size > PAGE_SIZE) + i->jumpstack[cpu] = vmalloc_node(size, + cpu_to_node(cpu)); + else + i->jumpstack[cpu] = kmalloc_node(size, + GFP_KERNEL, cpu_to_node(cpu)); + if (i->jumpstack[cpu] == NULL) + /* + * Freeing will be done later on by the callers. The + * chain is: xt_replace_table -> __do_replace -> + * do_replace -> xt_free_table_info. + */ + return -ENOMEM; + } + + return 0; +} struct xt_table_info * xt_replace_table(struct xt_table *table, @@ -732,6 +798,7 @@ xt_replace_table(struct xt_table *table, int *error) { struct xt_table_info *private; + int ret; /* Do the substitution. */ local_bh_disable(); @@ -746,6 +813,12 @@ xt_replace_table(struct xt_table *table, return NULL; } + ret = xt_jumpstack_alloc(newinfo); + if (ret < 0) { + *error = ret; + return NULL; + } + table->private = newinfo; newinfo->initial_entries = private->initial_entries; @@ -770,6 +843,10 @@ struct xt_table *xt_register_table(struct net *net, struct xt_table_info *private; struct xt_table *t, *table; + ret = xt_jumpstack_alloc(newinfo); + if (ret < 0) + return ERR_PTR(ret); + /* Don't add one object to multiple lists. */ table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL); if (!table) { -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt @ 2010-04-19 12:22 ` Patrick McHardy 2010-04-19 12:54 ` Jan Engelhardt 0 siblings, 1 reply; 22+ messages in thread From: Patrick McHardy @ 2010-04-19 12:22 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Jan Engelhardt wrote: > Currently, the table traverser stores return addresses in the ruleset > itself (struct ip6t_entry->comefrom). This has a well-known drawback: > the jumpstack is overwritten on reentry, making it necessary for > targets to return absolute verdicts. Also, the ruleset (which might > be heavy memory-wise) needs to be replicated for each CPU that can > possibly invoke ip6t_do_table. > > This patch decouples the jumpstack from struct ip6t_entry and instead > puts it into xt_table_info. Not being restricted by 'comefrom' > anymore, we can set up a stack as needed. By default, there is room > allocated for two entries into the traverser. The setting is > configurable at runtime through sysfs and will take effect when a > table is replaced by a new one. The changelog is not up to date anymore, but ... > > diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h > index 26ced0c..50c8672 100644 > --- a/include/linux/netfilter/x_tables.h > +++ b/include/linux/netfilter/x_tables.h > @@ -401,6 +401,13 @@ struct xt_table_info { > unsigned int hook_entry[NF_INET_NUMHOOKS]; > unsigned int underflow[NF_INET_NUMHOOKS]; > > + /* > + * Number of user chains. Since tables cannot have loops, at most > + * @stacksize jumps (number of user chains) can possibly be made. > + */ > + unsigned int stacksize; > + unsigned int *stackptr; > + void ***jumpstack; ... > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c > index 8e23d8f..edde5c6 100644 > --- a/net/netfilter/x_tables.c > +++ b/net/netfilter/x_tables.c > @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = { > [NFPROTO_IPV6] = "ip6", > }; > > +/* Allow this many total (re)entries. */ > +static const unsigned int xt_jumpstack_multiplier = 2; > + Why aren't you using a define instead of saving the stack size in the table info? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-19 12:22 ` Patrick McHardy @ 2010-04-19 12:54 ` Jan Engelhardt 2010-04-19 14:06 ` Patrick McHardy 0 siblings, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-19 12:54 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel On Monday 2010-04-19 14:22, Patrick McHardy wrote: >Jan Engelhardt wrote: >> This patch decouples the jumpstack from struct ip6t_entry and instead >> puts it into xt_table_info. Not being restricted by 'comefrom' >> anymore, we can set up a stack as needed. By default, there is room >> allocated for two entries into the traverser. The setting is >> configurable at runtime through sysfs and will take effect when a >> table is replaced by a new one. > >The changelog is not up to date anymore, but ... Oops ;-) >> --- a/include/linux/netfilter/x_tables.h >> +++ b/include/linux/netfilter/x_tables.h >> @@ -401,6 +401,13 @@ struct xt_table_info { >> unsigned int hook_entry[NF_INET_NUMHOOKS]; >> unsigned int underflow[NF_INET_NUMHOOKS]; >> >> + /* >> + * Number of user chains. Since tables cannot have loops, at most >> + * @stacksize jumps (number of user chains) can possibly be made. >> + */ >> + unsigned int stacksize; >> + unsigned int *stackptr; >> + void ***jumpstack; >... >> --- a/net/netfilter/x_tables.c >> +++ b/net/netfilter/x_tables.c >> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = { >> [NFPROTO_IPV6] = "ip6", >> }; >> >> +/* Allow this many total (re)entries. */ >> +static const unsigned int xt_jumpstack_multiplier = 2; >> + > >Why aren't you using a define instead of saving the stack size >in the table info? I don't see how a define does any good here. Since you were quoting the multiplier line, I guess you could be confusing the multiplier with stored stacksize. FTR, the definition is: table->stacksize := number_of_user_chains(#UC) * multiplier; Since #UC is variable, so is stacksize, and so stacksize cannot be replaced by a constant. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-19 12:54 ` Jan Engelhardt @ 2010-04-19 14:06 ` Patrick McHardy 2010-04-20 13:18 ` Patrick McHardy 0 siblings, 1 reply; 22+ messages in thread From: Patrick McHardy @ 2010-04-19 14:06 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Jan Engelhardt wrote: > On Monday 2010-04-19 14:22, Patrick McHardy wrote: >>> --- a/include/linux/netfilter/x_tables.h >>> +++ b/include/linux/netfilter/x_tables.h >>> @@ -401,6 +401,13 @@ struct xt_table_info { >>> unsigned int hook_entry[NF_INET_NUMHOOKS]; >>> unsigned int underflow[NF_INET_NUMHOOKS]; >>> >>> + /* >>> + * Number of user chains. Since tables cannot have loops, at most >>> + * @stacksize jumps (number of user chains) can possibly be made. >>> + */ >>> + unsigned int stacksize; >>> + unsigned int *stackptr; >>> + void ***jumpstack; >> ... >>> --- a/net/netfilter/x_tables.c >>> +++ b/net/netfilter/x_tables.c >>> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = { >>> [NFPROTO_IPV6] = "ip6", >>> }; >>> >>> +/* Allow this many total (re)entries. */ >>> +static const unsigned int xt_jumpstack_multiplier = 2; >>> + >> Why aren't you using a define instead of saving the stack size >> in the table info? > > I don't see how a define does any good here. Since you were quoting > the multiplier line, I guess you could be confusing the multiplier > with stored stacksize. FTR, the definition is: > > table->stacksize := number_of_user_chains(#UC) * multiplier; > > Since #UC is variable, so is stacksize, and so stacksize cannot > be replaced by a constant. Right, thanks for the explanation. Applied. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-19 14:06 ` Patrick McHardy @ 2010-04-20 13:18 ` Patrick McHardy 2010-04-20 13:21 ` Patrick McHardy 2010-04-20 18:26 ` Jan Engelhardt 0 siblings, 2 replies; 22+ messages in thread From: Patrick McHardy @ 2010-04-20 13:18 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Patrick McHardy wrote: > Jan Engelhardt wrote: >>>> +/* Allow this many total (re)entries. */ >>>> +static const unsigned int xt_jumpstack_multiplier = 2; >>>> + >>> Why aren't you using a define instead of saving the stack size >>> in the table info? >> I don't see how a define does any good here. Since you were quoting >> the multiplier line, I guess you could be confusing the multiplier >> with stored stacksize. FTR, the definition is: >> >> table->stacksize := number_of_user_chains(#UC) * multiplier; >> >> Since #UC is variable, so is stacksize, and so stacksize cannot >> be replaced by a constant. > > Right, thanks for the explanation. Applied. I just noticed a problem with this patch: [ 428.295752] BUG: sleeping function called from invalid context at mm/slub.c:1705 [ 428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables [ 428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2 [ 428.295776] Call Trace: [ 428.295791] [<c012138e>] __might_sleep+0xe5/0xed [ 428.295801] [<c019e8ca>] __kmalloc+0x92/0xfc [ 428.295825] [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables] [ 428.295839] [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables] [ 428.295851] [<f865abe1>] ? try_module_get+0x82/0x9b [x_tables] [ 428.295864] [<f865b4c0>] xt_replace_table+0x3c/0x5f [x_tables] [ 428.295876] [<f86b5dc3>] do_ipt_set_ctl+0x182/0x3d5 [ip_tables] [ 428.295922] [<c037388f>] nf_sockopt+0x167/0x17c [ 428.295931] [<c03738d8>] nf_setsockopt+0x1a/0x1f [ 428.295940] [<c037dda4>] ip_setsockopt+0x60/0x84 [ 428.295951] [<c039260a>] raw_setsockopt+0x1f/0x62 [ 428.295960] [<c034d909>] sock_common_setsockopt+0x18/0x1d [ 428.295968] [<c034bfb9>] sys_setsockopt+0x5e/0x79 [ 428.295977] [<c034d0a0>] sys_socketcall+0x12d/0x190 [ 428.295987] [<c0102a57>] sysenter_do_call+0x12/0x26 You probably shouldn't be allocating the jumpstack while BHs are disabled. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-20 13:18 ` Patrick McHardy @ 2010-04-20 13:21 ` Patrick McHardy 2010-04-20 18:26 ` Jan Engelhardt 1 sibling, 0 replies; 22+ messages in thread From: Patrick McHardy @ 2010-04-20 13:21 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Patrick McHardy wrote: > Patrick McHardy wrote: >> Jan Engelhardt wrote: >>>>> +/* Allow this many total (re)entries. */ >>>>> +static const unsigned int xt_jumpstack_multiplier = 2; >>>>> + >>>> Why aren't you using a define instead of saving the stack size >>>> in the table info? >>> I don't see how a define does any good here. Since you were quoting >>> the multiplier line, I guess you could be confusing the multiplier >>> with stored stacksize. FTR, the definition is: >>> >>> table->stacksize := number_of_user_chains(#UC) * multiplier; >>> >>> Since #UC is variable, so is stacksize, and so stacksize cannot >>> be replaced by a constant. >> Right, thanks for the explanation. Applied. > > I just noticed a problem with this patch: > > [ 428.295752] BUG: sleeping function called from invalid context at > mm/slub.c:1705 > [ 428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables > [ 428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2 > [ 428.295776] Call Trace: > [ 428.295791] [<c012138e>] __might_sleep+0xe5/0xed > [ 428.295801] [<c019e8ca>] __kmalloc+0x92/0xfc > [ 428.295825] [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables] > [ 428.295839] [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables] > [ 428.295851] [<f865abe1>] ? try_module_get+0x82/0x9b [x_tables] > [ 428.295864] [<f865b4c0>] xt_replace_table+0x3c/0x5f [x_tables] > [ 428.295876] [<f86b5dc3>] do_ipt_set_ctl+0x182/0x3d5 [ip_tables] > [ 428.295922] [<c037388f>] nf_sockopt+0x167/0x17c > [ 428.295931] [<c03738d8>] nf_setsockopt+0x1a/0x1f > [ 428.295940] [<c037dda4>] ip_setsockopt+0x60/0x84 > [ 428.295951] [<c039260a>] raw_setsockopt+0x1f/0x62 > [ 428.295960] [<c034d909>] sock_common_setsockopt+0x18/0x1d > [ 428.295968] [<c034bfb9>] sys_setsockopt+0x5e/0x79 > [ 428.295977] [<c034d0a0>] sys_socketcall+0x12d/0x190 > [ 428.295987] [<c0102a57>] sysenter_do_call+0x12/0x26 > > You probably shouldn't be allocating the jumpstack while BHs are > disabled. I pushed the entire patchset out, please send a fix on top. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-20 13:18 ` Patrick McHardy 2010-04-20 13:21 ` Patrick McHardy @ 2010-04-20 18:26 ` Jan Engelhardt 2010-04-21 12:47 ` Patrick McHardy 1 sibling, 1 reply; 22+ messages in thread From: Jan Engelhardt @ 2010-04-20 18:26 UTC (permalink / raw) To: Patrick McHardy; +Cc: netfilter-devel On Tuesday 2010-04-20 15:18, Patrick McHardy wrote: > >I just noticed a problem with this patch: > >[ 428.295752] BUG: sleeping function called from invalid context at >mm/slub.c:1705 >[ 428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables >[ 428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2 >[ 428.295776] Call Trace: >[ 428.295791] [<c012138e>] __might_sleep+0xe5/0xed >[ 428.295801] [<c019e8ca>] __kmalloc+0x92/0xfc >[ 428.295825] [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables] >[ 428.295839] [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables] > >You probably shouldn't be allocating the jumpstack while BHs are >disabled. The following patch should address this. It's easily movable as it only does work on newinfo itself. The following changes since commit 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350: Bart De Schuymer (1): netfilter: bridge-netfilter: fix refragmenting IP traffic encapsulated in PPPoE traffic are available in the git repository at: git://dev.medozas.de/linux master Jan Engelhardt (1): netfilter: x_tables: move sleeping allocation outside BH-disabled region net/netfilter/x_tables.c | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) parent 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350 (v2.6.34-rc3-1335-g6c79bf0) commit 3109d6b7342a5c8bf23697a86be12e733472d820 Author: Jan Engelhardt <jengelh@medozas.de> Date: Tue Apr 20 20:11:15 2010 +0200 netfilter: x_tables: move sleeping allocation outside BH-disabled region The jumpstack allocation needs to be moved out of the critical region. Corrects this notice: BUG: sleeping function called from invalid context at mm/slub.c:1705 [ 428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables [ 428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2 [ 428.295776] Call Trace: [ 428.295791] [<c012138e>] __might_sleep+0xe5/0xed [ 428.295801] [<c019e8ca>] __kmalloc+0x92/0xfc [ 428.295825] [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables] Signed-off-by: Jan Engelhardt <jengelh@medozas.de> --- net/netfilter/x_tables.c | 12 ++++++------ 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 3ae3234..445de70 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -801,6 +801,12 @@ xt_replace_table(struct xt_table *table, struct xt_table_info *private; int ret; + ret = xt_jumpstack_alloc(newinfo); + if (ret < 0) { + *error = ret; + return NULL; + } + /* Do the substitution. */ local_bh_disable(); private = table->private; @@ -814,12 +820,6 @@ xt_replace_table(struct xt_table *table, return NULL; } - ret = xt_jumpstack_alloc(newinfo); - if (ret < 0) { - *error = ret; - return NULL; - } - table->private = newinfo; newinfo->initial_entries = private->initial_entries; -- # Created with git-export-patch ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant 2010-04-20 18:26 ` Jan Engelhardt @ 2010-04-21 12:47 ` Patrick McHardy 0 siblings, 0 replies; 22+ messages in thread From: Patrick McHardy @ 2010-04-21 12:47 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel Jan Engelhardt wrote: > netfilter: x_tables: move sleeping allocation outside BH-disabled region > > The jumpstack allocation needs to be moved out of the critical region. > Corrects this notice: > > BUG: sleeping function called from invalid context at mm/slub.c:1705 > [ 428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables > [ 428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2 > [ 428.295776] Call Trace: > [ 428.295791] [<c012138e>] __might_sleep+0xe5/0xed > [ 428.295801] [<c019e8ca>] __kmalloc+0x92/0xfc > [ 428.295825] [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables] Applied, thanks Jan. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2010-04-21 12:47 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-04-13 23:21 nf-next: TEE only Jan Engelhardt 2010-04-13 23:21 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt 2010-04-14 5:52 ` Eric Dumazet 2010-04-14 10:00 ` Jan Engelhardt 2010-04-14 10:56 ` Patrick McHardy 2010-04-13 23:21 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt 2010-04-13 23:21 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt 2010-04-13 23:21 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt 2010-04-14 10:57 ` nf-next: TEE only Patrick McHardy 2010-04-14 11:15 ` Jan Engelhardt 2010-04-14 11:24 ` Patrick McHardy 2010-04-14 11:32 ` Jan Engelhardt 2010-04-15 11:41 ` Jan Engelhardt 2010-04-15 11:56 ` Patrick McHardy -- strict thread matches above, loose matches on Subject: below -- 2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt 2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt 2010-04-19 12:22 ` Patrick McHardy 2010-04-19 12:54 ` Jan Engelhardt 2010-04-19 14:06 ` Patrick McHardy 2010-04-20 13:18 ` Patrick McHardy 2010-04-20 13:21 ` Patrick McHardy 2010-04-20 18:26 ` Jan Engelhardt 2010-04-21 12:47 ` Patrick McHardy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).