* [PATCH 2.4] raw table and NOTRACK support
@ 2005-11-21 10:26 Roberto Nibali
2005-11-22 14:14 ` Roberto Nibali
0 siblings, 1 reply; 10+ messages in thread
From: Roberto Nibali @ 2005-11-21 10:26 UTC (permalink / raw)
To: netfilter-devel; +Cc: Willy Tarreau
[-- Attachment #1: Type: text/plain, Size: 1538 bytes --]
Hello,
This is a re-diffed patch that adds raw table and NOTRACK support to the 2.4.x
kernel. I've kept the IPv6 part in this patch although it's untested and
has most probably never been used by anyone before.
Caveats: Currently we get an oops on SMP iff all of the following hold:
o NOTRACK rule loaded, active and used (refcnt>0)
o SMP kernel
o connection tracking is enabled
o a normal rule hitting the conntrack table during lookup
o iptables -X; iptables -F, rmmod <all netfilter related modules>
Earlier attempts to address this issue with Pablo Neira resulted in
a misplaced nf_reset(skb) patch, which I have removed again because it
broke masquerading (IIRC). I will enable KDB and report back once I get
a decent stack trace.
On normal reconfiguration all works perfectly, but to be safe we need to
flush the conntrack table; otherwise already established connections
without a filter rule would continue to be passed through the firewall.
This will go into production on our packet filters, however with the note
to our support team that reconfiguration including connection tracking
table flushing is forbidden under death penalty :).
Cheers,
Roberto Nibali, ratz
--
-------------------------------------------------------------
addr://Kasinostrasse 30, CH-5001 Aarau tel://++41 62 823 9355
http://www.terreactive.com fax://++41 62 823 9356
-------------------------------------------------------------
terreActive AG Wir sichern Ihren Erfolg
-------------------------------------------------------------
[-- Attachment #2: linux-2.4.32-raw-table-NOTRACK-2.diff --]
[-- Type: text/plain, Size: 24235 bytes --]
diff -Nur linux-2.4.31-orig/Documentation/Configure.help linux-2.4.31-pab2/Documentation/Configure.help
--- linux-2.4.31-orig/Documentation/Configure.help 2005-04-04 03:42:19 +0200
+++ linux-2.4.31-pab2/Documentation/Configure.help 2005-06-29 14:33:32 +0200
@@ -3016,6 +3016,34 @@
If you want to compile it as a module, say M here and read
<file:Documentation/modules.txt>. If unsure, say `N'.
+raw table support (required for NOTRACK/TRACE)
+CONFIG_IP_NF_RAW
+ This option adds a `raw' table to iptables. This table is the very
+ first in the netfilter framework and hooks in at the PREROUTING
+ and OUTPUT chains.
+
+ If you want to compile it as a module, say M here and read
+ <file:Documentation/modules.txt>. If unsure, say `N'.
+
+NOTRACK target support
+CONFIG_IP_NF_TARGET_NOTRACK
+ The NOTRACK target allows a select rule to specify
+ which packets *not* to enter the conntrack/NAT
+ subsystem with all the consequences (no ICMP error tracking,
+ no protocol helpers for the selected packets).
+
+ If you want to compile it as a module, say M here and read
+ <file:Documentation/modules.txt>. If unsure, say `N'.
+
+raw table support (required for TRACE)
+CONFIG_IP6_NF_RAW
+ This option adds a `raw' table to ip6tables. This table is the very
+ first in the netfilter framework and hooks in at the PREROUTING
+ and OUTPUT chains.
+
+ If you want to compile it as a module, say M here and read
+ <file:Documentation/modules.txt>. If unsure, say `N'.
+
Packet filtering
CONFIG_IP_NF_FILTER
Packet filtering defines a table `filter', which has a series of
diff -Nur linux-2.4.31-orig/include/linux/netfilter_ipv4/ip_conntrack.h linux-2.4.31-pab2/include/linux/netfilter_ipv4/ip_conntrack.h
--- linux-2.4.31-orig/include/linux/netfilter_ipv4/ip_conntrack.h 2005-06-29 14:32:41 +0200
+++ linux-2.4.31-pab2/include/linux/netfilter_ipv4/ip_conntrack.h 2005-06-29 14:33:32 +0200
@@ -254,6 +254,9 @@
/* Call me when a conntrack is destroyed. */
extern void (*ip_conntrack_destroyed)(struct ip_conntrack *conntrack);
+/* Fake conntrack entry for untracked connections */
+extern struct ip_conntrack ip_conntrack_untracked;
+
/* Returns new sk_buff, or NULL */
struct sk_buff *
ip_ct_gather_frags(struct sk_buff *skb, u_int32_t user);
diff -Nur linux-2.4.31-orig/include/linux/netfilter_ipv4/ipt_conntrack.h linux-2.4.31-pab2/include/linux/netfilter_ipv4/ipt_conntrack.h
--- linux-2.4.31-orig/include/linux/netfilter_ipv4/ipt_conntrack.h 2002-11-29 00:53:15 +0100
+++ linux-2.4.31-pab2/include/linux/netfilter_ipv4/ipt_conntrack.h 2005-06-29 14:33:32 +0200
@@ -10,6 +10,7 @@
#define IPT_CONNTRACK_STATE_SNAT (1 << (IP_CT_NUMBER + 1))
#define IPT_CONNTRACK_STATE_DNAT (1 << (IP_CT_NUMBER + 2))
+#define IPT_CONNTRACK_STATE_UNTRACKED (1 << (IP_CT_NUMBER + 3))
/* flags, invflags: */
#define IPT_CONNTRACK_STATE 0x01
diff -Nur linux-2.4.31-orig/include/linux/netfilter_ipv4/ipt_state.h linux-2.4.31-pab2/include/linux/netfilter_ipv4/ipt_state.h
--- linux-2.4.31-orig/include/linux/netfilter_ipv4/ipt_state.h 2000-04-14 18:37:20 +0200
+++ linux-2.4.31-pab2/include/linux/netfilter_ipv4/ipt_state.h 2005-06-29 14:33:32 +0200
@@ -3,6 +3,7 @@
#define IPT_STATE_BIT(ctinfo) (1 << ((ctinfo)%IP_CT_IS_REPLY+1))
#define IPT_STATE_INVALID (1 << 0)
+#define IPT_STATE_UNTRACKED (1 << (IP_CT_NUMBER + 1))
struct ipt_state_info
{
diff -Nur linux-2.4.31-orig/include/linux/netfilter_ipv4.h linux-2.4.31-pab2/include/linux/netfilter_ipv4.h
--- linux-2.4.31-orig/include/linux/netfilter_ipv4.h 2002-02-25 20:38:13 +0100
+++ linux-2.4.31-pab2/include/linux/netfilter_ipv4.h 2005-06-29 14:33:32 +0200
@@ -51,6 +51,8 @@
enum nf_ip_hook_priorities {
NF_IP_PRI_FIRST = INT_MIN,
+ NF_IP_PRI_CONNTRACK_DEFRAG = -400,
+ NF_IP_PRI_RAW = -300,
NF_IP_PRI_CONNTRACK = -200,
NF_IP_PRI_MANGLE = -150,
NF_IP_PRI_NAT_DST = -100,
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/Config.in linux-2.4.31-pab2/net/ipv4/netfilter/Config.in
--- linux-2.4.31-orig/net/ipv4/netfilter/Config.in 2005-01-19 15:10:13 +0100
+++ linux-2.4.31-pab2/net/ipv4/netfilter/Config.in 2005-06-29 14:33:32 +0200
@@ -107,6 +107,15 @@
dep_tristate ' LOG target support' CONFIG_IP_NF_TARGET_LOG $CONFIG_IP_NF_IPTABLES
dep_tristate ' ULOG target support' CONFIG_IP_NF_TARGET_ULOG $CONFIG_IP_NF_IPTABLES
dep_tristate ' TCPMSS target support' CONFIG_IP_NF_TARGET_TCPMSS $CONFIG_IP_NF_IPTABLES
+ if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ tristate ' raw table support (required for NOTRACK/TRACE)' CONFIG_IP_NF_RAW $CONFIG_IP_NF_IPTABLES
+ fi
+ if [ "$CONFIG_IP_NF_RAW" != "n" ]; then
+ if [ "$CONFIG_IP_NF_CONNTRACK" != "n" ]; then
+ dep_tristate ' NOTRACK target support' CONFIG_IP_NF_TARGET_NOTRACK $CONFIG_IP_NF_RAW
+ fi
+ # Marker for TRACE target
+ fi
fi
tristate 'ARP tables support' CONFIG_IP_NF_ARPTABLES
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/Makefile linux-2.4.31-pab2/net/ipv4/netfilter/Makefile
--- linux-2.4.31-orig/net/ipv4/netfilter/Makefile 2005-06-29 14:32:41 +0200
+++ linux-2.4.31-pab2/net/ipv4/netfilter/Makefile 2005-06-29 14:33:32 +0200
@@ -65,6 +65,7 @@
obj-$(CONFIG_IP_NF_FILTER) += iptable_filter.o
obj-$(CONFIG_IP_NF_MANGLE) += iptable_mangle.o
obj-$(CONFIG_IP_NF_NAT) += iptable_nat.o
+obj-$(CONFIG_IP_NF_RAW) += iptable_raw.o
# matches
obj-$(CONFIG_IP_NF_MATCH_HELPER) += ipt_helper.o
@@ -90,6 +91,7 @@
obj-$(CONFIG_IP_NF_MATCH_CONNTRACK) += ipt_conntrack.o
obj-$(CONFIG_IP_NF_MATCH_UNCLEAN) += ipt_unclean.o
obj-$(CONFIG_IP_NF_MATCH_TCPMSS) += ipt_tcpmss.o
+obj-$(CONFIG_IP_NF_TARGET_NOTRACK) += ipt_NOTRACK.o
# targets
obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/ip_conntrack_core.c linux-2.4.31-pab2/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.4.31-orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-06-29 14:32:41 +0200
+++ linux-2.4.31-pab2/net/ipv4/netfilter/ip_conntrack_core.c 2005-06-29 14:33:32 +0200
@@ -65,6 +65,7 @@
struct list_head *ip_conntrack_hash;
static kmem_cache_t *ip_conntrack_cachep;
static LIST_HEAD(unconfirmed);
+struct ip_conntrack ip_conntrack_untracked;
extern struct ip_conntrack_protocol ip_conntrack_generic_protocol;
@@ -823,6 +824,19 @@
int set_reply;
int ret;
+ /* Previously seen (loopback or untracked)? Ignore. */
+ if ((*pskb)->nfct)
+ return NF_ACCEPT;
+
+ /* Never happen */
+ if ((*pskb)->nh.iph->frag_off & htons(IP_OFFSET)) {
+ if (net_ratelimit()) {
+ printk(KERN_ERR "ip_conntrack_in: Frag of proto %u (hook=%u)\n",
+ (*pskb)->nh.iph->protocol, hooknum);
+ }
+ return NF_DROP;
+ }
+
/* FIXME: Do this right please. --RR */
(*pskb)->nfcache |= NFC_UNKNOWN;
@@ -841,21 +855,6 @@
}
#endif
- /* Previously seen (loopback)? Ignore. Do this before
- fragment check. */
- if ((*pskb)->nfct)
- return NF_ACCEPT;
-
- /* Gather fragments. */
- if ((*pskb)->nh.iph->frag_off & htons(IP_MF|IP_OFFSET)) {
- *pskb = ip_ct_gather_frags(*pskb,
- hooknum == NF_IP_PRE_ROUTING ?
- IP_DEFRAG_CONNTRACK_IN :
- IP_DEFRAG_CONNTRACK_OUT);
- if (!*pskb)
- return NF_STOLEN;
- }
-
proto = ip_ct_find_proto((*pskb)->nh.iph->protocol);
/* It may be an icmp error... */
@@ -1393,6 +1392,8 @@
schedule();
goto i_see_dead_people;
}
+ while (atomic_read(&ip_conntrack_untracked.ct_general.use) > 1)
+ schedule();
kmem_cache_destroy(ip_conntrack_cachep);
vfree(ip_conntrack_hash);
@@ -1460,6 +1461,18 @@
/* For use by ipt_REJECT */
ip_ct_attach = ip_conntrack_attach;
+
+ /* Set up fake conntrack:
+ - never to be deleted, not in any hashes */
+ atomic_set(&ip_conntrack_untracked.ct_general.use, 1);
+ /* - and let it look as if it's a confirmed connection */
+ set_bit(IPS_CONFIRMED_BIT, &ip_conntrack_untracked.status);
+ /* - and prepare the ctinfo field for REJECT/NAT. */
+ ip_conntrack_untracked.infos[IP_CT_NEW].master =
+ ip_conntrack_untracked.infos[IP_CT_RELATED].master =
+ ip_conntrack_untracked.infos[IP_CT_RELATED + IP_CT_IS_REPLY].master =
+ &ip_conntrack_untracked.ct_general;
+
return ret;
err_free_hash:
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/ip_conntrack_standalone.c linux-2.4.31-pab2/net/ipv4/netfilter/ip_conntrack_standalone.c
--- linux-2.4.31-orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-06-29 14:32:42 +0200
+++ linux-2.4.31-pab2/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-06-29 14:33:32 +0200
@@ -189,6 +189,29 @@
return ip_conntrack_confirm(*pskb);
}
+static unsigned int ip_conntrack_defrag(unsigned int hooknum,
+ struct sk_buff **pskb,
+ const struct net_device *in,
+ const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ /* Previously seen (loopback)? Ignore. Do this before
+ * fragment check. */
+ if ((*pskb)->nfct)
+ return NF_ACCEPT;
+
+ /* Gather fragments. */
+ if ((*pskb)->nh.iph->frag_off & htons(IP_MF|IP_OFFSET)) {
+ *pskb = ip_ct_gather_frags(*pskb,
+ hooknum == NF_IP_PRE_ROUTING ?
+ IP_DEFRAG_CONNTRACK_IN :
+ IP_DEFRAG_CONNTRACK_OUT);
+ if (!*pskb)
+ return NF_STOLEN;
+ }
+ return NF_ACCEPT;
+}
+
static unsigned int ip_refrag(unsigned int hooknum,
struct sk_buff **pskb,
const struct net_device *in,
@@ -230,9 +253,15 @@
/* Connection tracking may drop packets, but never alters them, so
make it the first hook. */
+static struct nf_hook_ops ip_conntrack_defrag_ops
+= { { NULL, NULL }, ip_conntrack_defrag, PF_INET, NF_IP_PRE_ROUTING,
+ NF_IP_PRI_CONNTRACK_DEFRAG };
static struct nf_hook_ops ip_conntrack_in_ops
= { { NULL, NULL }, ip_conntrack_in, PF_INET, NF_IP_PRE_ROUTING,
NF_IP_PRI_CONNTRACK };
+static struct nf_hook_ops ip_conntrack_defrag_local_out_ops
+= { { NULL, NULL }, ip_conntrack_defrag, PF_INET, NF_IP_LOCAL_OUT,
+ NF_IP_PRI_CONNTRACK_DEFRAG };
static struct nf_hook_ops ip_conntrack_local_out_ops
= { { NULL, NULL }, ip_conntrack_local, PF_INET, NF_IP_LOCAL_OUT,
NF_IP_PRI_CONNTRACK };
@@ -373,10 +402,21 @@
if (!proc) goto cleanup_init;
proc->owner = THIS_MODULE;
+ ret = nf_register_hook(&ip_conntrack_defrag_ops);
+ if (ret < 0) {
+ printk("ip_conntrack: can't register pre-routing defrag hook.\n");
+ goto cleanup_proc;
+ }
+ ret = nf_register_hook(&ip_conntrack_defrag_local_out_ops);
+ if (ret < 0) {
+ printk("ip_conntrack: can't register local_out defrag hook.\n");
+ goto cleanup_defragops;
+ }
+
ret = nf_register_hook(&ip_conntrack_in_ops);
if (ret < 0) {
printk("ip_conntrack: can't register pre-routing hook.\n");
- goto cleanup_proc;
+ goto cleanup_defraglocalops;
}
ret = nf_register_hook(&ip_conntrack_local_out_ops);
if (ret < 0) {
@@ -414,6 +454,10 @@
nf_unregister_hook(&ip_conntrack_local_out_ops);
cleanup_inops:
nf_unregister_hook(&ip_conntrack_in_ops);
+ cleanup_defraglocalops:
+ nf_unregister_hook(&ip_conntrack_defrag_local_out_ops);
+ cleanup_defragops:
+ nf_unregister_hook(&ip_conntrack_defrag_ops);
cleanup_proc:
proc_net_remove("ip_conntrack");
cleanup_init:
@@ -503,5 +547,6 @@
EXPORT_SYMBOL(ip_conntrack_expect_list);
EXPORT_SYMBOL(ip_conntrack_lock);
EXPORT_SYMBOL(ip_conntrack_hash);
+EXPORT_SYMBOL(ip_conntrack_untracked);
EXPORT_SYMBOL_GPL(ip_conntrack_find_get);
EXPORT_SYMBOL_GPL(ip_conntrack_put);
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/ip_nat_core.c linux-2.4.31-pab2/net/ipv4/netfilter/ip_nat_core.c
--- linux-2.4.31-orig/net/ipv4/netfilter/ip_nat_core.c 2005-04-04 03:42:20 +0200
+++ linux-2.4.31-pab2/net/ipv4/netfilter/ip_nat_core.c 2005-06-29 14:33:32 +0200
@@ -1024,6 +1024,10 @@
IP_NF_ASSERT(ip_conntrack_destroyed == NULL);
ip_conntrack_destroyed = &ip_nat_cleanup_conntrack;
+ /* Initialize fake conntrack so that NAT will skip it */
+ ip_conntrack_untracked.nat.info.initialized |=
+ (1 << IP_NAT_MANIP_SRC) | (1 << IP_NAT_MANIP_DST);
+
return 0;
}
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/ipt_NOTRACK.c linux-2.4.31-pab2/net/ipv4/netfilter/ipt_NOTRACK.c
--- linux-2.4.31-orig/net/ipv4/netfilter/ipt_NOTRACK.c 1970-01-01 01:00:00 +0100
+++ linux-2.4.31-pab2/net/ipv4/netfilter/ipt_NOTRACK.c 2005-06-29 14:33:32 +0200
@@ -0,0 +1,75 @@
+/* This is a module which is used for setting up fake conntracks
+ * on packets so that they are not seen by the conntrack/NAT code.
+ */
+#include <linux/module.h>
+#include <linux/skbuff.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/netfilter_ipv4/ip_conntrack.h>
+
+static unsigned int
+target(struct sk_buff **pskb,
+ unsigned int hooknum,
+ const struct net_device *in,
+ const struct net_device *out,
+ const void *targinfo,
+ void *userinfo)
+{
+ /* Previously seen (loopback)? Ignore. */
+ if ((*pskb)->nfct != NULL)
+ return IPT_CONTINUE;
+
+ /* Attach fake conntrack entry.
+ If there is a real ct entry corresponding to this packet,
+ it'll hang around till timing out. We don't deal with it
+ for performance reasons. JK */
+ (*pskb)->nfct = &ip_conntrack_untracked.infos[IP_CT_NEW];
+ nf_conntrack_get((*pskb)->nfct);
+
+ return IPT_CONTINUE;
+}
+
+static int
+checkentry(const char *tablename,
+ const struct ipt_entry *e,
+ void *targinfo,
+ unsigned int targinfosize,
+ unsigned int hook_mask)
+{
+ if (targinfosize != 0) {
+ printk(KERN_WARNING "NOTRACK: targinfosize %u != 0\n",
+ targinfosize);
+ return 0;
+ }
+
+ if (strcmp(tablename, "raw") != 0) {
+ printk(KERN_WARNING "NOTRACK: can only be called from \"raw\" table, not \"%s\"\n", tablename);
+ return 0;
+ }
+
+ return 1;
+}
+
+static struct ipt_target ipt_notrack_reg = {
+ .name = "NOTRACK",
+ .target = target,
+ .checkentry = checkentry,
+ .me = THIS_MODULE
+};
+
+static int __init init(void)
+{
+ if (ipt_register_target(&ipt_notrack_reg))
+ return -EINVAL;
+
+ return 0;
+}
+
+static void __exit fini(void)
+{
+ ipt_unregister_target(&ipt_notrack_reg);
+}
+
+module_init(init);
+module_exit(fini);
+MODULE_LICENSE("GPL");
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/ipt_conntrack.c linux-2.4.31-pab2/net/ipv4/netfilter/ipt_conntrack.c
--- linux-2.4.31-orig/net/ipv4/netfilter/ipt_conntrack.c 2004-02-18 14:36:32 +0100
+++ linux-2.4.31-pab2/net/ipv4/netfilter/ipt_conntrack.c 2005-06-29 14:33:32 +0200
@@ -27,7 +27,9 @@
#define FWINV(bool,invflg) ((bool) ^ !!(sinfo->invflags & invflg))
- if (ct)
+ if (skb->nfct == &ip_conntrack_untracked.infos[IP_CT_NEW])
+ statebit = IPT_CONNTRACK_STATE_UNTRACKED;
+ else if (ct)
statebit = IPT_CONNTRACK_STATE_BIT(ctinfo);
else
statebit = IPT_CONNTRACK_STATE_INVALID;
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/ipt_state.c linux-2.4.31-pab2/net/ipv4/netfilter/ipt_state.c
--- linux-2.4.31-orig/net/ipv4/netfilter/ipt_state.c 2004-02-18 14:36:32 +0100
+++ linux-2.4.31-pab2/net/ipv4/netfilter/ipt_state.c 2005-06-29 14:33:32 +0200
@@ -21,7 +21,9 @@
enum ip_conntrack_info ctinfo;
unsigned int statebit;
- if (!ip_conntrack_get((struct sk_buff *)skb, &ctinfo))
+ if (skb->nfct == &ip_conntrack_untracked.infos[IP_CT_NEW])
+ statebit = IPT_STATE_UNTRACKED;
+ else if (!ip_conntrack_get((struct sk_buff *)skb, &ctinfo))
statebit = IPT_STATE_INVALID;
else
statebit = IPT_STATE_BIT(ctinfo);
diff -Nur linux-2.4.31-orig/net/ipv4/netfilter/iptable_raw.c linux-2.4.31-pab2/net/ipv4/netfilter/iptable_raw.c
--- linux-2.4.31-orig/net/ipv4/netfilter/iptable_raw.c 1970-01-01 01:00:00 +0100
+++ linux-2.4.31-pab2/net/ipv4/netfilter/iptable_raw.c 2005-06-29 14:33:32 +0200
@@ -0,0 +1,149 @@
+/*
+ * 'raw' table, which is the very first hooked in at PRE_ROUTING and LOCAL_OUT .
+ *
+ * Copyright (C) 2003 Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
+ */
+#include <linux/module.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+#define RAW_VALID_HOOKS ((1 << NF_IP_PRE_ROUTING) | (1 << NF_IP_LOCAL_OUT))
+
+/* Standard entry. */
+struct ipt_standard
+{
+ struct ipt_entry entry;
+ struct ipt_standard_target target;
+};
+
+struct ipt_error_target
+{
+ struct ipt_entry_target target;
+ char errorname[IPT_FUNCTION_MAXNAMELEN];
+};
+
+struct ipt_error
+{
+ struct ipt_entry entry;
+ struct ipt_error_target target;
+};
+
+static struct
+{
+ struct ipt_replace repl;
+ struct ipt_standard entries[2];
+ struct ipt_error term;
+} initial_table __initdata
+= { { "raw", RAW_VALID_HOOKS, 3,
+ sizeof(struct ipt_standard) * 2 + sizeof(struct ipt_error),
+ { [NF_IP_PRE_ROUTING] 0,
+ [NF_IP_LOCAL_OUT] sizeof(struct ipt_standard) },
+ { [NF_IP_PRE_ROUTING] 0,
+ [NF_IP_LOCAL_OUT] sizeof(struct ipt_standard) },
+ 0, NULL, { } },
+ {
+ /* PRE_ROUTING */
+ { { { { 0 }, { 0 }, { 0 }, { 0 }, "", "", { 0 }, { 0 }, 0, 0, 0 },
+ 0,
+ sizeof(struct ipt_entry),
+ sizeof(struct ipt_standard),
+ 0, { 0, 0 }, { } },
+ { { { { IPT_ALIGN(sizeof(struct ipt_standard_target)), "" } }, { } },
+ -NF_ACCEPT - 1 } },
+ /* LOCAL_OUT */
+ { { { { 0 }, { 0 }, { 0 }, { 0 }, "", "", { 0 }, { 0 }, 0, 0, 0 },
+ 0,
+ sizeof(struct ipt_entry),
+ sizeof(struct ipt_standard),
+ 0, { 0, 0 }, { } },
+ { { { { IPT_ALIGN(sizeof(struct ipt_standard_target)), "" } }, { } },
+ -NF_ACCEPT - 1 } }
+ },
+ /* ERROR */
+ { { { { 0 }, { 0 }, { 0 }, { 0 }, "", "", { 0 }, { 0 }, 0, 0, 0 },
+ 0,
+ sizeof(struct ipt_entry),
+ sizeof(struct ipt_error),
+ 0, { 0, 0 }, { } },
+ { { { { IPT_ALIGN(sizeof(struct ipt_error_target)), IPT_ERROR_TARGET } },
+ { } },
+ "ERROR"
+ }
+ }
+};
+
+static struct ipt_table packet_raw = {
+ .name = "raw",
+ .table = &initial_table.repl,
+ .valid_hooks = RAW_VALID_HOOKS,
+ .lock = RW_LOCK_UNLOCKED,
+ .me = THIS_MODULE
+};
+
+/* The work comes in here from netfilter.c. */
+static unsigned int
+ipt_hook(unsigned int hook,
+ struct sk_buff **pskb,
+ const struct net_device *in,
+ const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ return ipt_do_table(pskb, hook, in, out, &packet_raw, NULL);
+}
+
+/* 'raw' is the very first table. */
+static struct nf_hook_ops ipt_ops[] = {
+ {
+ .hook = ipt_hook,
+ .pf = PF_INET,
+ .hooknum = NF_IP_PRE_ROUTING,
+ .priority = NF_IP_PRI_RAW
+ },
+ {
+ .hook = ipt_hook,
+ .pf = PF_INET,
+ .hooknum = NF_IP_LOCAL_OUT,
+ .priority = NF_IP_PRI_RAW
+ },
+};
+
+static int __init init(void)
+{
+ int ret;
+
+ /* Register table */
+ ret = ipt_register_table(&packet_raw);
+ if (ret < 0)
+ return ret;
+
+ /* Register hooks */
+ ret = nf_register_hook(&ipt_ops[0]);
+ if (ret < 0)
+ goto cleanup_table;
+
+ ret = nf_register_hook(&ipt_ops[1]);
+ if (ret < 0)
+ goto cleanup_hook0;
+
+ return ret;
+
+ cleanup_hook0:
+ nf_unregister_hook(&ipt_ops[0]);
+ cleanup_table:
+ ipt_unregister_table(&packet_raw);
+
+ return ret;
+}
+
+static void __exit fini(void)
+{
+ unsigned int i;
+
+ for (i = 0; i < sizeof(ipt_ops)/sizeof(struct nf_hook_ops); i++)
+ nf_unregister_hook(&ipt_ops[i]);
+
+ ipt_unregister_table(&packet_raw);
+}
+
+module_init(init);
+module_exit(fini);
+MODULE_LICENSE("GPL");
diff -Nur linux-2.4.31-orig/net/ipv6/netfilter/Config.in linux-2.4.31-pab2/net/ipv6/netfilter/Config.in
--- linux-2.4.31-orig/net/ipv6/netfilter/Config.in 2003-06-13 16:51:39 +0200
+++ linux-2.4.31-pab2/net/ipv6/netfilter/Config.in 2005-06-29 14:33:32 +0200
@@ -75,4 +75,9 @@
#dep_tristate ' LOG target support' CONFIG_IP6_NF_TARGET_LOG $CONFIG_IP6_NF_IPTABLES
fi
+ if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ tristate ' raw table support (required for TRACE)' CONFIG_IP6_NF_RAW $CONFIG_IP6_NF_IPTABLES
+ fi
+ # Marker for TRACE target
+
endmenu
diff -Nur linux-2.4.31-orig/net/ipv6/netfilter/Makefile linux-2.4.31-pab2/net/ipv6/netfilter/Makefile
--- linux-2.4.31-orig/net/ipv6/netfilter/Makefile 2003-06-13 16:51:39 +0200
+++ linux-2.4.31-pab2/net/ipv6/netfilter/Makefile 2005-06-29 14:33:32 +0200
@@ -30,6 +30,7 @@
obj-$(CONFIG_IP6_NF_TARGET_MARK) += ip6t_MARK.o
obj-$(CONFIG_IP6_NF_QUEUE) += ip6_queue.o
obj-$(CONFIG_IP6_NF_TARGET_LOG) += ip6t_LOG.o
+obj-$(CONFIG_IP6_NF_RAW) += ip6table_raw.o
obj-$(CONFIG_IP6_NF_MATCH_HL) += ip6t_hl.o
include $(TOPDIR)/Rules.make
diff -Nur linux-2.4.31-orig/net/ipv6/netfilter/ip6table_raw.c linux-2.4.31-pab2/net/ipv6/netfilter/ip6table_raw.c
--- linux-2.4.31-orig/net/ipv6/netfilter/ip6table_raw.c 1970-01-01 01:00:00 +0100
+++ linux-2.4.31-pab2/net/ipv6/netfilter/ip6table_raw.c 2005-06-29 14:33:32 +0200
@@ -0,0 +1,154 @@
+/*
+ * IPv6 raw table, a port of the IPv4 raw table to IPv6
+ *
+ * Copyright (C) 2003 Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
+ */
+#include <linux/module.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
+
+#define RAW_VALID_HOOKS ((1 << NF_IP6_PRE_ROUTING) | (1 << NF_IP6_LOCAL_OUT))
+
+#if 0
+#define DEBUGP(x, args...) printk(KERN_DEBUG x, ## args)
+#else
+#define DEBUGP(x, args...)
+#endif
+
+/* Standard entry. */
+struct ip6t_standard
+{
+ struct ip6t_entry entry;
+ struct ip6t_standard_target target;
+};
+
+struct ip6t_error_target
+{
+ struct ip6t_entry_target target;
+ char errorname[IP6T_FUNCTION_MAXNAMELEN];
+};
+
+struct ip6t_error
+{
+ struct ip6t_entry entry;
+ struct ip6t_error_target target;
+};
+
+static struct
+{
+ struct ip6t_replace repl;
+ struct ip6t_standard entries[2];
+ struct ip6t_error term;
+} initial_table __initdata
+= { { "raw", RAW_VALID_HOOKS, 3,
+ sizeof(struct ip6t_standard) * 2 + sizeof(struct ip6t_error),
+ { [NF_IP6_PRE_ROUTING] 0,
+ [NF_IP6_LOCAL_OUT] sizeof(struct ip6t_standard) },
+ { [NF_IP6_PRE_ROUTING] 0,
+ [NF_IP6_LOCAL_OUT] sizeof(struct ip6t_standard) },
+ 0, NULL, { } },
+ {
+ /* PRE_ROUTING */
+ { { { { { { 0 } } }, { { { 0 } } }, { { { 0 } } }, { { { 0 } } }, "", "", { 0 }, { 0 }, 0, 0, 0 },
+ 0,
+ sizeof(struct ip6t_entry),
+ sizeof(struct ip6t_standard),
+ 0, { 0, 0 }, { } },
+ { { { { IP6T_ALIGN(sizeof(struct ip6t_standard_target)), "" } }, { } },
+ -NF_ACCEPT - 1 } },
+ /* LOCAL_OUT */
+ { { { { { { 0 } } }, { { { 0 } } }, { { { 0 } } }, { { { 0 } } }, "", "", { 0 }, { 0 }, 0, 0, 0 },
+ 0,
+ sizeof(struct ip6t_entry),
+ sizeof(struct ip6t_standard),
+ 0, { 0, 0 }, { } },
+ { { { { IP6T_ALIGN(sizeof(struct ip6t_standard_target)), "" } }, { } },
+ -NF_ACCEPT - 1 } },
+ },
+ /* ERROR */
+ { { { { { { 0 } } }, { { { 0 } } }, { { { 0 } } }, { { { 0 } } }, "", "", { 0 }, { 0 }, 0, 0, 0 },
+ 0,
+ sizeof(struct ip6t_entry),
+ sizeof(struct ip6t_error),
+ 0, { 0, 0 }, { } },
+ { { { { IP6T_ALIGN(sizeof(struct ip6t_error_target)), IP6T_ERROR_TARGET } },
+ { } },
+ "ERROR"
+ }
+ }
+};
+
+static struct ip6t_table packet_raw = {
+ .name = "raw",
+ .table = &initial_table.repl,
+ .valid_hooks = RAW_VALID_HOOKS,
+ .lock = RW_LOCK_UNLOCKED,
+ .me = THIS_MODULE
+};
+
+/* The work comes in here from netfilter.c. */
+static unsigned int
+ip6t_hook(unsigned int hook,
+ struct sk_buff **pskb,
+ const struct net_device *in,
+ const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ return ip6t_do_table(pskb, hook, in, out, &packet_raw, NULL);
+}
+
+static struct nf_hook_ops ip6t_ops[] = {
+ {
+ .hook = ip6t_hook,
+ .pf = PF_INET6,
+ .hooknum = NF_IP6_PRE_ROUTING,
+ .priority = NF_IP6_PRI_FIRST
+ },
+ {
+ .hook = ip6t_hook,
+ .pf = PF_INET6,
+ .hooknum = NF_IP6_LOCAL_OUT,
+ .priority = NF_IP6_PRI_FIRST
+ },
+};
+
+static int __init init(void)
+{
+ int ret;
+
+ /* Register table */
+ ret = ip6t_register_table(&packet_raw);
+ if (ret < 0)
+ return ret;
+
+ /* Register hooks */
+ ret = nf_register_hook(&ip6t_ops[0]);
+ if (ret < 0)
+ goto cleanup_table;
+
+ ret = nf_register_hook(&ip6t_ops[1]);
+ if (ret < 0)
+ goto cleanup_hook0;
+
+ return ret;
+
+ cleanup_hook0:
+ nf_unregister_hook(&ip6t_ops[0]);
+ cleanup_table:
+ ip6t_unregister_table(&packet_raw);
+
+ return ret;
+}
+
+static void __exit fini(void)
+{
+ unsigned int i;
+
+ for (i = 0; i < sizeof(ip6t_ops)/sizeof(struct nf_hook_ops); i++)
+ nf_unregister_hook(&ip6t_ops[i]);
+
+ ip6t_unregister_table(&packet_raw);
+}
+
+module_init(init);
+module_exit(fini);
+MODULE_LICENSE("GPL");
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-21 10:26 [PATCH 2.4] raw table and NOTRACK support Roberto Nibali
@ 2005-11-22 14:14 ` Roberto Nibali
2005-11-22 15:40 ` Roberto Nibali
0 siblings, 1 reply; 10+ messages in thread
From: Roberto Nibali @ 2005-11-22 14:14 UTC (permalink / raw)
To: netfilter-devel
> Caveats: Currently we get an oops on SMP iff all of the following hold:
No oops with kdb; I now get a busy loop instead.
> o NOTRACK rule loaded, active and used (refcnt>0)
> o SMP kernel
> o connection tracking is enabled
> o a normal rule hitting the conntrack table during lookup
> o iptables -X; iptables -F, rmmod <all netfilter related modules>
>
> Earlier attempts to address this issue with Pablo Neira resulted in
> a misplaced nf_reset(skb) patch, which I have removed again because it
> broke masquerading (IIRC). I will enable KDB and report back once I get
> a decent stack trace.
Hmm, with kdb I get the following trace (check the bt at the end):
Entering kdb (current=0xc0494000, pid 0) on processor 0 due to cpu switch
[0]kdb> cpu
Currently on cpu 0
Available cpus: 0, 1, 2, 3
[0]kdb> cpu 3
Entering kdb (current=0xf5c72000, pid 7064) on processor 3 due to cpu switch
[3]kdb> ssb
0xf89a7232 get_next_corpse+0xb2: mov (%esi),%eax
0xf89a7234 get_next_corpse+0xb4: inc %eax
0xf89a7235 get_next_corpse+0xb5: mov %eax,%ecx
0xf89a7237 get_next_corpse+0xb7: mov %eax,(%esi)
0xf89a7239 get_next_corpse+0xb9: cmp 0xf89aad04,%ecx
0xf89a723f get_next_corpse+0xbf: jb 0xf89a71e0 get_next_corpse+0x60
[3]kdb>
0xf89a71e0 get_next_corpse+0x60: movl $0x0,0xfffffff0(%ebp)
0xf89a71e7 get_next_corpse+0x67: mov 0xf89aae64,%eax
0xf89a71ec get_next_corpse+0x6c: mov (%eax,%ecx,8),%ebx
0xf89a71ef get_next_corpse+0x6f: mov (%ebx),%edx
0xf89a71f1 get_next_corpse+0x71: prefetchnta (%edx)
0xf89a71f4 get_next_corpse+0x74: lea (%eax,%ecx,8),%eax
0xf89a71f7 get_next_corpse+0x77: jmp 0xf89a7227 get_next_corpse+0xa7
[3]kdb>
0xf89a7227 get_next_corpse+0xa7: cmp %ebx,%eax
0xf89a7229 get_next_corpse+0xa9: jne 0xf89a7200 get_next_corpse+0x80
[3]kdb>
0xf89a722b get_next_corpse+0xab: mov 0xfffffff0(%ebp),%ecx
0xf89a722e get_next_corpse+0xae: test %ecx,%ecx
0xf89a7230 get_next_corpse+0xb0: jne 0xf89a7276 get_next_corpse+0xf6
[3]kdb>
0xf89a7232 get_next_corpse+0xb2: mov (%esi),%eax
0xf89a7234 get_next_corpse+0xb4: inc %eax
0xf89a7235 get_next_corpse+0xb5: mov %eax,%ecx
0xf89a7237 get_next_corpse+0xb7: mov %eax,(%esi)
0xf89a7239 get_next_corpse+0xb9: cmp 0xf89aad04,%ecx
0xf89a723f get_next_corpse+0xbf: jb 0xf89a71e0 get_next_corpse+0x60
[3]kdb> ss
SS trap at 0xf89a7227 ([ip_conntrack]get_next_corpse+0xa7)
0xf89a7227 get_next_corpse+0xa7: cmp %ebx,%eax
[3]kdb> rd
eax = 0xf89d9f08 ebx = 0xf89d9f08 ecx = 0x00001fe1 edx = 0xf89d9f08
esi = 0xf5c73f20 edi = 0x00000000 esp = 0xf5c73ef4 eip = 0xf89a7227
ebp = 0xf5c73f0c xss = 0xc0350018 xcs = 0x00000010 eflags = 0x00000287
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf5c73ec0
[3]kdb> ss
SS trap at 0xf89a7229 ([ip_conntrack]get_next_corpse+0xa9)
0xf89a7229 get_next_corpse+0xa9: jne 0xf89a7200 get_next_corpse+0x80
[3]kdb> rd
eax = 0xf89d9f08 ebx = 0xf89d9f08 ecx = 0x00001fe1 edx = 0xf89d9f08
esi = 0xf5c73f20 edi = 0x00000000 esp = 0xf5c73ef4 eip = 0xf89a7229
ebp = 0xf5c73f0c xss = 0xc0350018 xcs = 0x00000010 eflags = 0x00000246
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf5c73ec0
[3]kdb> ss
SS trap at 0xf89a722b ([ip_conntrack]get_next_corpse+0xab)
0xf89a722b get_next_corpse+0xab: mov 0xfffffff0(%ebp),%ecx
[3]kdb> rd
eax = 0xf89d9f08 ebx = 0xf89d9f08 ecx = 0x00001fe1 edx = 0xf89d9f08
esi = 0xf5c73f20 edi = 0x00000000 esp = 0xf5c73ef4 eip = 0xf89a722b
ebp = 0xf5c73f0c xss = 0xc0350018 xcs = 0x00000010 eflags = 0x00000246
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf5c73ec0
[3]kdb> bt
Stack traceback for pid 7064
0xf5c72000 7064 7011 1 3 R 0xf5c722b0 *rmmod
EBP EIP Function (args)
0xf5c73f0c 0xf89a723f [ip_conntrack]get_next_corpse+0xbf (0xf89a7470,
0x0, 0xf5c73f20, 0x1fe2, 0xf5c72000)
ip_conntrack .text 0xf89a4060 0xf89a7180
0xf89a72d0
0xf5c73f30 0xf89a7303
[ip_conntrack]ip_ct_iterate_cleanup_Rsmp_4ff11842+0x33 (0xf89a7470, 0x0,
0x0)
ip_conntrack .text 0xf89a4060 0xf89a72d0
0xf89a7370
0xf5c73f44 0xf89a74f7 [ip_conntrack]ip_conntrack_cleanup+0x77
(0xf89a990f, 0xc2a7bd20, 0xc0471e20, 0xf89a4000)
ip_conntrack .text 0xf89a4060 0xf89a7480
0xf89a7550
0xf5c73f5c 0xf89a479f [ip_conntrack]init_or_cleanup+0x17f (0x0)
ip_conntrack .text 0xf89a4060 0xf89a4620
0xf89a4810
0xf5c73f68 0xf89a4a22 [ip_conntrack]fini+0x12 (0xf89a4000, 0xfffffff0,
0xf5d8b000, 0xf5c73f84, 0xf89a4000)
ip_conntrack .text 0xf89a4060 0xf89a4a10
0xf89a4a24
0xf5c73f8c 0xc0120641 free_module+0x111 (0xf89a4000, 0x0, 0x1000,
0xbfffde18, 0xf5c72000)
kernel .text 0xc0100000 0xc0120530 0xc0120660
0xf5c73fbc 0xc011f639 sys_delete_module+0x129 (0xbffffcd9, 0xbfffefd4,
0xbfffdf2c, 0x1, 0xbfffdf2c)
kernel .text 0xc0100000 0xc011f510 0xc011f940
0xc010774f system_call+0x33
kernel .text 0xc0100000 0xc010771c 0xc0107754
[3]kdb>
Tell me if you need more info.
Cheers,
Roberto Nibali, ratz
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-22 14:14 ` Roberto Nibali
@ 2005-11-22 15:40 ` Roberto Nibali
2005-11-22 15:54 ` Roberto Nibali
0 siblings, 1 reply; 10+ messages in thread
From: Roberto Nibali @ 2005-11-22 15:40 UTC (permalink / raw)
To: Netfilter Developers
> [3]kdb> bt
> Stack traceback for pid 7064
> 0xf5c72000 7064 7011 1 3 R 0xf5c722b0 *rmmod
> EBP EIP Function (args)
> 0xf5c73f0c 0xf89a723f [ip_conntrack]get_next_corpse+0xbf (0xf89a7470,
> 0x0, 0xf5c73f20, 0x1fe2, 0xf5c72000)
> ip_conntrack .text 0xf89a4060 0xf89a7180
> 0xf89a72d0
> 0xf5c73f30 0xf89a7303
> [ip_conntrack]ip_ct_iterate_cleanup_Rsmp_4ff11842+0x33 (0xf89a7470, 0x0,
> 0x0)
> ip_conntrack .text 0xf89a4060 0xf89a72d0
> 0xf89a7370
> 0xf5c73f44 0xf89a74f7 [ip_conntrack]ip_conntrack_cleanup+0x77
> (0xf89a990f, 0xc2a7bd20, 0xc0471e20, 0xf89a4000)
> ip_conntrack .text 0xf89a4060 0xf89a7480
> 0xf89a7550
> 0xf5c73f5c 0xf89a479f [ip_conntrack]init_or_cleanup+0x17f (0x0)
> ip_conntrack .text 0xf89a4060 0xf89a4620
> 0xf89a4810
> 0xf5c73f68 0xf89a4a22 [ip_conntrack]fini+0x12 (0xf89a4000, 0xfffffff0,
> 0xf5d8b000, 0xf5c73f84, 0xf89a4000)
> ip_conntrack .text 0xf89a4060 0xf89a4a10
> 0xf89a4a24
> 0xf5c73f8c 0xc0120641 free_module+0x111 (0xf89a4000, 0x0, 0x1000,
> 0xbfffde18, 0xf5c72000)
> kernel .text 0xc0100000 0xc0120530 0xc0120660
> 0xf5c73fbc 0xc011f639 sys_delete_module+0x129 (0xbffffcd9, 0xbfffefd4,
> 0xbfffdf2c, 0x1, 0xbfffdf2c)
> kernel .text 0xc0100000 0xc011f510 0xc011f940
> 0xc010774f system_call+0x33
> kernel .text 0xc0100000 0xc010771c 0xc0107754
> [3]kdb>
I don't get it. It's looping in:
void
ip_ct_iterate_cleanup(int (*iter)(struct ip_conntrack *i, void *), void *data)
{
	struct ip_conntrack_tuple_hash *h;
	unsigned int bucket = 0;

	while ((h = get_next_corpse(iter, data, &bucket)) != NULL) {
		/* Time to push up daises... */
		if (del_timer(&h->ctrack->timeout))
			death_by_timeout((unsigned long)h->ctrack);
		/* ... else the timer will get him soon. */
		ip_conntrack_put(h->ctrack);
	}
}
which is called from:
void ip_conntrack_cleanup(void)
{
	ip_ct_attach = NULL;
	/* This makes sure all current packets have passed through
	   netfilter framework. Roll on, two-stage module
	   delete... */
	br_write_lock_bh(BR_NETPROTO_LOCK);
	br_write_unlock_bh(BR_NETPROTO_LOCK);

 i_see_dead_people:
	ip_ct_iterate_cleanup(kill_all, NULL);
	if (atomic_read(&ip_conntrack_count) != 0) {
		schedule();
		goto i_see_dead_people;
	}
	while (atomic_read(&ip_conntrack_untracked.ct_general.use) > 1)
		schedule();

	kmem_cache_destroy(ip_conntrack_cachep);
	vfree(ip_conntrack_hash);
	nf_unregister_sockopt(&so_getorigdst);
}
I don't see where ip_conntrack_untracked.ct_general.use is > 1, ever ...
I'm completely puzzled,
Roberto Nibali, ratz
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-22 15:40 ` Roberto Nibali
@ 2005-11-22 15:54 ` Roberto Nibali
2005-11-23 13:04 ` Roberto Nibali
0 siblings, 1 reply; 10+ messages in thread
From: Roberto Nibali @ 2005-11-22 15:54 UTC (permalink / raw)
To: Netfilter Developers
> void ip_conntrack_cleanup(void)
> {
> ip_ct_attach = NULL;
> /* This makes sure all current packets have passed through
> netfilter framework. Roll on, two-stage module
> delete... */
> br_write_lock_bh(BR_NETPROTO_LOCK);
> br_write_unlock_bh(BR_NETPROTO_LOCK);
>
> i_see_dead_people:
> ip_ct_iterate_cleanup(kill_all, NULL);
> if (atomic_read(&ip_conntrack_count) != 0) {
> schedule();
> goto i_see_dead_people;
> }
> while (atomic_read(&ip_conntrack_untracked.ct_general.use) > 1)
> schedule();
>
> kmem_cache_destroy(ip_conntrack_cachep);
> vfree(ip_conntrack_hash);
> nf_unregister_sockopt(&so_getorigdst);
> }
>
> I don't see where ip_conntrack_untracked.ct_general.use is > 1, ever ...
SS trap at 0xf89a7227 ([ip_conntrack]get_next_corpse+0xa7)
0xf89a7227 get_next_corpse+0xa7: cmp %ebx,%eax
[0]kdb> mm4 ip_conntrack_count 0
0xf89aae68 = 0x0
[0]kdb> go
lb-lb0-phys:~#
So forcing ip_conntrack_count to 0 of course breaks out of the endless
schedule() loop. And naturally, after a firewall reconfiguration, we oops:
kernel BUG at slab.c:815!
invalid operand: 0000
ip_conntrack ipt_limit ip_vs_wlc ip_vs ipt_LOG iptable_raw
iptable_mangle iptable_filter ip_tables
CPU: 0
EIP: 0010:[<c013bb32>] Not tainted
EFLAGS: 00010246
EIP is at kmem_cache_create+0x262/0x3d0 [kernel]
eax: 00000000 ebx: f7ae6a98 ecx: f7ae6ba0 edx: f7295fc8
esi: f7ae6b99 edi: f89a9945 ebp: f5b1deac esp: f5b1de84
ds: 0018 es: 0018 ss: 0018
Process modprobe (pid: 7457, stackpage=f5b1d000)
Stack: f7ae6a98 00000160 00002000 f5b1de9c f7ae6ab8 ffffffe0 00000080
00000000
00000000 00000060 f5b1ded0 f89a7660 f89a9938 00000160 00000020
00022000
00000000 00000000 00000000 f5b1dee8 f89a4639 ffffffea 00000000
00000060
Call Trace:
[<f89a7660>] ip_conntrack_init+0x110/0x298 [ip_conntrack]
[<f89a9938>] .rodata.str1.1+0x198/0x2e0 [ip_conntrack]
[<f89a4639>] init_or_cleanup+0x19/0x1f0 [ip_conntrack]
[<f89a4a02>] init_module+0x12/0x20 [ip_conntrack]
[<c011f40e>] sys_init_module+0x85e/0x8c0 [kernel]
[<f89a4060>] kill_proto+0x0/0x20 [ip_conntrack]
[<f89ad1cc>] E ip_conntrack_hash_Rsmp_386855a5+0x2368/0xfffffebc
[ip_conntrack]
[<f89aa168>]
__ksymtab_ip_conntrack_protocol_register_Rsmp_6e500e17+0x0/0x8
[ip_conntrack]
[<f89a4060>] kill_proto+0x0/0x20 [ip_conntrack]
[<c010774f>] system_call+0x33/0x38 [kernel]
Code: 0f 0b 2f 03 57 c8 37 c0 89 d0 8b 12 0f 18 02 3d 90 1b 47 c0
Entering kdb (current=0xf5b1c000, pid 7457) on processor 0 Oops: invalid operand
due to oops @ 0xc013bb32
eax = 0x00000000 ebx = 0xf7ae6a98 ecx = 0xf7ae6ba0 edx = 0xf7295fc8
esi = 0xf7ae6b99 edi = 0xf89a9945 esp = 0xf5b1de84 eip = 0xc013bb32
ebp = 0xf5b1deac xss = 0xc0350018 xcs = 0x00000010 eflags = 0x00010246
xds = 0xf7ae0018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf5b1de50
[0]kdb> bt
Stack traceback for pid 7457
0xf5b1c000 7457 7455 1 0 R 0xf5b1c2b0 *modprobe
EBP EIP Function (args)
0xf5b1deac 0xc013bb32 kmem_cache_create+0x262 (0xf89a9938, 0x160, 0x20,
0x22000, 0x0)
kernel .text 0xc0100000 0xc013b8d0 0xc013bca0
0xf5b1ded0 0xf89a7660 [ip_conntrack]ip_conntrack_init+0x110 (0xffffffea,
0x0, 0x60, 0xffffffea)
ip_conntrack .text 0xf89a4060 0xf89a7550
0xf89a77e8
0xf5b1dee8 0xf89a4639 [ip_conntrack]init_or_cleanup+0x19 (0x1)
ip_conntrack .text 0xf89a4060 0xf89a4620
0xf89a4810
0xf5b1def4 0xf89a4a02 [ip_conntrack]init_module+0x12 (0xf89a4060,
0x8096a20, 0x916c, 0xf89ad1cc, 0xf89aa168)
ip_conntrack .text 0xf89a4060 0xf89a49f0
0xf89a4a10
0xf5b1dfbc 0xc011f40e sys_init_module+0x85e (0x806ab70, 0x80969c0,
0x80969c0, 0x400191d8, 0xbfffb0fc)
kernel .text 0xc0100000 0xc011ebb0 0xc011f470
0xc010774f system_call+0x33
kernel .text 0xc0100000 0xc010771c 0xc0107754
[0]kdb> go
Catastrophic error detected
kdb_continue_catastrophic=0, type go a second time if you really want to
continue
[0]kdb> mm4 sysrq_enabled 1
0xc047bf20 = 0x1
[0]kdb> sr 7
<6>SysRq : Changing Loglevel
Loglevel set to 7
[0]kdb> sr s
SysRq : Emergency Sync
[0]kdb> sr u
SysRq : Emergency Remount R/O
[0]kdb> sr s
SysRq : Emergency Sync
[0]kdb> sr b
SysRq : Resetting
Damn! I wish I understood that conntrack stuff better ...
Cheers,
Roberto Nibali, ratz
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-22 15:54 ` Roberto Nibali
@ 2005-11-23 13:04 ` Roberto Nibali
2005-11-27 15:36 ` Patrick McHardy
0 siblings, 1 reply; 10+ messages in thread
From: Roberto Nibali @ 2005-11-23 13:04 UTC (permalink / raw)
To: Netfilter Developers; +Cc: Patrick McHardy, Willy Tarreau
> Damn! I wish I understood that conntrack stuff better ...
Ok, so NOTRACK attaches a fake connection tracking entry to the skb upon
target entry and takes a reference on it using
nf_conntrack_get((*pskb)->nfct). Each such skb bumps the nfct use counter,
so when deregistering the conntrack we still have references to the fake
connection tracking entry of the NOTRACK target.
This was discussed already and Patrick submitted a patchset:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b31e5b1bb53b99dfd5e890aa07e943aff114ae1c
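For illustration only, here is a minimal userspace sketch of the reference
counting described above. It is not kernel code: model_ct, model_skb,
model_notrack(), model_nf_reset() and model_cleanup_would_spin() are
hypothetical names that merely mimic ip_conntrack_untracked, skb->nfct, the
NOTRACK target, nf_reset() and the final wait loop in
ip_conntrack_cleanup(). It assumes the module itself holds exactly one
reference, which is why cleanup waits while the counter is > 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fake conntrack entry. */
struct model_ct {
	atomic_int use;		/* models ct_general.use */
};

/* Hypothetical stand-in for an skb; only the conntrack pointer matters. */
struct model_skb {
	struct model_ct *nfct;	/* models skb->nfct */
};

/* Assumption: the module's own reference accounts for the initial 1. */
static struct model_ct untracked = { 1 };

/* Models the NOTRACK target: attach the fake entry and take a reference. */
static void model_notrack(struct model_skb *skb)
{
	assert(skb != NULL);
	skb->nfct = &untracked;
	atomic_fetch_add(&untracked.use, 1);
}

/* Models nf_reset(): drop the reference held by the skb, if any. */
static void model_nf_reset(struct model_skb *skb)
{
	assert(skb != NULL);
	if (skb->nfct != NULL) {
		atomic_fetch_sub(&skb->nfct->use, 1);
		skb->nfct = NULL;
	}
}

/* Models the "while (use > 1) schedule();" condition in cleanup. */
static bool model_cleanup_would_spin(void)
{
	return atomic_load(&untracked.use) > 1;
}
```

In this model, a single skb that went through model_notrack() and is still
queued somewhere without a matching model_nf_reset() keeps
model_cleanup_would_spin() true, no matter how often cleanup retries.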
Patrick, in the thread leading to this patch we concluded that you would
forward the nf_reset patch to Marcelo for 2.4.x inclusion. I only
realised now that this did not happen and thus the following patch is
needed for 2.4.x to have rmmod ip_conntrack working correctly when
having either bridging or NOTRACK (both not in vanilla) loaded and used
in the kernel:
--- linux-2.4.32-orig/net/ipv4/ip_output.c 2005-11-21 11:29:41 +0100
+++ linux-2.4.32-pab2/net/ipv4/ip_output.c 2005-11-23 11:42:13 +0100
@@ -167,6 +167,9 @@
 	nf_debug_ip_finish_output2(skb);
 #endif /*CONFIG_NETFILTER_DEBUG*/
 
+	/* Drop conntrack reference when packet leaves IP */
+	nf_reset(skb);
+
 	if (hh) {
 		int hh_alen;
 
Is there a reason not to include this patch in 2.4.x?
Thanks and regards,
Roberto Nibali, ratz
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-23 13:04 ` Roberto Nibali
@ 2005-11-27 15:36 ` Patrick McHardy
2005-11-27 18:22 ` Roberto Nibali
0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2005-11-27 15:36 UTC (permalink / raw)
To: Roberto Nibali; +Cc: Netfilter Developers, Willy Tarreau
Roberto Nibali wrote:
> Patrick, in the thread leading to this patch we concluded that you would
> forward the nf_reset patch to Marcelo for 2.4.x inclusion. I only
> realised now that this did not happen and thus the following patch is
> needed for 2.4.x to have rmmod ip_conntrack working correctly when
> having either bridging or NOTRACK (both not in vanilla) loaded and used
> in the kernel:
>
> --- linux-2.4.32-orig/net/ipv4/ip_output.c 2005-11-21 11:29:41 +0100
> +++ linux-2.4.32-pab2/net/ipv4/ip_output.c 2005-11-23 11:42:13 +0100
> @@ -167,6 +167,9 @@
> nf_debug_ip_finish_output2(skb);
> #endif /*CONFIG_NETFILTER_DEBUG*/
>
> + /* Drop conntrack reference when packet leaves IP */
> + nf_reset(skb);
> +
> if (hh) {
> int hh_alen;
>
> Is there a reason not to include this patch in 2.4.x?
Yes, it turned out to break a lot of things on loopback.
We put a different patch in 2.6, which dropped the reference
at known points where the packet would be queued, except
for the qdiscs. We can put the same patch in 2.4.
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-27 15:36 ` Patrick McHardy
@ 2005-11-27 18:22 ` Roberto Nibali
2005-11-27 18:49 ` Patrick McHardy
0 siblings, 1 reply; 10+ messages in thread
From: Roberto Nibali @ 2005-11-27 18:22 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Willy Tarreau, Netfilter Developers, Roberto Nibali
Hello Patrick,
Thanks for replying.
>> --- linux-2.4.32-orig/net/ipv4/ip_output.c 2005-11-21 11:29:41 +0100
>> +++ linux-2.4.32-pab2/net/ipv4/ip_output.c 2005-11-23 11:42:13 +0100
>> @@ -167,6 +167,9 @@
>> nf_debug_ip_finish_output2(skb);
>> #endif /*CONFIG_NETFILTER_DEBUG*/
>>
>> + /* Drop conntrack reference when packet leaves IP */
>> + nf_reset(skb);
>> +
>> if (hh) {
>> int hh_alen;
>>
>> Is there a reason not to include this patch in 2.4.x?
>
> Yes, it turned out to break a lot of things on loopback.
Although I don't see how in 2.4.x, I now vaguely remember the bug report.
> We put a different patch in 2.6, which dropped the reference
> at known points where the packet would be queued, except
> for the qdiscs. We can put the same patch in 2.4.
That would be perfect, could you point me to the git reference to this
patch, please?
Cheers,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-27 18:22 ` Roberto Nibali
@ 2005-11-27 18:49 ` Patrick McHardy
2005-11-28 9:11 ` Roberto Nibali
0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2005-11-27 18:49 UTC (permalink / raw)
To: Roberto Nibali; +Cc: Willy Tarreau, Netfilter Developers, Roberto Nibali
Roberto Nibali wrote:
>>>Is there a reason not to include this patch in 2.4.x?
>>
>>Yes, it turned out to break a lot of things on loopback.
>
>
> Although I don't see how in 2.4.x, I now vaguely remember the bug report.
One of the things it broke was SO_ORIGINAL_DST support for
transparent proxying, which also affects 2.4.
>>We put a different patch in 2.6, which dropped the reference
>>at known points where the packet would be queued, except
>>for the qdiscs. We can put the same patch in 2.4.
>
>
> That would be perfect, could you point me to the git reference to this
> patch, please?
It was commit 84531c24f27b02daa8e54e2bb6dc74a730fdf0a5, titled
"[NETFILTER]: Revert nf_reset change".
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-27 18:49 ` Patrick McHardy
@ 2005-11-28 9:11 ` Roberto Nibali
2005-11-28 9:47 ` Roberto Nibali
0 siblings, 1 reply; 10+ messages in thread
From: Roberto Nibali @ 2005-11-28 9:11 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Willy Tarreau, Netfilter Developers, Roberto Nibali
> One of the things it broke was SO_ORIGINAL_DST support for
> transparent proxying, which also affects 2.4.
Ok.
>> That would be perfect, could you point me to the git reference to this
>> patch, please?
>
> It was commit 84531c24f27b02daa8e54e2bb6dc74a730fdf0a5, titled
> "[NETFILTER]: Revert nf_reset change".
Hmmm, so how about the following approach?
--- linux-2.4.32-orig/include/net/dst.h 2005-04-04 03:42:20 +0200
+++ linux-2.4.32-pab2/include/net/dst.h 2005-11-28 09:42:59 +0100
@@ -105,6 +105,7 @@
 void dst_release(struct dst_entry * dst)
 {
 	if (dst) {
+		WARN_ON(atomic_read(&dst->__refcnt) < 1);
 		smp_mb__before_atomic_dec();
 		atomic_dec(&dst->__refcnt);
 	}
diff -X dontdiff -Nur linux-2.4.32-orig/net/packet/af_packet.c linux-2.4.32-pab2/net/packet/af_packet.c
--- linux-2.4.32-orig/net/packet/af_packet.c 2004-11-17 12:54:22 +0100
+++ linux-2.4.32-pab2/net/packet/af_packet.c 2005-11-28 10:00:27 +0100
@@ -272,6 +272,11 @@
 	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
 		goto oom;
 
+	/* drop any routing info and conntrack reference */
+	dst_release(skb->dst);
+	skb->dst = NULL;
+	nf_reset(skb);
+
 	spkt = (struct sockaddr_pkt*)skb->cb;
 
 	skb_push(skb, skb->data-skb->mac.raw);
@@ -507,6 +512,12 @@
 
 	skb_set_owner_r(skb, sk);
 	skb->dev = NULL;
+
+	/* drop any routing info and conntrack reference */
+	dst_release(skb->dst);
+	skb->dst = NULL;
+	nf_reset(skb);
+
 	spin_lock(&sk->receive_queue.lock);
 	po->stats.tp_packets++;
 	__skb_queue_tail(&sk->receive_queue, skb);
I'm compiling it now and will be running tests, as long as the thing even
boots ;).
I think the WARN_ON could be submitted to 2.4.x anyway, since it helps
find other occurrences of incorrect refcnt decrements. Why is the routing
entry dropped in 2.6.x and not in 2.4.x? Maybe I should cc netdev
as well.
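To show what the WARN_ON is meant to catch, here is a small userspace sketch,
assuming a plain atomic counter in place of dst->__refcnt. The names
model_dst, model_dst_hold() and model_dst_release() are hypothetical and not
part of the kernel. Like the patched dst_release(), the release path only
warns on an unbalanced put and still decrements.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dst_entry; only the counter is modelled. */
struct model_dst {
	atomic_int refcnt;	/* models dst->__refcnt */
};

/* Take a reference, as a caller would before storing the pointer. */
static void model_dst_hold(struct model_dst *dst)
{
	assert(dst != NULL);
	atomic_fetch_add(&dst->refcnt, 1);
}

/*
 * Release a reference. An unbalanced release is reported but the counter
 * is decremented anyway, mirroring the WARN_ON placement in the patch.
 * Returns false if the warning fired, true otherwise.
 */
static bool model_dst_release(struct model_dst *dst)
{
	bool balanced = true;

	if (dst == NULL)
		return true;
	if (atomic_load(&dst->refcnt) < 1) {
		fprintf(stderr, "warning: dst released with refcnt < 1\n");
		balanced = false;
	}
	atomic_fetch_sub(&dst->refcnt, 1);
	return balanced;
}
```

A hold followed by one release is silent; a second release on the same
entry trips the warning, which is the kind of double put the WARN_ON would
point out in the log.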
Thanks and best regards,
Roberto Nibali, ratz
* Re: [PATCH 2.4] raw table and NOTRACK support
2005-11-28 9:11 ` Roberto Nibali
@ 2005-11-28 9:47 ` Roberto Nibali
0 siblings, 0 replies; 10+ messages in thread
From: Roberto Nibali @ 2005-11-28 9:47 UTC (permalink / raw)
To: Roberto Nibali
Cc: Netfilter Developers, Roberto Nibali, Patrick McHardy,
Willy Tarreau
> Hmmm, so how about the following approach?
>
> --- linux-2.4.32-orig/include/net/dst.h 2005-04-04 03:42:20 +0200
> +++ linux-2.4.32-pab2/include/net/dst.h 2005-11-28 09:42:59 +0100
> @@ -105,6 +105,7 @@
> void dst_release(struct dst_entry * dst)
> {
> if (dst) {
> + WARN_ON(atomic_read(&dst->__refcnt) < 1);
> smp_mb__before_atomic_dec();
> atomic_dec(&dst->__refcnt);
> }
> diff -X dontdiff -Nur linux-2.4.32-orig/net/packet/af_packet.c
> linux-2.4.32-pab2/net/packet/af_packet.c
> --- linux-2.4.32-orig/net/packet/af_packet.c 2004-11-17 12:54:22 +0100
> +++ linux-2.4.32-pab2/net/packet/af_packet.c 2005-11-28 10:00:27 +0100
> @@ -272,6 +272,11 @@
> if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
> goto oom;
>
> + /* drop any routing info and conntrack reference */
> + dst_release(skb->dst);
> + skb->dst = NULL;
> + nf_reset(skb);
> +
> spkt = (struct sockaddr_pkt*)skb->cb;
>
> skb_push(skb, skb->data-skb->mac.raw);
> @@ -507,6 +512,12 @@
>
> skb_set_owner_r(skb, sk);
> skb->dev = NULL;
> +
> + /* drop any routing info and conntrack reference */
> + dst_release(skb->dst);
> + skb->dst = NULL;
> + nf_reset(skb);
> +
> spin_lock(&sk->receive_queue.lock);
> po->stats.tp_packets++;
> __skb_queue_tail(&sk->receive_queue, skb);
>
> I'm compiling it now and will be running tests, as long as the thing even
> boots ;).
Hmm, somehow I haven't caught all possible skb releases; a conntrack
flush takes 8m32s :). I'll add another nf_reset in ipv4/ip_input.c for
ip_call_ra_chain() ... this is getting really fishy.
Thread overview: 10+ messages
2005-11-21 10:26 [PATCH 2.4] raw table and NOTRACK support Roberto Nibali
2005-11-22 14:14 ` Roberto Nibali
2005-11-22 15:40 ` Roberto Nibali
2005-11-22 15:54 ` Roberto Nibali
2005-11-23 13:04 ` Roberto Nibali
2005-11-27 15:36 ` Patrick McHardy
2005-11-27 18:22 ` Roberto Nibali
2005-11-27 18:49 ` Patrick McHardy
2005-11-28 9:11 ` Roberto Nibali
2005-11-28 9:47 ` Roberto Nibali