[PATCH 1/2] updates for [nf|ct]netlink and event API

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] updates for [nf|ct]netlink and event API
@ 2005-06-27 18:02 Pablo Neira
  2005-06-27 20:26 ` Harald Welte
                   ` (5 more replies)
  0 siblings, 6 replies; 50+ messages in thread
From: Pablo Neira @ 2005-06-27 18:02 UTC (permalink / raw)
  To: Netfilter Development Mailinglist; +Cc: Harald Welte

Hi Harald,

This patchset introduces tons of updates for the nfnetlink, ctnetlink 
and the conntrack event API. I haven't attached the file since it's that 
big, about 100K.

You can get an incremental diff against SVN from:
http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch

Please apply.


I've split this big patch above into four pieces to make it easier to 
understand the changes:
http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/

So these four patches shouldn't be applied, just they are meant to make 
your life easier to track the changes.

Summary of changes
------------------

o conntrack event API
- Don't kill NFC_IP_* stuff, keep it there to ensure for old iptables 
versions compilation.
- new file ip_conntrack_events.h that contains all event related 
functions to reduce pollution in ip_conntrack.h
- IPCT_DELIVERED bit. Loopback reports event are reported twice, this 
bit is set once event are delivered. I just came up with a better idea, 
reset nfcache once the events have been delivered, but I'll apply this 
change in the next patchset.

o nfnetlink
- kill unused list.
- kill nfnl_exlock(), not needed anymore.
- kill duplicated check: NFNL_SUBSYS_ID(type) > NFNL_SUBSYS_COUNT.
- kill unneeded initialization of subsys_table to NULL, since it's in 
BSS section (already set to zero).
- kill dead define CONFIG_NF_NETLINK.

o ctnetlink
- merge ctnetlink_get_mcgroups and ctnetlink_get_exp_mcgroups
- implemented NAT handlings
- kill unused ctnetlink_kill
- use __u64 id's for conntracks
- stop using NLMSG_DONE to report the end of a dump, use explicite ACK 
instead (NLM_F_ACK).
- fixed broken expectation timeout dumping.
- kill unused ctnetlink_exp_dump_proto
- kill ctnetlink_exp_dump: fairly small and just used once
- kill NFNL_SUBSYS_CTNETLINK_EXP, use NFNL_SUBSYS_CTNETLINK instead
- Fix expectation table dumping
- Fix expectation creation
- implemented flushing of the expect table

TODO
----

- Implement ip_conntrack_stats dumping and reset (accounting)
- Implement get conntrack and destroy (accounting)
- Kill event/dump mask based (?). Although it's unique, I think that it 
could be useful for weak conntrack event notification (think of just 
new, established and destroy event notification to reduce performance 
impact).

Once ip_conntrack_netlink gets fully featured and people don't report 
bugs for quite some time. I'll create a nf_conntrack_netlink tree.

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira
@ 2005-06-27 20:26 ` Harald Welte
  2005-06-28  2:00   ` Pablo Neira
  2005-06-27 21:31 ` Patrick McHardy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Harald Welte @ 2005-06-27 20:26 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 3514 bytes --]

On Mon, Jun 27, 2005 at 08:02:22PM +0200, Pablo Neira wrote:
> Hi Harald,
> 
> This patchset introduces tons of updates for the nfnetlink, ctnetlink 
> and the conntrack event API. I haven't attached the file since it's that 
> big, about 100K.

> You can get an incremental diff against SVN from:
> http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch

the patch removes the whole old ip_conntrack_netlink.c file and replaces
it with a new one.  I don't really see why.  An, now, you changed the
directory name from 2.6.11 to 2.6.12.  Is this neccessarry?  Why would
your recent changes to ctnetlink not work with a 2.6.11 kernel?

If there is no good reason, I would suggest to just patch the existing
files in the 2.6.11 subdirectory.

> I've split this big patch above into four pieces to make it easier to 
> understand the changes:
> http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/

please remove stuff like

> -#if 0
> +#if 1
>  #define DEBUGP printk

before releasing a patch, thanks.

Also, I would rather not like to have NFNL_SUBSYS_CTNETLINK_EXP deleted.
I thought it would be nice to have expect handling separated from
conntracks, since they really are two seperate data structures.

Apart from that I'm fine with all of your modifications.

I don't understand why the order of the expectation list was changed,
though.  Would you please explain why this was done?

> - Implement ip_conntrack_stats dumping and reset (accounting)

you want to dump the statistics via netlink?  i'm not sure whether that
is required.  lnstat is the only program using those counters, and using
/proc seems fine for that.

> - Implement get conntrack and destroy (accounting)

sorry, what are you referring to?

> - Kill event/dump mask based (?). Although it's unique, I think that it 
> could be useful for weak conntrack event notification (think of just 
> new, established and destroy event notification to reduce performance 
> impact).

well, an interesting optimization.  My major problem with this whole
system (like the other IPCTNL_MSG_CONFIG stuff) is: how to correctly
deal with multipel users.  

Now, I think there is a way to do it right.  Basically every application
would only request the events that it is interested in, and the kernel
would bitwise-or those events to create the event mask that actually is
to be dumped into userspace.

It's a bit like multicast membership subscription with IGMP...

So the only question remaining is: how does the kernel clean up / expire
the masks?  If apps only subscribe but never unsubscribe, we would never
clean up.

I'm not sure if we could somehow (cleanly) tie into the scoket
close/destroy code.  If yes, we could check if we can reduce the mask at
socket destroy time.

Another approach would be timer-based, so applications would have to
re-subscribe to 'their' groups in regular intervals if they are still
interested in certain events.  Sounds racy and ugly, though :(

I'm always open for better solutions, though :)

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 20:26 ` Harald Welte
@ 2005-06-28  2:00   ` Pablo Neira
  2005-06-28  2:12     ` Pablo Neira
                       ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Pablo Neira @ 2005-06-28  2:00 UTC (permalink / raw)
  To: Harald Welte; +Cc: Netfilter Development Mailinglist, Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 4509 bytes --]

Harald Welte wrote:
> the patch removes the whole old ip_conntrack_netlink.c file and replaces
> it with a new one.  I don't really see why.  An, now, you changed the
> directory name from 2.6.11 to 2.6.12.  Is this neccessarry?  Why would
> your recent changes to ctnetlink not work with a 2.6.11 kernel?

I did that because of the file linux-2.6.patch, the 2.6.11 version 
differs from the new 2.6.12 one. Can we have in pom-ng two different 
file patches for every version? I mean, something like 
linux-2.6.11.patch and linux-2.6.12.patch living in ctnetlink. If so, 
please add the patch attached to ctnetlink.

>>I've split this big patch above into four pieces to make it easier to 
>>understand the changes:
>>http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/
> 
> please remove stuff like
> 
>>-#if 0
>>+#if 1
>> #define DEBUGP printk
> 
> before releasing a patch, thanks.

OK, I'll do so.

> Also, I would rather not like to have NFNL_SUBSYS_CTNETLINK_EXP deleted.
> I thought it would be nice to have expect handling separated from
> conntracks, since they really are two seperate data structures.
> 
> Apart from that I'm fine with all of your modifications.
> 
> I don't understand why the order of the expectation list was changed,
> though.  Would you please explain why this was done?

Because of the table dumping logic I was using.

-                       if (ct->id <= *id)
+                       if (ct->id >= *id)
                                 continue;

I needed to add the conntracks at the tail. This isn't really needed so 
I've just reverted this change in the linux-2.6.12.patch attached.

Please see that you have to apply the patch 05ctnetlink.patch to 
ctnetlink as well.

>>- Implement ip_conntrack_stats dumping and reset (accounting)
> 
> you want to dump the statistics via netlink?  i'm not sure whether that
> is required.  lnstat is the only program using those counters, and using
> /proc seems fine for that.

Could this be useful for accounting purposes? I mean, dump and reset as 
a single operation, something similar to conntrack -L -z, say conntrack 
-S -z.

>>- Implement get conntrack and destroy (accounting)
> 
> sorry, what are you referring to?

An atomical operation to dump conntrack information and then destroy it. 
For example, if someone does:

conntrack -D --orig-src 1.1.1.1 --orig-dst 2.2.2.2 -p tcp 
--orig-src-port 10 --orig-dst-port 20

the it sends the conntrack info and destroy it. This could be worth it 
for accounting.

>>- Kill event/dump mask based (?). Although it's unique, I think that it 
>>could be useful for weak conntrack event notification (think of just 
>>new, established and destroy event notification to reduce performance 
>>impact).
> 
> 
> well, an interesting optimization.  My major problem with this whole
> system (like the other IPCTNL_MSG_CONFIG stuff) is: how to correctly
> deal with multipel users.  

Yes, this is a problem.

> Now, I think there is a way to do it right.  Basically every application
> would only request the events that it is interested in, and the kernel
> would bitwise-or those events to create the event mask that actually is
> to be dumped into userspace.
> 
> It's a bit like multicast membership subscription with IGMP...
> 
> So the only question remaining is: how does the kernel clean up / expire
> the masks?  If apps only subscribe but never unsubscribe, we would never
> clean up.
> 
> I'm not sure if we could somehow (cleanly) tie into the scoket
> close/destroy code.  If yes, we could check if we can reduce the mask at
> socket destroy time.

I think there's no sane way to do this but let me check the netlink code 
again. I could use multicast netlink subscrition, so the client can get 
subscribed to those events that the user considers interesting.

Initially the purpose of the mask based event notification was reducing 
the impact of an socket overrun under big stress situations. Say you've 
got a firewall very loaded, so conntrack-netlink subsys has to send tons 
of update messages, in that case the buffer can overrun and some 
messages will be drop.

To avoid this I could use an approach similar to what you do in ULOG and 
just keep the mask in userspace. Anyway, this hurts performance as well, 
since unneeded messages will be delivered to userspace and dropped if 
the client doesn't consider them interesting. This solution isn't that 
bad though since the conntrack-netlink subsys will send burst of 
messages in once.

--
Pablo

[-- Attachment #2: linux-2.6.12.patch --]
[-- Type: text/x-patch, Size: 26285 bytes --]

Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_amanda.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c	2005-06-28 02:28:44.000000000 +0200
@@ -151,6 +151,7 @@
 	.mask = { .src = { .u = { 0xFFFF } },
 		 .dst = { .protonum = 0xFF },
 	},
+	.change_help = ip_ct_generic_change_help,
 };
 
 static void __exit fini(void)
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c	2005-06-28 02:44:45.000000000 +0200
@@ -76,6 +76,8 @@
 static LIST_HEAD(unconfirmed);
 static int ip_conntrack_vmalloc;
 
+static unsigned int ip_conntrack_next_id = 1;
+static unsigned int ip_conntrack_expect_next_id = 1;
 #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS
 struct notifier_block *ip_conntrack_chain;
 struct notifier_block *ip_conntrack_expect_chain;
@@ -83,13 +85,6 @@
 
 DEFINE_PER_CPU(struct ip_conntrack_stat, ip_conntrack_stat);
 
-void 
-ip_conntrack_put(struct ip_conntrack *ct)
-{
-	IP_NF_ASSERT(ct);
-	nf_conntrack_put(&ct->ct_general);
-}
-
 static int ip_conntrack_hash_rnd_initted;
 static unsigned int ip_conntrack_hash_rnd;
 
@@ -146,7 +141,7 @@
 {
 	ip_conntrack_put(exp->master);
 	IP_NF_ASSERT(!timer_pending(&exp->timeout));
-	kmem_cache_free(ip_conntrack_expect_cachep, exp);
+	ip_conntrack_expect_free(exp);
 	CONNTRACK_STAT_INC(expect_delete);
 }
 
@@ -158,6 +153,13 @@
 	exp->master->expecting--;
 }
 
+void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp)
+{
+	MUST_BE_WRITE_LOCKED(&ip_conntrack_lock);
+	unlink_expect(exp);
+	destroy_expect(exp);
+}
+
 static void expectation_timed_out(unsigned long ul_expect)
 {
 	struct ip_conntrack_expect *exp = (void *)ul_expect;
@@ -169,6 +171,33 @@
 	destroy_expect(exp);
 }
 
+struct ip_conntrack_expect *
+__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple)
+{
+	struct ip_conntrack_expect *i;
+	
+	list_for_each_entry(i, &ip_conntrack_expect_list, list) {
+		if (ip_ct_tuple_mask_cmp(tuple, &i->tuple, &i->mask)) {
+			atomic_inc(&i->use);
+			return i;
+		}
+	}
+	return NULL;
+}
+
+/* Just find a expectation corresponding to a tuple. */
+struct ip_conntrack_expect *
+ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple)
+{
+	struct ip_conntrack_expect *i;
+	
+	READ_LOCK(&ip_conntrack_lock);
+	i = __ip_conntrack_expect_find(tuple);
+	READ_UNLOCK(&ip_conntrack_lock);
+
+	return i;
+}
+
 /* If an expectation for this connection is found, it gets delete from
  * global list then returned. */
 static struct ip_conntrack_expect *
@@ -193,7 +222,7 @@
 }
 
 /* delete all expectations for this conntrack */
-static void remove_expectations(struct ip_conntrack *ct)
+void ip_ct_remove_expectations(struct ip_conntrack *ct)
 {
 	struct ip_conntrack_expect *i, *tmp;
 
@@ -223,7 +252,7 @@
 	LIST_DELETE(&ip_conntrack_hash[hr], &ct->tuplehash[IP_CT_DIR_REPLY]);
 
 	/* Destroy all pending expectations */
-	remove_expectations(ct);
+	ip_ct_remove_expectations(ct);
 }
 
 static void
@@ -253,7 +282,7 @@
 	 * except TFTP can create an expectation on the first packet,
 	 * before connection is in the list, so we need to clean here,
 	 * too. */
-	remove_expectations(ct);
+	ip_ct_remove_expectations(ct);
 
 	/* We overload first tuple to link into unconfirmed list. */
 	if (!is_confirmed(ct)) {
@@ -268,8 +297,7 @@
 		ip_conntrack_put(ct->master);
 
 	DEBUGP("destroy_conntrack: returning ct=%p to slab\n", ct);
-	kmem_cache_free(ip_conntrack_cachep, ct);
-	atomic_dec(&ip_conntrack_count);
+	ip_conntrack_free(ct);
 }
 
 static void death_by_timeout(unsigned long ul_conntrack)
@@ -296,7 +324,7 @@
 		&& ip_ct_tuple_equal(tuple, &i->tuple);
 }
 
-static struct ip_conntrack_tuple_hash *
+struct ip_conntrack_tuple_hash *
 __ip_conntrack_find(const struct ip_conntrack_tuple *tuple,
 		    const struct ip_conntrack *ignored_conntrack)
 {
@@ -331,6 +359,27 @@
 	return h;
 }
 
+static void __ip_conntrack_hash_insert(struct ip_conntrack *ct) 
+{
+	size_t hash, repl_hash;
+	
+	hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+	repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+	
+	list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, 
+		 &ip_conntrack_hash[hash]);
+	list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list,
+		 &ip_conntrack_hash[repl_hash]);
+	ct->id = ++ip_conntrack_next_id;
+}
+
+void ip_conntrack_hash_insert(struct ip_conntrack *ct)
+{
+	WRITE_LOCK(&ip_conntrack_lock);
+	__ip_conntrack_hash_insert(ct);
+	WRITE_UNLOCK(&ip_conntrack_lock);
+}
+
 /* Confirm a connection given skb; places it in hash table */
 int
 __ip_conntrack_confirm(struct sk_buff **pskb)
@@ -377,10 +426,10 @@
 		/* Remove from unconfirmed list */
 		list_del(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list);
 
-		list_prepend(&ip_conntrack_hash[hash],
-			     &ct->tuplehash[IP_CT_DIR_ORIGINAL]);
-		list_prepend(&ip_conntrack_hash[repl_hash],
-			     &ct->tuplehash[IP_CT_DIR_REPLY]);
+		list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, 
+			 &ip_conntrack_hash[hash]);
+		list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list,
+			 &ip_conntrack_hash[repl_hash]);
 		/* Timer relative to confirmation time, not original
 		   setting time, otherwise we'd get timer wrap in
 		   weird delay cases. */
@@ -398,6 +447,7 @@
 #endif
 		ip_conntrack_event_cache(master_ct(ct) ?
 					 IPCT_RELATED : IPCT_NEW, *pskb);
+		ct->id = ++ip_conntrack_next_id;
 
 		return NF_ACCEPT;
 	}
@@ -432,7 +482,7 @@
 
 static int early_drop(struct list_head *chain)
 {
-	/* Traverse forwards: gives us oldest, which is roughly LRU */
+	/* Traverse backwards: gives us oldest, which is roughly LRU */
 	struct ip_conntrack_tuple_hash *h;
 	struct ip_conntrack *ct = NULL;
 	int dropped = 0;
@@ -463,7 +513,7 @@
 	return ip_ct_tuple_mask_cmp(rtuple, &i->tuple, &i->mask);
 }
 
-static struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple)
+struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple)
 {
 	return LIST_FIND(&helpers, helper_cmp,
 			 struct ip_conntrack_helper *,
@@ -472,22 +522,18 @@
 
 /* Allocate a new conntrack: we return -ENOMEM if classification
    failed due to stress.  Otherwise it really is unclassifiable. */
-static struct ip_conntrack_tuple_hash *
-init_conntrack(const struct ip_conntrack_tuple *tuple,
-	       struct ip_conntrack_protocol *protocol,
-	       struct sk_buff *skb)
+struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *orig,
+					struct ip_conntrack_tuple *repl)
 {
 	struct ip_conntrack *conntrack;
-	struct ip_conntrack_tuple repl_tuple;
 	size_t hash;
-	struct ip_conntrack_expect *exp;
 
 	if (!ip_conntrack_hash_rnd_initted) {
 		get_random_bytes(&ip_conntrack_hash_rnd, 4);
 		ip_conntrack_hash_rnd_initted = 1;
 	}
 
-	hash = hash_conntrack(tuple);
+	hash = hash_conntrack(orig);
 
 	if (ip_conntrack_max
 	    && atomic_read(&ip_conntrack_count) >= ip_conntrack_max) {
@@ -501,11 +547,6 @@
 		}
 	}
 
-	if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) {
-		DEBUGP("Can't invert tuple.\n");
-		return NULL;
-	}
-
 	conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC);
 	if (!conntrack) {
 		DEBUGP("Can't allocate conntrack.\n");
@@ -515,17 +556,51 @@
 	memset(conntrack, 0, sizeof(*conntrack));
 	atomic_set(&conntrack->ct_general.use, 1);
 	conntrack->ct_general.destroy = destroy_conntrack;
-	conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *tuple;
-	conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = repl_tuple;
-	if (!protocol->new(conntrack, skb)) {
-		kmem_cache_free(ip_conntrack_cachep, conntrack);
-		return NULL;
-	}
+	conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig;
+	conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *repl;
+
 	/* Don't set timer yet: wait for confirmation */
 	init_timer(&conntrack->timeout);
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
+	atomic_inc(&ip_conntrack_count);
+
+	return conntrack;
+}
+
+void
+ip_conntrack_free(struct ip_conntrack *conntrack)
+{
+	atomic_dec(&ip_conntrack_count);
+	kmem_cache_free(ip_conntrack_cachep, conntrack);
+}
+
+static struct ip_conntrack_tuple_hash *
+init_conntrack(struct ip_conntrack_tuple *tuple,
+	       struct ip_conntrack_protocol *protocol,
+	       struct sk_buff *skb)
+{
+	struct ip_conntrack *conntrack;
+	struct ip_conntrack_tuple repl_tuple;
+	struct ip_conntrack_expect *exp;
+
+	if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) {
+		DEBUGP("Can't invert tuple.\n");
+		return NULL;
+	}
+
+	if (!(conntrack = ip_conntrack_alloc(tuple, &repl_tuple)))
+		return NULL;
+
+	if (IS_ERR(conntrack))
+		return (void *) conntrack;
+
+	if (!protocol->new(conntrack, skb)) {
+		ip_conntrack_free(conntrack);
+		return NULL;
+	}
+
 	WRITE_LOCK(&ip_conntrack_lock);
 	exp = find_expectation(tuple);
 
@@ -549,7 +624,6 @@
 	/* Overload tuple linked list to put us in unconfirmed list. */
 	list_add(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL].list, &unconfirmed);
 
-	atomic_inc(&ip_conntrack_count);
 	WRITE_UNLOCK(&ip_conntrack_lock);
 
 	if (exp) {
@@ -765,13 +839,15 @@
 		DEBUGP("expect_related: OOM allocating expect\n");
 		return NULL;
 	}
+	atomic_set(&new->use, 0);
 	new->master = NULL;
 	return new;
 }
 
 void ip_conntrack_expect_free(struct ip_conntrack_expect *expect)
 {
-	kmem_cache_free(ip_conntrack_expect_cachep, expect);
+	if (atomic_dec_and_test(&expect->use))
+		kmem_cache_free(ip_conntrack_expect_cachep, expect);
 }
 
 static void ip_conntrack_expect_insert(struct ip_conntrack_expect *exp)
@@ -790,6 +866,9 @@
 	} else
 		exp->timeout.function = NULL;
 
+	exp->id = ++ip_conntrack_expect_next_id;
+
+	atomic_inc(&exp->use);
 	CONNTRACK_STAT_INC(expect_create);
 }
 
@@ -1018,6 +1097,32 @@
 	nf_conntrack_get(nskb->nfct);
 }
 
+void ip_ct_generic_change_proto(struct ip_conntrack *ct, 
+				union ip_conntrack_proto *p)
+{
+	struct ip_conntrack_protocol *proto;
+	struct ip_conntrack_tuple_hash *th = &ct->tuplehash[IP_CT_DIR_REPLY];
+
+	proto = ip_ct_find_proto(th->tuple.dst.protonum);
+	if (proto->lock != NULL) {
+		write_lock_bh(proto->lock);
+		memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto));
+		write_unlock_bh(proto->lock);
+	} else
+		memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto));
+}
+
+void ip_ct_generic_change_help(struct ip_conntrack *ct, 
+			       union ip_conntrack_help *h)
+{
+	if (ct->helper->lock != NULL) {
+		spin_lock_bh(ct->helper->lock);
+		memcpy(&ct->help, h, sizeof(ct->help));
+		spin_unlock_bh(ct->helper->lock);
+	} else
+		memcpy(&ct->help, h, sizeof(ct->help));
+}
+
 static inline int
 do_iter(const struct ip_conntrack_tuple_hash *i,
 	int (*iter)(struct ip_conntrack *i, void *data),
@@ -1144,23 +1249,27 @@
 				     * ip_conntrack_htable_size));
 }
 
-/* Mishearing the voices in his head, our hero wonders how he's
-   supposed to kill the mall. */
-void ip_conntrack_cleanup(void)
+void ip_conntrack_flush()
 {
-	ip_ct_attach = NULL;
 	/* This makes sure all current packets have passed through
            netfilter framework.  Roll on, two-stage module
            delete... */
 	synchronize_net();
- 
+
  i_see_dead_people:
 	ip_ct_iterate_cleanup(kill_all, NULL);
 	if (atomic_read(&ip_conntrack_count) != 0) {
 		schedule();
 		goto i_see_dead_people;
 	}
+}
 
+/* Mishearing the voices in his head, our hero wonders how he's
+   supposed to kill the mall. */
+void ip_conntrack_cleanup(void)
+{
+	ip_ct_attach = NULL;
+	ip_conntrack_flush();
 	kmem_cache_destroy(ip_conntrack_cachep);
 	kmem_cache_destroy(ip_conntrack_expect_cachep);
 	free_conntrack_hash();
@@ -1258,6 +1367,8 @@
 	/*  - and look it like as a confirmed connection */
 	set_bit(IPS_CONFIRMED_BIT, &ip_conntrack_untracked.status);
 
+	printk("untracked: %p\n", &ip_conntrack_untracked);
+
 	return ret;
 
 err_free_conntrack_slab:
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_ftp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c	2005-06-28 02:28:44.000000000 +0200
@@ -481,6 +481,8 @@
 		ftp[i].timeout = 5 * 60; /* 5 minutes */
 		ftp[i].me = THIS_MODULE;
 		ftp[i].help = help;
+		ftp[i].lock = &ip_ftp_lock;
+		ftp[i].change_help = ip_ct_generic_change_help;
 
 		tmpname = &ftp_names[i][0];
 		if (ports[i] == FTP_PORT)
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_irc.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c	2005-06-28 02:28:44.000000000 +0200
@@ -275,6 +275,8 @@
 		hlpr->timeout = dcc_timeout;
 		hlpr->me = THIS_MODULE;
 		hlpr->help = help;
+		hlpr->lock = &irc_buffer_lock;
+		hlpr->change_help = ip_ct_generic_change_help;
 
 		tmpname = &irc_names[i][0];
 		if (ports[i] == IRC_PORT)
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_icmp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c	2005-06-28 02:28:44.000000000 +0200
@@ -109,16 +109,17 @@
 	return NF_ACCEPT;
 }
 
+static u_int8_t valid_new[] = { 
+	[ICMP_ECHO] = 1,
+	[ICMP_TIMESTAMP] = 1,
+	[ICMP_INFO_REQUEST] = 1,
+	[ICMP_ADDRESS] = 1 
+};
+
 /* Called when a new connection for this protocol found. */
 static int icmp_new(struct ip_conntrack *conntrack,
 		    const struct sk_buff *skb)
 {
-	static u_int8_t valid_new[]
-		= { [ICMP_ECHO] = 1,
-		    [ICMP_TIMESTAMP] = 1,
-		    [ICMP_INFO_REQUEST] = 1,
-		    [ICMP_ADDRESS] = 1 };
-
 	if (conntrack->tuplehash[0].tuple.dst.u.icmp.type >= sizeof(valid_new)
 	    || !valid_new[conntrack->tuplehash[0].tuple.dst.u.icmp.type]) {
 		/* Can't create a new ICMP `conn' with this. */
@@ -266,6 +267,17 @@
 	return icmp_error_message(skb, ctinfo, hooknum);
 }
 
+static int icmp_change_check_tuples(struct ip_conntrack_tuple *orig,
+				    struct ip_conntrack_tuple *reply)
+{
+	unsigned int type = orig->dst.u.icmp.type;
+
+	if (type >= sizeof(valid_new) || !valid_new[type])
+		return -EINVAL;
+
+	return 0;
+}
+
 struct ip_conntrack_protocol ip_conntrack_protocol_icmp =
 {
 	.proto 			= IPPROTO_ICMP,
@@ -277,4 +289,6 @@
 	.packet			= icmp_packet,
 	.new			= icmp_new,
 	.error			= icmp_error,
+	.change_check_tuples    = icmp_change_check_tuples,
+	.change_proto           = ip_ct_generic_change_proto,
 };
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_sctp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c	2005-06-28 02:28:44.000000000 +0200
@@ -499,6 +499,7 @@
 static struct ip_conntrack_protocol ip_conntrack_protocol_sctp = { 
 	.proto 		 = IPPROTO_SCTP, 
 	.name 		 = "sctp",
+	.lock		 = &sctp_lock,
 	.pkt_to_tuple 	 = sctp_pkt_to_tuple, 
 	.invert_tuple 	 = sctp_invert_tuple, 
 	.print_tuple 	 = sctp_print_tuple, 
@@ -506,7 +507,8 @@
 	.packet 	 = sctp_packet, 
 	.new 		 = sctp_new, 
 	.destroy 	 = NULL, 
-	.me 		 = THIS_MODULE 
+	.me 		 = THIS_MODULE,
+	.change_proto	 = ip_ct_generic_change_proto,
 };
 
 #ifdef CONFIG_SYSCTL
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2005-06-28 02:28:44.000000000 +0200
@@ -1094,6 +1094,7 @@
 {
 	.proto 			= IPPROTO_TCP,
 	.name 			= "tcp",
+	.lock			= &tcp_lock,
 	.pkt_to_tuple 		= tcp_pkt_to_tuple,
 	.invert_tuple 		= tcp_invert_tuple,
 	.print_tuple 		= tcp_print_tuple,
@@ -1101,4 +1102,5 @@
 	.packet 		= tcp_packet,
 	.new 			= tcp_new,
 	.error			= tcp_error,
+	.change_proto           = ip_ct_generic_change_proto
 };
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_udp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c	2005-06-28 02:28:44.000000000 +0200
@@ -144,4 +144,5 @@
 	.packet			= udp_packet,
 	.new			= udp_new,
 	.error			= udp_error,
+	.change_proto           = ip_ct_generic_change_proto
 };
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_standalone.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c	2005-06-28 02:28:44.000000000 +0200
@@ -963,6 +963,19 @@
 {
 }
 
+EXPORT_SYMBOL(ip_ct_unlink_destroy_expect);
+EXPORT_SYMBOL(ip_conntrack_flush);
+EXPORT_SYMBOL(__ip_conntrack_expect_find);
+EXPORT_SYMBOL(__ip_conntrack_find);
+EXPORT_SYMBOL(ip_conntrack_expect_list);
+EXPORT_SYMBOL(ip_ct_generic_change_proto);
+EXPORT_SYMBOL(ip_ct_generic_change_help);
+EXPORT_SYMBOL(ip_conntrack_expect_find_get);
+EXPORT_SYMBOL(ip_conntrack_alloc);
+EXPORT_SYMBOL(ip_conntrack_free);
+EXPORT_SYMBOL(ip_conntrack_hash_insert);
+EXPORT_SYMBOL(ip_ct_remove_expectations);
+EXPORT_SYMBOL(ip_ct_find_helper);
 #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS
 EXPORT_SYMBOL(ip_conntrack_chain);
 EXPORT_SYMBOL(ip_conntrack_expect_chain);
@@ -993,7 +1006,6 @@
 EXPORT_SYMBOL(ip_conntrack_hash);
 EXPORT_SYMBOL(ip_conntrack_untracked);
 EXPORT_SYMBOL_GPL(ip_conntrack_find_get);
-EXPORT_SYMBOL_GPL(ip_conntrack_put);
 #ifdef CONFIG_IP_NF_NAT_NEEDED
 EXPORT_SYMBOL(ip_conntrack_tcp_update);
 #endif
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_tftp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c	2005-06-28 02:28:44.000000000 +0200
@@ -134,6 +134,7 @@
 		tftp[i].timeout = 5 * 60; /* 5 minutes */
 		tftp[i].me = THIS_MODULE;
 		tftp[i].help = tftp_help;
+		tftp[i].change_help = ip_ct_generic_change_help;
 
 		tmpname = &tftp_names[i][0];
 		if (ports[i] == TFTP_PORT)
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_amanda.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h	2005-06-28 02:28:44.000000000 +0200
@@ -3,9 +3,11 @@
 /* AMANDA tracking. */
 
 struct ip_conntrack_expect;
+#ifdef __KERNEL__
 extern unsigned int (*ip_nat_amanda_hook)(struct sk_buff **pskb,
 					  enum ip_conntrack_info ctinfo,
 					  unsigned int matchoff,
 					  unsigned int matchlen,
 					  struct ip_conntrack_expect *exp);
+#endif
 #endif /* _IP_CONNTRACK_AMANDA_H */
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_ftp.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h	2005-06-28 02:28:44.000000000 +0200
@@ -31,6 +31,7 @@
 
 struct ip_conntrack_expect;
 
+#ifdef __KERNEL__
 /* For NAT to hook in when we find a packet which describes what other
  * connection we should expect. */
 extern unsigned int (*ip_nat_ftp_hook)(struct sk_buff **pskb,
@@ -40,4 +41,5 @@
 				       unsigned int matchlen,
 				       struct ip_conntrack_expect *exp,
 				       u32 *seq);
+#endif
 #endif /* _IP_CONNTRACK_FTP_H */
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h	2005-06-28 02:28:44.000000000 +0200
@@ -71,13 +71,13 @@
 	IPS_DESTROYED = (1 << IPS_DESTROYED_BIT),
 };
 
-#ifdef __KERNEL__
-#include <linux/config.h>
-#include <linux/netfilter_ipv4/ip_conntrack_tuple.h>
-#include <linux/bitops.h>
-#include <linux/compiler.h>
-#include <asm/atomic.h>
+struct ip_conntrack_counter
+{
+	u_int64_t packets;
+	u_int64_t bytes;
+};
 
+#include <linux/netfilter_ipv4/ip_conntrack_tuple.h>
 #include <linux/netfilter_ipv4/ip_conntrack_tcp.h>
 #include <linux/netfilter_ipv4/ip_conntrack_icmp.h>
 #include <linux/netfilter_ipv4/ip_conntrack_sctp.h>
@@ -106,6 +106,7 @@
 	struct ip_ct_irc_master ct_irc_info;
 };
 
+#ifdef __KERNEL__
 #ifdef CONFIG_IP_NF_NAT_NEEDED
 #include <linux/netfilter_ipv4/ip_nat.h>
 #endif
@@ -126,12 +127,6 @@
 #define IP_NF_ASSERT(x)
 #endif
 
-struct ip_conntrack_counter
-{
-	u_int64_t packets;
-	u_int64_t bytes;
-};
-
 struct ip_conntrack_helper;
 
 struct ip_conntrack
@@ -140,6 +135,9 @@
            plus 1 for any connection(s) we are `master' for */
 	struct nf_conntrack ct_general;
 
+	/* Unique ID that identifies this conntrack*/
+	u_int64_t id;
+
 	/* Have we seen traffic both ways yet? (bitset) */
 	unsigned long status;
 
@@ -188,6 +186,9 @@
 	/* Internal linked list (global expectation list) */
 	struct list_head list;
 
+	/* Expectation ID */
+	__u64 id;
+
 	/* We expect this tuple, with the following mask */
 	struct ip_conntrack_tuple tuple, mask;
  
@@ -201,6 +202,8 @@
 	/* Timer function; deletes the expectation. */
 	struct timer_list timeout;
 
+	atomic_t use;
+
 #ifdef CONFIG_IP_NF_NAT_NEEDED
 	/* This is the original per-proto part, used to map the
 	 * expected connection the way the recipient expects. */
@@ -240,7 +243,12 @@
 }
 
 /* decrement reference count on a conntrack */
-extern inline void ip_conntrack_put(struct ip_conntrack *ct);
+static inline void
+ip_conntrack_put(struct ip_conntrack *ct)
+{
+	IP_NF_ASSERT(ct);
+	nf_conntrack_put(&ct->ct_general);
+}
 
 /* call to create an explicit dependency on ip_conntrack. */
 extern void need_ip_conntrack(void);
@@ -275,6 +283,34 @@
 ip_ct_iterate_cleanup(int (*iter)(struct ip_conntrack *i, void *data),
 		      void *data);
 
+extern struct ip_conntrack_helper *
+ip_ct_find_helper(const struct ip_conntrack_tuple *tuple);
+
+extern void ip_ct_remove_expectations(struct ip_conntrack *ct);
+
+extern struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *,
+					       struct ip_conntrack_tuple *);
+
+extern void ip_conntrack_free(struct ip_conntrack *ct);
+
+extern void ip_conntrack_hash_insert(struct ip_conntrack *ct);
+
+extern struct ip_conntrack_expect *
+__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple);
+
+extern struct ip_conntrack_expect *
+ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple);
+
+extern struct ip_conntrack_tuple_hash *
+__ip_conntrack_find(const struct ip_conntrack_tuple *tuple,
+                    const struct ip_conntrack *ignored_conntrack);
+
+extern inline void ip_conntrack_expect_put(struct ip_conntrack_expect *exp);
+
+extern void ip_conntrack_flush(void);
+
+extern void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp);
+
 /* It's confirmed if it is, or has been in the hash table. */
 static inline int is_confirmed(struct ip_conntrack *ct)
 {
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_helper.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h	2005-06-28 02:28:44.000000000 +0200
@@ -9,6 +9,8 @@
 {	
 	struct list_head list; 		/* Internal use. */
 
+	spinlock_t *lock;		/* protect private info and buffer */
+
 	const char *name;		/* name of the module */
 	struct module *me;		/* pointer to self */
 	unsigned int max_expected;	/* Maximum number of concurrent 
@@ -24,6 +26,8 @@
 	int (*help)(struct sk_buff **pskb,
 		    struct ip_conntrack *ct,
 		    enum ip_conntrack_info conntrackinfo);
+
+	void (*change_help)(struct ip_conntrack *, union ip_conntrack_help *);
 };
 
 extern int ip_conntrack_helper_register(struct ip_conntrack_helper *);
@@ -38,4 +42,7 @@
 extern int ip_conntrack_expect_related(struct ip_conntrack_expect *exp);
 extern void ip_conntrack_unexpect_related(struct ip_conntrack_expect *exp);
 
+extern void ip_ct_generic_change_help(struct ip_conntrack *ct,
+				      union ip_conntrack_help *h);
+
 #endif /*_IP_CONNTRACK_HELPER_H*/
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_protocol.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h	2005-06-28 02:28:44.000000000 +0200
@@ -10,6 +10,8 @@
 	/* Protocol number. */
 	u_int8_t proto;
 
+	rwlock_t *lock;
+
 	/* Protocol name */
 	const char *name;
 
@@ -47,6 +49,17 @@
 	int (*error)(struct sk_buff *skb, enum ip_conntrack_info *ctinfo,
 		     unsigned int hooknum);
 
+	/* check if tuples are valid for a new connection */
+	int (*change_check_tuples)(struct ip_conntrack_tuple *orig,
+				   struct ip_conntrack_tuple *reply);
+
+	/* check protocol data is valid */
+	int (*change_check_proto)(union ip_conntrack_proto *p);
+
+	/* change protocol info on behalf of ctnetlink */
+	void (*change_proto)(struct ip_conntrack *ct,
+			     union ip_conntrack_proto *p);
+	
 	/* Module (if any) which this is connected to. */
 	struct module *me;
 };
@@ -57,6 +70,8 @@
 /* Protocol registration. */
 extern int ip_conntrack_protocol_register(struct ip_conntrack_protocol *proto);
 extern void ip_conntrack_protocol_unregister(struct ip_conntrack_protocol *proto);
+extern void ip_ct_generic_change_proto(struct ip_conntrack *conntrack,
+				       union ip_conntrack_proto *p);
 
 static inline struct ip_conntrack_protocol *ip_ct_find_proto(u_int8_t protocol)
 {

[-- Attachment #3: 05ctnetlink.patch --]
[-- Type: text/x-patch, Size: 1171 bytes --]

Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_netlink.c	2005-06-28 02:40:30.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c	2005-06-28 02:41:10.000000000 +0200
@@ -455,7 +455,7 @@
 			if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = tuplehash_to_ctrack(h);
-			if (ct->id <= *id)
+			if (ct->id >= *id)
 				continue;
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 		                        	cb->nlh->nlmsg_seq,
@@ -491,7 +491,7 @@
 			if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = tuplehash_to_ctrack(h);
-			if (ct->id <= *id)
+			if (ct->id >= *id)
 				continue;
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 		                        	cb->nlh->nlmsg_seq,
@@ -1060,7 +1060,7 @@
 	READ_LOCK(&ip_conntrack_lock);
 	list_for_each(i, &ip_conntrack_expect_list) {
 		exp = (struct ip_conntrack_expect *) i;
-		if (exp->id <= *id)
+		if (exp->id >= *id)
 			continue;
 		if (ctnetlink_exp_fill_info(skb, NETLINK_CB(cb->skb).pid,
 					    cb->nlh->nlmsg_seq,

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  2:00   ` Pablo Neira
@ 2005-06-28  2:12     ` Pablo Neira
  2005-06-28  2:15       ` Pablo Neira
  2005-06-28  3:53     ` Patrick McHardy
  2005-06-28  7:06     ` Harald Welte
  2 siblings, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-28  2:12 UTC (permalink / raw)
  To: Pablo Neira
  Cc: Harald Welte, Netfilter Development Mailinglist, Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 758 bytes --]

Pablo Neira wrote:
> Harald Welte wrote:
>> I don't understand why the order of the expectation list was changed,
>> though.  Would you please explain why this was done?
> 
> 
> Because of the table dumping logic I was using.
> 
> -                       if (ct->id <= *id)
> +                       if (ct->id >= *id)
>                                 continue;

Please ignore the patches I've sent you.

I need to insert conntracks and expectations at the tail because of the 
nature of netlink dumping. *id (that points to cb->args[1]) will be 
initialized to zero, so I need to insert conntracks ordered from lesser 
to bigger id's, otherwise I'd need to set *id to ~0UL and that's not 
possible.

Please, add the following patch to ctnetlink.

--
Pablo

[-- Attachment #2: linux-2.6.12.patch --]
[-- Type: text/x-patch, Size: 26285 bytes --]

Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_amanda.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c	2005-06-28 02:28:44.000000000 +0200
@@ -151,6 +151,7 @@
 	.mask = { .src = { .u = { 0xFFFF } },
 		 .dst = { .protonum = 0xFF },
 	},
+	.change_help = ip_ct_generic_change_help,
 };
 
 static void __exit fini(void)
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c	2005-06-28 02:44:45.000000000 +0200
@@ -76,6 +76,8 @@
 static LIST_HEAD(unconfirmed);
 static int ip_conntrack_vmalloc;
 
+static unsigned int ip_conntrack_next_id = 1;
+static unsigned int ip_conntrack_expect_next_id = 1;
 #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS
 struct notifier_block *ip_conntrack_chain;
 struct notifier_block *ip_conntrack_expect_chain;
@@ -83,13 +85,6 @@
 
 DEFINE_PER_CPU(struct ip_conntrack_stat, ip_conntrack_stat);
 
-void 
-ip_conntrack_put(struct ip_conntrack *ct)
-{
-	IP_NF_ASSERT(ct);
-	nf_conntrack_put(&ct->ct_general);
-}
-
 static int ip_conntrack_hash_rnd_initted;
 static unsigned int ip_conntrack_hash_rnd;
 
@@ -146,7 +141,7 @@
 {
 	ip_conntrack_put(exp->master);
 	IP_NF_ASSERT(!timer_pending(&exp->timeout));
-	kmem_cache_free(ip_conntrack_expect_cachep, exp);
+	ip_conntrack_expect_free(exp);
 	CONNTRACK_STAT_INC(expect_delete);
 }
 
@@ -158,6 +153,13 @@
 	exp->master->expecting--;
 }
 
+void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp)
+{
+	MUST_BE_WRITE_LOCKED(&ip_conntrack_lock);
+	unlink_expect(exp);
+	destroy_expect(exp);
+}
+
 static void expectation_timed_out(unsigned long ul_expect)
 {
 	struct ip_conntrack_expect *exp = (void *)ul_expect;
@@ -169,6 +171,33 @@
 	destroy_expect(exp);
 }
 
+struct ip_conntrack_expect *
+__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple)
+{
+	struct ip_conntrack_expect *i;
+	
+	list_for_each_entry(i, &ip_conntrack_expect_list, list) {
+		if (ip_ct_tuple_mask_cmp(tuple, &i->tuple, &i->mask)) {
+			atomic_inc(&i->use);
+			return i;
+		}
+	}
+	return NULL;
+}
+
+/* Just find a expectation corresponding to a tuple. */
+struct ip_conntrack_expect *
+ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple)
+{
+	struct ip_conntrack_expect *i;
+	
+	READ_LOCK(&ip_conntrack_lock);
+	i = __ip_conntrack_expect_find(tuple);
+	READ_UNLOCK(&ip_conntrack_lock);
+
+	return i;
+}
+
 /* If an expectation for this connection is found, it gets delete from
  * global list then returned. */
 static struct ip_conntrack_expect *
@@ -193,7 +222,7 @@
 }
 
 /* delete all expectations for this conntrack */
-static void remove_expectations(struct ip_conntrack *ct)
+void ip_ct_remove_expectations(struct ip_conntrack *ct)
 {
 	struct ip_conntrack_expect *i, *tmp;
 
@@ -223,7 +252,7 @@
 	LIST_DELETE(&ip_conntrack_hash[hr], &ct->tuplehash[IP_CT_DIR_REPLY]);
 
 	/* Destroy all pending expectations */
-	remove_expectations(ct);
+	ip_ct_remove_expectations(ct);
 }
 
 static void
@@ -253,7 +282,7 @@
 	 * except TFTP can create an expectation on the first packet,
 	 * before connection is in the list, so we need to clean here,
 	 * too. */
-	remove_expectations(ct);
+	ip_ct_remove_expectations(ct);
 
 	/* We overload first tuple to link into unconfirmed list. */
 	if (!is_confirmed(ct)) {
@@ -268,8 +297,7 @@
 		ip_conntrack_put(ct->master);
 
 	DEBUGP("destroy_conntrack: returning ct=%p to slab\n", ct);
-	kmem_cache_free(ip_conntrack_cachep, ct);
-	atomic_dec(&ip_conntrack_count);
+	ip_conntrack_free(ct);
 }
 
 static void death_by_timeout(unsigned long ul_conntrack)
@@ -296,7 +324,7 @@
 		&& ip_ct_tuple_equal(tuple, &i->tuple);
 }
 
-static struct ip_conntrack_tuple_hash *
+struct ip_conntrack_tuple_hash *
 __ip_conntrack_find(const struct ip_conntrack_tuple *tuple,
 		    const struct ip_conntrack *ignored_conntrack)
 {
@@ -331,6 +359,27 @@
 	return h;
 }
 
+static void __ip_conntrack_hash_insert(struct ip_conntrack *ct) 
+{
+	size_t hash, repl_hash;
+	
+	hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+	repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+	
+	list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, 
+		 &ip_conntrack_hash[hash]);
+	list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list,
+		 &ip_conntrack_hash[repl_hash]);
+	ct->id = ++ip_conntrack_next_id;
+}
+
+void ip_conntrack_hash_insert(struct ip_conntrack *ct)
+{
+	WRITE_LOCK(&ip_conntrack_lock);
+	__ip_conntrack_hash_insert(ct);
+	WRITE_UNLOCK(&ip_conntrack_lock);
+}
+
 /* Confirm a connection given skb; places it in hash table */
 int
 __ip_conntrack_confirm(struct sk_buff **pskb)
@@ -377,10 +426,10 @@
 		/* Remove from unconfirmed list */
 		list_del(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list);
 
-		list_prepend(&ip_conntrack_hash[hash],
-			     &ct->tuplehash[IP_CT_DIR_ORIGINAL]);
-		list_prepend(&ip_conntrack_hash[repl_hash],
-			     &ct->tuplehash[IP_CT_DIR_REPLY]);
+		list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, 
+			 &ip_conntrack_hash[hash]);
+		list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list,
+			 &ip_conntrack_hash[repl_hash]);
 		/* Timer relative to confirmation time, not original
 		   setting time, otherwise we'd get timer wrap in
 		   weird delay cases. */
@@ -398,6 +447,7 @@
 #endif
 		ip_conntrack_event_cache(master_ct(ct) ?
 					 IPCT_RELATED : IPCT_NEW, *pskb);
+		ct->id = ++ip_conntrack_next_id;
 
 		return NF_ACCEPT;
 	}
@@ -432,7 +482,7 @@
 
 static int early_drop(struct list_head *chain)
 {
-	/* Traverse forwards: gives us oldest, which is roughly LRU */
+	/* Traverse backwards: gives us oldest, which is roughly LRU */
 	struct ip_conntrack_tuple_hash *h;
 	struct ip_conntrack *ct = NULL;
 	int dropped = 0;
@@ -463,7 +513,7 @@
 	return ip_ct_tuple_mask_cmp(rtuple, &i->tuple, &i->mask);
 }
 
-static struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple)
+struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple)
 {
 	return LIST_FIND(&helpers, helper_cmp,
 			 struct ip_conntrack_helper *,
@@ -472,22 +522,18 @@
 
 /* Allocate a new conntrack: we return -ENOMEM if classification
    failed due to stress.  Otherwise it really is unclassifiable. */
-static struct ip_conntrack_tuple_hash *
-init_conntrack(const struct ip_conntrack_tuple *tuple,
-	       struct ip_conntrack_protocol *protocol,
-	       struct sk_buff *skb)
+struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *orig,
+					struct ip_conntrack_tuple *repl)
 {
 	struct ip_conntrack *conntrack;
-	struct ip_conntrack_tuple repl_tuple;
 	size_t hash;
-	struct ip_conntrack_expect *exp;
 
 	if (!ip_conntrack_hash_rnd_initted) {
 		get_random_bytes(&ip_conntrack_hash_rnd, 4);
 		ip_conntrack_hash_rnd_initted = 1;
 	}
 
-	hash = hash_conntrack(tuple);
+	hash = hash_conntrack(orig);
 
 	if (ip_conntrack_max
 	    && atomic_read(&ip_conntrack_count) >= ip_conntrack_max) {
@@ -501,11 +547,6 @@
 		}
 	}
 
-	if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) {
-		DEBUGP("Can't invert tuple.\n");
-		return NULL;
-	}
-
 	conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC);
 	if (!conntrack) {
 		DEBUGP("Can't allocate conntrack.\n");
@@ -515,17 +556,51 @@
 	memset(conntrack, 0, sizeof(*conntrack));
 	atomic_set(&conntrack->ct_general.use, 1);
 	conntrack->ct_general.destroy = destroy_conntrack;
-	conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *tuple;
-	conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = repl_tuple;
-	if (!protocol->new(conntrack, skb)) {
-		kmem_cache_free(ip_conntrack_cachep, conntrack);
-		return NULL;
-	}
+	conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig;
+	conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *repl;
+
 	/* Don't set timer yet: wait for confirmation */
 	init_timer(&conntrack->timeout);
 	conntrack->timeout.data = (unsigned long)conntrack;
 	conntrack->timeout.function = death_by_timeout;
 
+	atomic_inc(&ip_conntrack_count);
+
+	return conntrack;
+}
+
+void
+ip_conntrack_free(struct ip_conntrack *conntrack)
+{
+	atomic_dec(&ip_conntrack_count);
+	kmem_cache_free(ip_conntrack_cachep, conntrack);
+}
+
+static struct ip_conntrack_tuple_hash *
+init_conntrack(struct ip_conntrack_tuple *tuple,
+	       struct ip_conntrack_protocol *protocol,
+	       struct sk_buff *skb)
+{
+	struct ip_conntrack *conntrack;
+	struct ip_conntrack_tuple repl_tuple;
+	struct ip_conntrack_expect *exp;
+
+	if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) {
+		DEBUGP("Can't invert tuple.\n");
+		return NULL;
+	}
+
+	if (!(conntrack = ip_conntrack_alloc(tuple, &repl_tuple)))
+		return NULL;
+
+	if (IS_ERR(conntrack))
+		return (void *) conntrack;
+
+	if (!protocol->new(conntrack, skb)) {
+		ip_conntrack_free(conntrack);
+		return NULL;
+	}
+
 	WRITE_LOCK(&ip_conntrack_lock);
 	exp = find_expectation(tuple);
 
@@ -549,7 +624,6 @@
 	/* Overload tuple linked list to put us in unconfirmed list. */
 	list_add(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL].list, &unconfirmed);
 
-	atomic_inc(&ip_conntrack_count);
 	WRITE_UNLOCK(&ip_conntrack_lock);
 
 	if (exp) {
@@ -765,13 +839,15 @@
 		DEBUGP("expect_related: OOM allocating expect\n");
 		return NULL;
 	}
+	atomic_set(&new->use, 0);
 	new->master = NULL;
 	return new;
 }
 
 void ip_conntrack_expect_free(struct ip_conntrack_expect *expect)
 {
-	kmem_cache_free(ip_conntrack_expect_cachep, expect);
+	if (atomic_dec_and_test(&expect->use))
+		kmem_cache_free(ip_conntrack_expect_cachep, expect);
 }
 
 static void ip_conntrack_expect_insert(struct ip_conntrack_expect *exp)
@@ -790,6 +866,9 @@
 	} else
 		exp->timeout.function = NULL;
 
+	exp->id = ++ip_conntrack_expect_next_id;
+
+	atomic_inc(&exp->use);
 	CONNTRACK_STAT_INC(expect_create);
 }
 
@@ -1018,6 +1097,32 @@
 	nf_conntrack_get(nskb->nfct);
 }
 
+void ip_ct_generic_change_proto(struct ip_conntrack *ct, 
+				union ip_conntrack_proto *p)
+{
+	struct ip_conntrack_protocol *proto;
+	struct ip_conntrack_tuple_hash *th = &ct->tuplehash[IP_CT_DIR_REPLY];
+
+	proto = ip_ct_find_proto(th->tuple.dst.protonum);
+	if (proto->lock != NULL) {
+		write_lock_bh(proto->lock);
+		memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto));
+		write_unlock_bh(proto->lock);
+	} else
+		memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto));
+}
+
+void ip_ct_generic_change_help(struct ip_conntrack *ct, 
+			       union ip_conntrack_help *h)
+{
+	if (ct->helper->lock != NULL) {
+		spin_lock_bh(ct->helper->lock);
+		memcpy(&ct->help, h, sizeof(ct->help));
+		spin_unlock_bh(ct->helper->lock);
+	} else
+		memcpy(&ct->help, h, sizeof(ct->help));
+}
+
 static inline int
 do_iter(const struct ip_conntrack_tuple_hash *i,
 	int (*iter)(struct ip_conntrack *i, void *data),
@@ -1144,23 +1249,27 @@
 				     * ip_conntrack_htable_size));
 }
 
-/* Mishearing the voices in his head, our hero wonders how he's
-   supposed to kill the mall. */
-void ip_conntrack_cleanup(void)
+void ip_conntrack_flush()
 {
-	ip_ct_attach = NULL;
 	/* This makes sure all current packets have passed through
            netfilter framework.  Roll on, two-stage module
            delete... */
 	synchronize_net();
- 
+
  i_see_dead_people:
 	ip_ct_iterate_cleanup(kill_all, NULL);
 	if (atomic_read(&ip_conntrack_count) != 0) {
 		schedule();
 		goto i_see_dead_people;
 	}
+}
 
+/* Mishearing the voices in his head, our hero wonders how he's
+   supposed to kill the mall. */
+void ip_conntrack_cleanup(void)
+{
+	ip_ct_attach = NULL;
+	ip_conntrack_flush();
 	kmem_cache_destroy(ip_conntrack_cachep);
 	kmem_cache_destroy(ip_conntrack_expect_cachep);
 	free_conntrack_hash();
@@ -1258,6 +1367,8 @@
 	/*  - and look it like as a confirmed connection */
 	set_bit(IPS_CONFIRMED_BIT, &ip_conntrack_untracked.status);
 
+	printk("untracked: %p\n", &ip_conntrack_untracked);
+
 	return ret;
 
 err_free_conntrack_slab:
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_ftp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c	2005-06-28 02:28:44.000000000 +0200
@@ -481,6 +481,8 @@
 		ftp[i].timeout = 5 * 60; /* 5 minutes */
 		ftp[i].me = THIS_MODULE;
 		ftp[i].help = help;
+		ftp[i].lock = &ip_ftp_lock;
+		ftp[i].change_help = ip_ct_generic_change_help;
 
 		tmpname = &ftp_names[i][0];
 		if (ports[i] == FTP_PORT)
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_irc.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c	2005-06-28 02:28:44.000000000 +0200
@@ -275,6 +275,8 @@
 		hlpr->timeout = dcc_timeout;
 		hlpr->me = THIS_MODULE;
 		hlpr->help = help;
+		hlpr->lock = &irc_buffer_lock;
+		hlpr->change_help = ip_ct_generic_change_help;
 
 		tmpname = &irc_names[i][0];
 		if (ports[i] == IRC_PORT)
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_icmp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c	2005-06-28 02:28:44.000000000 +0200
@@ -109,16 +109,17 @@
 	return NF_ACCEPT;
 }
 
+static u_int8_t valid_new[] = { 
+	[ICMP_ECHO] = 1,
+	[ICMP_TIMESTAMP] = 1,
+	[ICMP_INFO_REQUEST] = 1,
+	[ICMP_ADDRESS] = 1 
+};
+
 /* Called when a new connection for this protocol found. */
 static int icmp_new(struct ip_conntrack *conntrack,
 		    const struct sk_buff *skb)
 {
-	static u_int8_t valid_new[]
-		= { [ICMP_ECHO] = 1,
-		    [ICMP_TIMESTAMP] = 1,
-		    [ICMP_INFO_REQUEST] = 1,
-		    [ICMP_ADDRESS] = 1 };
-
 	if (conntrack->tuplehash[0].tuple.dst.u.icmp.type >= sizeof(valid_new)
 	    || !valid_new[conntrack->tuplehash[0].tuple.dst.u.icmp.type]) {
 		/* Can't create a new ICMP `conn' with this. */
@@ -266,6 +267,17 @@
 	return icmp_error_message(skb, ctinfo, hooknum);
 }
 
+static int icmp_change_check_tuples(struct ip_conntrack_tuple *orig,
+				    struct ip_conntrack_tuple *reply)
+{
+	unsigned int type = orig->dst.u.icmp.type;
+
+	if (type >= sizeof(valid_new) || !valid_new[type])
+		return -EINVAL;
+
+	return 0;
+}
+
 struct ip_conntrack_protocol ip_conntrack_protocol_icmp =
 {
 	.proto 			= IPPROTO_ICMP,
@@ -277,4 +289,6 @@
 	.packet			= icmp_packet,
 	.new			= icmp_new,
 	.error			= icmp_error,
+	.change_check_tuples    = icmp_change_check_tuples,
+	.change_proto           = ip_ct_generic_change_proto,
 };
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_sctp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c	2005-06-28 02:28:44.000000000 +0200
@@ -499,6 +499,7 @@
 static struct ip_conntrack_protocol ip_conntrack_protocol_sctp = { 
 	.proto 		 = IPPROTO_SCTP, 
 	.name 		 = "sctp",
+	.lock		 = &sctp_lock,
 	.pkt_to_tuple 	 = sctp_pkt_to_tuple, 
 	.invert_tuple 	 = sctp_invert_tuple, 
 	.print_tuple 	 = sctp_print_tuple, 
@@ -506,7 +507,8 @@
 	.packet 	 = sctp_packet, 
 	.new 		 = sctp_new, 
 	.destroy 	 = NULL, 
-	.me 		 = THIS_MODULE 
+	.me 		 = THIS_MODULE,
+	.change_proto	 = ip_ct_generic_change_proto,
 };
 
 #ifdef CONFIG_SYSCTL
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	2005-06-28 02:28:44.000000000 +0200
@@ -1094,6 +1094,7 @@
 {
 	.proto 			= IPPROTO_TCP,
 	.name 			= "tcp",
+	.lock			= &tcp_lock,
 	.pkt_to_tuple 		= tcp_pkt_to_tuple,
 	.invert_tuple 		= tcp_invert_tuple,
 	.print_tuple 		= tcp_print_tuple,
@@ -1101,4 +1102,5 @@
 	.packet 		= tcp_packet,
 	.new 			= tcp_new,
 	.error			= tcp_error,
+	.change_proto           = ip_ct_generic_change_proto
 };
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_udp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c	2005-06-28 02:28:44.000000000 +0200
@@ -144,4 +144,5 @@
 	.packet			= udp_packet,
 	.new			= udp_new,
 	.error			= udp_error,
+	.change_proto           = ip_ct_generic_change_proto
 };
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_standalone.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c	2005-06-28 02:28:44.000000000 +0200
@@ -963,6 +963,19 @@
 {
 }
 
+EXPORT_SYMBOL(ip_ct_unlink_destroy_expect);
+EXPORT_SYMBOL(ip_conntrack_flush);
+EXPORT_SYMBOL(__ip_conntrack_expect_find);
+EXPORT_SYMBOL(__ip_conntrack_find);
+EXPORT_SYMBOL(ip_conntrack_expect_list);
+EXPORT_SYMBOL(ip_ct_generic_change_proto);
+EXPORT_SYMBOL(ip_ct_generic_change_help);
+EXPORT_SYMBOL(ip_conntrack_expect_find_get);
+EXPORT_SYMBOL(ip_conntrack_alloc);
+EXPORT_SYMBOL(ip_conntrack_free);
+EXPORT_SYMBOL(ip_conntrack_hash_insert);
+EXPORT_SYMBOL(ip_ct_remove_expectations);
+EXPORT_SYMBOL(ip_ct_find_helper);
 #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS
 EXPORT_SYMBOL(ip_conntrack_chain);
 EXPORT_SYMBOL(ip_conntrack_expect_chain);
@@ -993,7 +1006,6 @@
 EXPORT_SYMBOL(ip_conntrack_hash);
 EXPORT_SYMBOL(ip_conntrack_untracked);
 EXPORT_SYMBOL_GPL(ip_conntrack_find_get);
-EXPORT_SYMBOL_GPL(ip_conntrack_put);
 #ifdef CONFIG_IP_NF_NAT_NEEDED
 EXPORT_SYMBOL(ip_conntrack_tcp_update);
 #endif
Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c
===================================================================
--- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_tftp.c	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c	2005-06-28 02:28:44.000000000 +0200
@@ -134,6 +134,7 @@
 		tftp[i].timeout = 5 * 60; /* 5 minutes */
 		tftp[i].me = THIS_MODULE;
 		tftp[i].help = tftp_help;
+		tftp[i].change_help = ip_ct_generic_change_help;
 
 		tmpname = &tftp_names[i][0];
 		if (ports[i] == TFTP_PORT)
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_amanda.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h	2005-06-28 02:28:44.000000000 +0200
@@ -3,9 +3,11 @@
 /* AMANDA tracking. */
 
 struct ip_conntrack_expect;
+#ifdef __KERNEL__
 extern unsigned int (*ip_nat_amanda_hook)(struct sk_buff **pskb,
 					  enum ip_conntrack_info ctinfo,
 					  unsigned int matchoff,
 					  unsigned int matchlen,
 					  struct ip_conntrack_expect *exp);
+#endif
 #endif /* _IP_CONNTRACK_AMANDA_H */
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_ftp.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h	2005-06-28 02:28:44.000000000 +0200
@@ -31,6 +31,7 @@
 
 struct ip_conntrack_expect;
 
+#ifdef __KERNEL__
 /* For NAT to hook in when we find a packet which describes what other
  * connection we should expect. */
 extern unsigned int (*ip_nat_ftp_hook)(struct sk_buff **pskb,
@@ -40,4 +41,5 @@
 				       unsigned int matchlen,
 				       struct ip_conntrack_expect *exp,
 				       u32 *seq);
+#endif
 #endif /* _IP_CONNTRACK_FTP_H */
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h	2005-06-28 02:28:44.000000000 +0200
@@ -71,13 +71,13 @@
 	IPS_DESTROYED = (1 << IPS_DESTROYED_BIT),
 };
 
-#ifdef __KERNEL__
-#include <linux/config.h>
-#include <linux/netfilter_ipv4/ip_conntrack_tuple.h>
-#include <linux/bitops.h>
-#include <linux/compiler.h>
-#include <asm/atomic.h>
+struct ip_conntrack_counter
+{
+	u_int64_t packets;
+	u_int64_t bytes;
+};
 
+#include <linux/netfilter_ipv4/ip_conntrack_tuple.h>
 #include <linux/netfilter_ipv4/ip_conntrack_tcp.h>
 #include <linux/netfilter_ipv4/ip_conntrack_icmp.h>
 #include <linux/netfilter_ipv4/ip_conntrack_sctp.h>
@@ -106,6 +106,7 @@
 	struct ip_ct_irc_master ct_irc_info;
 };
 
+#ifdef __KERNEL__
 #ifdef CONFIG_IP_NF_NAT_NEEDED
 #include <linux/netfilter_ipv4/ip_nat.h>
 #endif
@@ -126,12 +127,6 @@
 #define IP_NF_ASSERT(x)
 #endif
 
-struct ip_conntrack_counter
-{
-	u_int64_t packets;
-	u_int64_t bytes;
-};
-
 struct ip_conntrack_helper;
 
 struct ip_conntrack
@@ -140,6 +135,9 @@
            plus 1 for any connection(s) we are `master' for */
 	struct nf_conntrack ct_general;
 
+	/* Unique ID that identifies this conntrack*/
+	u_int64_t id;
+
 	/* Have we seen traffic both ways yet? (bitset) */
 	unsigned long status;
 
@@ -188,6 +186,9 @@
 	/* Internal linked list (global expectation list) */
 	struct list_head list;
 
+	/* Expectation ID */
+	__u64 id;
+
 	/* We expect this tuple, with the following mask */
 	struct ip_conntrack_tuple tuple, mask;
  
@@ -201,6 +202,8 @@
 	/* Timer function; deletes the expectation. */
 	struct timer_list timeout;
 
+	atomic_t use;
+
 #ifdef CONFIG_IP_NF_NAT_NEEDED
 	/* This is the original per-proto part, used to map the
 	 * expected connection the way the recipient expects. */
@@ -240,7 +243,12 @@
 }
 
 /* decrement reference count on a conntrack */
-extern inline void ip_conntrack_put(struct ip_conntrack *ct);
+static inline void
+ip_conntrack_put(struct ip_conntrack *ct)
+{
+	IP_NF_ASSERT(ct);
+	nf_conntrack_put(&ct->ct_general);
+}
 
 /* call to create an explicit dependency on ip_conntrack. */
 extern void need_ip_conntrack(void);
@@ -275,6 +283,34 @@
 ip_ct_iterate_cleanup(int (*iter)(struct ip_conntrack *i, void *data),
 		      void *data);
 
+extern struct ip_conntrack_helper *
+ip_ct_find_helper(const struct ip_conntrack_tuple *tuple);
+
+extern void ip_ct_remove_expectations(struct ip_conntrack *ct);
+
+extern struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *,
+					       struct ip_conntrack_tuple *);
+
+extern void ip_conntrack_free(struct ip_conntrack *ct);
+
+extern void ip_conntrack_hash_insert(struct ip_conntrack *ct);
+
+extern struct ip_conntrack_expect *
+__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple);
+
+extern struct ip_conntrack_expect *
+ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple);
+
+extern struct ip_conntrack_tuple_hash *
+__ip_conntrack_find(const struct ip_conntrack_tuple *tuple,
+                    const struct ip_conntrack *ignored_conntrack);
+
+extern inline void ip_conntrack_expect_put(struct ip_conntrack_expect *exp);
+
+extern void ip_conntrack_flush(void);
+
+extern void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp);
+
 /* It's confirmed if it is, or has been in the hash table. */
 static inline int is_confirmed(struct ip_conntrack *ct)
 {
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_helper.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h	2005-06-28 02:28:44.000000000 +0200
@@ -9,6 +9,8 @@
 {	
 	struct list_head list; 		/* Internal use. */
 
+	spinlock_t *lock;		/* protect private info and buffer */
+
 	const char *name;		/* name of the module */
 	struct module *me;		/* pointer to self */
 	unsigned int max_expected;	/* Maximum number of concurrent 
@@ -24,6 +26,8 @@
 	int (*help)(struct sk_buff **pskb,
 		    struct ip_conntrack *ct,
 		    enum ip_conntrack_info conntrackinfo);
+
+	void (*change_help)(struct ip_conntrack *, union ip_conntrack_help *);
 };
 
 extern int ip_conntrack_helper_register(struct ip_conntrack_helper *);
@@ -38,4 +42,7 @@
 extern int ip_conntrack_expect_related(struct ip_conntrack_expect *exp);
 extern void ip_conntrack_unexpect_related(struct ip_conntrack_expect *exp);
 
+extern void ip_ct_generic_change_help(struct ip_conntrack *ct,
+				      union ip_conntrack_help *h);
+
 #endif /*_IP_CONNTRACK_HELPER_H*/
Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h
===================================================================
--- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_protocol.h	2005-06-28 02:28:38.000000000 +0200
+++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h	2005-06-28 02:28:44.000000000 +0200
@@ -10,6 +10,8 @@
 	/* Protocol number. */
 	u_int8_t proto;
 
+	rwlock_t *lock;
+
 	/* Protocol name */
 	const char *name;
 
@@ -47,6 +49,17 @@
 	int (*error)(struct sk_buff *skb, enum ip_conntrack_info *ctinfo,
 		     unsigned int hooknum);
 
+	/* check if tuples are valid for a new connection */
+	int (*change_check_tuples)(struct ip_conntrack_tuple *orig,
+				   struct ip_conntrack_tuple *reply);
+
+	/* check protocol data is valid */
+	int (*change_check_proto)(union ip_conntrack_proto *p);
+
+	/* change protocol info on behalf of ctnetlink */
+	void (*change_proto)(struct ip_conntrack *ct,
+			     union ip_conntrack_proto *p);
+	
 	/* Module (if any) which this is connected to. */
 	struct module *me;
 };
@@ -57,6 +70,8 @@
 /* Protocol registration. */
 extern int ip_conntrack_protocol_register(struct ip_conntrack_protocol *proto);
 extern void ip_conntrack_protocol_unregister(struct ip_conntrack_protocol *proto);
+extern void ip_ct_generic_change_proto(struct ip_conntrack *conntrack,
+				       union ip_conntrack_proto *p);
 
 static inline struct ip_conntrack_protocol *ip_ct_find_proto(u_int8_t protocol)
 {

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  2:12     ` Pablo Neira
@ 2005-06-28  2:15       ` Pablo Neira
  0 siblings, 0 replies; 50+ messages in thread
From: Pablo Neira @ 2005-06-28  2:15 UTC (permalink / raw)
  To: Pablo Neira
  Cc: Harald Welte, Netfilter Development Mailinglist, Patrick McHardy

Pablo Neira wrote:
> Pablo Neira wrote:
> 
>> Harald Welte wrote:
>>
>>> I don't understand why the order of the expectation list was changed,
>>> though.  Would you please explain why this was done?
>>
>>
>>
>> Because of the table dumping logic I was using.
>>
>> -                       if (ct->id <= *id)
>> +                       if (ct->id >= *id)
>>                                 continue;
> 
> 
> Please ignore the patches I've sent you.

Damn, I'm tired, wrong patch again :(. Ignore it, I'll send a new digest 
that applies to SVN tomorrow morning.

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  2:00   ` Pablo Neira
  2005-06-28  2:12     ` Pablo Neira
@ 2005-06-28  3:53     ` Patrick McHardy
  2005-06-28  7:07       ` Harald Welte
  2005-07-04 12:59       ` Amin Azez
  2005-06-28  7:06     ` Harald Welte
  2 siblings, 2 replies; 50+ messages in thread
From: Patrick McHardy @ 2005-06-28  3:53 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> Harald Welte wrote:
>>> - Implement ip_conntrack_stats dumping and reset (accounting)
>>
>> you want to dump the statistics via netlink?  i'm not sure whether that
>> is required.  lnstat is the only program using those counters, and using
>> /proc seems fine for that.
> 
> Could this be useful for accounting purposes? I mean, dump and reset as
> a single operation, something similar to conntrack -L -z, say conntrack
> -S -z.

I think it is convenient for anyone wanting to access these statistics
without parsing /proc to have everything accessible through a single
interface.

>>> - Implement get conntrack and destroy (accounting)
>>
>> sorry, what are you referring to?
> 
> An atomical operation to dump conntrack information and then destroy it.
> For example, if someone does:
> 
> conntrack -D --orig-src 1.1.1.1 --orig-dst 2.2.2.2 -p tcp
> --orig-src-port 10 --orig-dst-port 20
> 
> the it sends the conntrack info and destroy it. This could be worth it
> for accounting.

destroy notifications are sent anyway, so we get the stats. What's the
difference to normal destruction?

> I could use multicast netlink subscrition, so the client can get
> subscribed to those events that the user considers interesting.

That sounds like the right thing to do.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  3:53     ` Patrick McHardy
@ 2005-06-28  7:07       ` Harald Welte
  2005-07-04 12:59       ` Amin Azez
  1 sibling, 0 replies; 50+ messages in thread
From: Harald Welte @ 2005-06-28  7:07 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Pablo Neira

[-- Attachment #1: Type: text/plain, Size: 1558 bytes --]

On Tue, Jun 28, 2005 at 05:53:22AM +0200, Patrick McHardy wrote:
> Pablo Neira wrote:
> > Harald Welte wrote:
> >>> - Implement ip_conntrack_stats dumping and reset (accounting)
> >>
> >> you want to dump the statistics via netlink?  i'm not sure whether that
> >> is required.  lnstat is the only program using those counters, and using
> >> /proc seems fine for that.
> > 
> > Could this be useful for accounting purposes? I mean, dump and reset as
> > a single operation, something similar to conntrack -L -z, say conntrack
> > -S -z.
> 
> I think it is convenient for anyone wanting to access these statistics
> without parsing /proc to have everything accessible through a single
> interface.

... and it is already implemented in current svn ;)

> destroy notifications are sent anyway, so we get the stats. What's the
> difference to normal destruction?

I agree.

> > I could use multicast netlink subscrition, so the client can get
> > subscribed to those events that the user considers interesting.
> 
> That sounds like the right thing to do.

well, but we still create a horrible amount of event messages, let's say
in case of a syn flood...

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  3:53     ` Patrick McHardy
  2005-06-28  7:07       ` Harald Welte
@ 2005-07-04 12:59       ` Amin Azez
  1 sibling, 0 replies; 50+ messages in thread
From: Amin Azez @ 2005-07-04 12:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Pablo Neira wrote:
> 
>>Harald Welte wrote:
>>
>>>>- Implement ip_conntrack_stats dumping and reset (accounting)
>>>
>>>you want to dump the statistics via netlink?  i'm not sure whether that
>>>is required.  lnstat is the only program using those counters, and using
>>>/proc seems fine for that.
>>
>>Could this be useful for accounting purposes? I mean, dump and reset as
>>a single operation, something similar to conntrack -L -z, say conntrack
>>-S -z.

I use this, reading and zeroing conntracks at the same time.

> I think it is convenient for anyone wanting to access these statistics
> without parsing /proc to have everything accessible through a single
> interface.

Yes; it also preserves synchronicity.

In other words it is nice to be sure if an event-update comes 
chronologically before or after the stats for the same conntrack were read.

Over netlink this is easy, over netlink AND /proc is impossible.

Azez

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  2:00   ` Pablo Neira
  2005-06-28  2:12     ` Pablo Neira
  2005-06-28  3:53     ` Patrick McHardy
@ 2005-06-28  7:06     ` Harald Welte
  2 siblings, 0 replies; 50+ messages in thread
From: Harald Welte @ 2005-06-28  7:06 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 2310 bytes --]

On Tue, Jun 28, 2005 at 04:00:06AM +0200, Pablo Neira wrote:
> >I don't understand why the order of the expectation list was changed,
> >though.  Would you please explain why this was done?
> 
> Because of the table dumping logic I was using.

I see. 

> >>- Implement ip_conntrack_stats dumping and reset (accounting)
> >
> >you want to dump the statistics via netlink?  i'm not sure whether that
> >is required.  lnstat is the only program using those counters, and using
> >/proc seems fine for that.
> 
> Could this be useful for accounting purposes? I mean, dump and reset as 
> a single operation, something similar to conntrack -L -z, say conntrack 
> -S -z.

oh, you're referring to ip_conntrack_acct (the accounting), not
ip_conntrack_stat (the statistics) ?  /me is confused.

Isn't that what the CTRZERO message type already does?

> An atomical operation to dump conntrack information and then destroy it. 
> For example, if someone does:
> 
> conntrack -D --orig-src 1.1.1.1 --orig-dst 2.2.2.2 -p tcp 
> --orig-src-port 10 --orig-dst-port 20
> 
> the it sends the conntrack info and destroy it. This could be worth it 
> for accounting.

I think it could be useful, indeed.

> >I'm not sure if we could somehow (cleanly) tie into the scoket
> >close/destroy code.  If yes, we could check if we can reduce the mask at
> >socket destroy time.
> 
> I think there's no sane way to do this but let me check the netlink code 
> again. I could use multicast netlink subscrition, so the client can get 
> subscribed to those events that the user considers interesting.

Well, maybe we can extend the netlink code to achieve what we're
interested in.  

Multicast netlink groups are already used for my clumsy tcp/udp/icmp/...
distinction.  I think it is not a good solution for ctnetlink, since the
kernel then sends all events, independent of any process interested in
them or not.

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira
  2005-06-27 20:26 ` Harald Welte
@ 2005-06-27 21:31 ` Patrick McHardy
  2005-06-28  2:15   ` Pablo Neira
  2005-06-27 22:40 ` Patrick McHardy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 50+ messages in thread
From: Patrick McHardy @ 2005-06-27 21:31 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> Hi Harald,
> 
> This patchset introduces tons of updates for the nfnetlink, ctnetlink
> and the conntrack event API. I haven't attached the file since it's that
> big, about 100K.
> 
> You can get an incremental diff against SVN from:
> http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch

I haven't looked at the patch for a long time, and it removes
and adds the whole file, so not sure if it is new ..

+		/* This is tricky but it works. ip_nat_setup_info needs the
+		 * hook number as parameter, so let's do the correct
+		 * conversion and run away */
+		if (*status & IPS_SRC_NAT_DONE)
+			hooknum = NF_IP_POST_ROUTING; /* IP_NAT_MANIP_SRC */
+		else if (*status & IPS_DST_NAT_DONE)
+			hooknum = NF_IP_PRE_ROUTING;  /* IP_NAT_MANIP_DST */
+		else
+			return -EINVAL; /* Missing NAT flags */

This doesn't work reliably, locally generated packets never enter
PRE_ROUTING but can be DNATed. I think the hook should be supplied
by the user.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 21:31 ` Patrick McHardy
@ 2005-06-28  2:15   ` Pablo Neira
  2005-06-28  3:56     ` Patrick McHardy
  0 siblings, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-28  2:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Pablo Neira wrote:
> 
> I haven't looked at the patch for a long time, and it removes
> and adds the whole file, so not sure if it is new ..

Yes, that's new. Ick, it seems I didn't express fine myself. Together 
with the big and confusing patch I uploaded a splitted version that 
makes easier the review as well:

http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/

> +		/* This is tricky but it works. ip_nat_setup_info needs the
> +		 * hook number as parameter, so let's do the correct
> +		 * conversion and run away */
> +		if (*status & IPS_SRC_NAT_DONE)
> +			hooknum = NF_IP_POST_ROUTING; /* IP_NAT_MANIP_SRC */
> +		else if (*status & IPS_DST_NAT_DONE)
> +			hooknum = NF_IP_PRE_ROUTING;  /* IP_NAT_MANIP_DST */
> +		else
> +			return -EINVAL; /* Missing NAT flags */
> 
> This doesn't work reliably, locally generated packets never enter
> PRE_ROUTING but can be DNATed. I think the hook should be supplied
> by the user.

The macro HOOK2MANIP used is ip_nat_setup_info returns the same value 
(maniptype) for NF_IP_PRE_ROUTING and NF_IP_LOCAL_IN, so the same 
manipulation (DNAT) will be applied to such conntrack. Since we works 
with conntracks, I don't mind where the packets came from, just want to 
apply the NAT handling that the user has requested.

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  2:15   ` Pablo Neira
@ 2005-06-28  3:56     ` Patrick McHardy
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick McHardy @ 2005-06-28  3:56 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> Patrick McHardy wrote:
>> This doesn't work reliably, locally generated packets never enter
>> PRE_ROUTING but can be DNATed. I think the hook should be supplied
>> by the user.
> 
> The macro HOOK2MANIP used is ip_nat_setup_info returns the same value
> (maniptype) for NF_IP_PRE_ROUTING and NF_IP_LOCAL_IN, so the same
> manipulation (DNAT) will be applied to such conntrack. Since we works
> with conntracks, I don't mind where the packets came from, just want to
> apply the NAT handling that the user has requested.

You're right, I'm still not familiar with the new NAT code ..

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira
  2005-06-27 20:26 ` Harald Welte
  2005-06-27 21:31 ` Patrick McHardy
@ 2005-06-27 22:40 ` Patrick McHardy
  2005-06-28  2:16   ` Pablo Neira
  2005-06-28  7:13   ` Harald Welte
  2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 50+ messages in thread
From: Patrick McHardy @ 2005-06-27 22:40 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> This patchset introduces tons of updates for the nfnetlink, ctnetlink
> and the conntrack event API. I haven't attached the file since it's that
> big, about 100K.

Looks good. A couple more comments on the patch, most are probably not
related to recent changes. I remember beeing responsible for at least
some of this :)

+struct cta_proto {
+       unsigned char num_proto;        /* Protocol number IPPROTO_X */
+       union ip_conntrack_proto proto;
+};

These should be changed not to use internal conntrack structures, it
will hurt us when we want to change them. Instead it should replicate
the fields interesting for the user. Also please use fixed-size types
instead of unions etc. All structures including u64 types should be
padded to multiples of 8 so they are equally sized on 32-bit and 64-bit
systems.

+#define CTA_HELP_MAXNAMESZ     31
+
+struct cta_help {
+       char name[CTA_HELP_MAXNAMESZ];  /* name of conntrack helper */
+       union ip_conntrack_help help;
+};

I suggest to use 32, the length in the kernel is not fixed AFAICS and
it will be padded to 32 in netlink messages anyway.

+/* ctnetlink multicast groups: reports any change of ctinfo,
+ * ctstatus, or protocol state change.
+ */
+#define NFGRP_IPV4_CT_TCP      0x01
+#define NFGRP_IPV4_CT_UDP      0x02
+#define NFGRP_IPV4_CT_ICMP     0x04
+#define NFGRP_IPV4_CT_OTHER    0x08

I'm not sure how useful these groups are. I think groups for different
event-types might be more useful to reduce the noise.

+       h->name[CTA_HELP_MAXNAMESZ - 1] = '\0';
+       if (strcmp(helper->name, h->name))
+               return -EINVAL;

We changed ipt_helper to accept a null-byte string as wildcard string,
probably a good idea to do the same here.

+       ct = ip_conntrack_alloc(otuple, rtuple);
+       if (ct == NULL || IS_ERR(ct))
+               return -ENOMEM;

ip_conntrack_alloc doesn't return ERR_PTR()

+       exp = ip_conntrack_expect_find_get(tuple);
+       if (!exp)
+               return -ENOENT;

I couldn't find this function, but in 2.6.12 expectations aren't
refcounted anymore. If they are again by this patch, the refcnt would
be leaked in the following lines:

+       skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+       if (!skb2)
+               return -ENOMEM;

++
++      /* Events were delivered: loopback mustn't notify events twice */
++      IPCT_DELIVERED_BIT = 11,
++      IPCT_DELIVERED = (1 << IPCT_DELIVERED_BIT),

I couldn't find any users, probably since this can't work with a
per-conntrack flag since events are generated by packets.

++      IPEXP_TIMEOUT_BIT = 1,
++      IPEXP_TIMEOUT = (1 << IPEXP_TIMEOUT_BIT),
++};
++

No user here either.

It would be great if you could send a patch without renaming the file,
that makes reviewing the changes a lot easier. Renaming can then be done
in a seperate patch that doesn't change anything.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 22:40 ` Patrick McHardy
@ 2005-06-28  2:16   ` Pablo Neira
  2005-06-28  4:03     ` Patrick McHardy
  2005-06-28  7:13   ` Harald Welte
  1 sibling, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-28  2:16 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Pablo Neira wrote:
> 
>>This patchset introduces tons of updates for the nfnetlink, ctnetlink
>>and the conntrack event API. I haven't attached the file since it's that
>>big, about 100K.
> 
> 
> Looks good. A couple more comments on the patch, most are probably not
> related to recent changes. I remember beeing responsible for at least
> some of this :)

Fine. BTW, thanks a lot for the feedback you both, appreciated :)

> +struct cta_proto {
> +       unsigned char num_proto;        /* Protocol number IPPROTO_X */
> +       union ip_conntrack_proto proto;
> +};
> 
> These should be changed not to use internal conntrack structures, it
> will hurt us when we want to change them. Instead it should replicate
> the fields interesting for the user. Also please use fixed-size types
> instead of unions etc. All structures including u64 types should be
> padded to multiples of 8 so they are equally sized on 32-bit and 64-bit
> systems.

OK, I'll kill those in the next digest, added to my TODO.

> +#define CTA_HELP_MAXNAMESZ     31
> +
> +struct cta_help {
> +       char name[CTA_HELP_MAXNAMESZ];  /* name of conntrack helper */
> +       union ip_conntrack_help help;
> +};
> 
> I suggest to use 32, the length in the kernel is not fixed AFAICS and
> it will be padded to 32 in netlink messages anyway.

OK added.

> +/* ctnetlink multicast groups: reports any change of ctinfo,
> + * ctstatus, or protocol state change.
> + */
> +#define NFGRP_IPV4_CT_TCP      0x01
> +#define NFGRP_IPV4_CT_UDP      0x02
> +#define NFGRP_IPV4_CT_ICMP     0x04
> +#define NFGRP_IPV4_CT_OTHER    0x08
> 
> I'm not sure how useful these groups are. I think groups for different
> event-types might be more useful to reduce the noise.

Yes, this looks fine. So we could kill those and use an event subscription.

> 
> +       h->name[CTA_HELP_MAXNAMESZ - 1] = '\0';
> +       if (strcmp(helper->name, h->name))
> +               return -EINVAL;
> 
> We changed ipt_helper to accept a null-byte string as wildcard string,
> probably a good idea to do the same here.
> 
> 
> +       ct = ip_conntrack_alloc(otuple, rtuple);
> +       if (ct == NULL || IS_ERR(ct))
> +               return -ENOMEM;
> 
> ip_conntrack_alloc doesn't return ERR_PTR()

It didn't use to, but now it does in the updated patch:

http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/02modifs.patch

> +       exp = ip_conntrack_expect_find_get(tuple);
> +       if (!exp)
> +               return -ENOENT;
> 
> I couldn't find this function, but in 2.6.12 expectations aren't
> refcounted anymore. If they are again by this patch, the refcnt would
> be leaked in the following lines:
> 
> +       skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
> +       if (!skb2)
> +               return -ENOMEM;

Thanks for catching up this, I'll fix it. I recovered the refcounting 
for expectations to avoid any possible race condition since I could be 
working with an expectation whose timeout has expired.

> ++
> ++      /* Events were delivered: loopback mustn't notify events twice */
> ++      IPCT_DELIVERED_BIT = 11,
> ++      IPCT_DELIVERED = (1 << IPCT_DELIVERED_BIT),
> 
> I couldn't find any users, probably since this can't work with a
> per-conntrack flag since events are generated by packets.

This must dissapear, I introduced this to fix a loopback event 
notification (was done twice). This is ugly, so I'll kill it and reset 
nfcache once the events has been delivered to avoid such event duplication.

> ++      IPEXP_TIMEOUT_BIT = 1,
> ++      IPEXP_TIMEOUT = (1 << IPEXP_TIMEOUT_BIT),
> ++};
> ++
> 
> No user here either.

Right, not yet. I have problems to complete the expectation part of the 
conntrack event API since most events should be delivered holding 
ip_conntrack_lock and that's really ugly.

> It would be great if you could send a patch without renaming the file,
> that makes reviewing the changes a lot easier. Renaming can then be done
> in a seperate patch that doesn't change anything.

Does this help? Hope so.
http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/04ctnetlink.patch

I'll send another patch that includes all your suggestion. Thanks.

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  2:16   ` Pablo Neira
@ 2005-06-28  4:03     ` Patrick McHardy
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick McHardy @ 2005-06-28  4:03 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> Patrick McHardy wrote:
> 
>> +/* ctnetlink multicast groups: reports any change of ctinfo,
>> + * ctstatus, or protocol state change.
>> + */
>> +#define NFGRP_IPV4_CT_TCP      0x01
>> +#define NFGRP_IPV4_CT_UDP      0x02
>> +#define NFGRP_IPV4_CT_ICMP     0x04
>> +#define NFGRP_IPV4_CT_OTHER    0x08
>>
>> I'm not sure how useful these groups are. I think groups for different
>> event-types might be more useful to reduce the noise.
>
> Yes, this looks fine. So we could kill those and use an event
> subscription.

Yes. The question is how to define the groups. One group per event seems
wasteful, we probably need some sensible grouping. What events types
are there currently?

>> I couldn't find this function, but in 2.6.12 expectations aren't
>> refcounted anymore. If they are again by this patch, the refcnt would
>> be leaked in the following lines:
>>
>> +       skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
>> +       if (!skb2)
>> +               return -ENOMEM;
> 
> 
> Thanks for catching up this, I'll fix it. I recovered the refcounting
> for expectations to avoid any possible race condition since I could be
> working with an expectation whose timeout has expired.

Great. I wanted to restore it anyway for "permanent expectations" for
the H.323 helper.

> Does this help? Hope so.
> http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/04ctnetlink.patch

I'll continue tomorrow, its getting late here ..

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 22:40 ` Patrick McHardy
  2005-06-28  2:16   ` Pablo Neira
@ 2005-06-28  7:13   ` Harald Welte
  2005-06-28 16:02     ` Patrick McHardy
  1 sibling, 1 reply; 50+ messages in thread
From: Harald Welte @ 2005-06-28  7:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Pablo Neira

[-- Attachment #1: Type: text/plain, Size: 2123 bytes --]

On Tue, Jun 28, 2005 at 12:40:46AM +0200, Patrick McHardy wrote:
> +struct cta_proto {
> +       unsigned char num_proto;        /* Protocol number IPPROTO_X */
> +       union ip_conntrack_proto proto;
> +};
> 
> These should be changed not to use internal conntrack structures, it
> will hurt us when we want to change them. Instead it should replicate
> the fields interesting for the user. Also please use fixed-size types
> instead of unions etc. All structures including u64 types should be
> padded to multiples of 8 so they are equally sized on 32-bit and 64-bit
> systems.

agreed.  However, we still don't have some kind of versioning in the
protocol, too.  I think we've learned by now that we need versioned
structures ;)

> +/* ctnetlink multicast groups: reports any change of ctinfo,
> + * ctstatus, or protocol state change.
> + */
> +#define NFGRP_IPV4_CT_TCP      0x01
> +#define NFGRP_IPV4_CT_UDP      0x02
> +#define NFGRP_IPV4_CT_ICMP     0x04
> +#define NFGRP_IPV4_CT_OTHER    0x08
> 
> I'm not sure how useful these groups are. I think groups for different
> event-types might be more useful to reduce the noise.

that was my idea in the beginning (since I didn't think of events at
that point).

Still, I think creating messages for any kind of event (even if noone
listens) is too much overhead.  netlink needs to be extended to deal
with that issue.

Maybe the 'which socket is subscribed to which group' accounting should
be done by the core netlink layer, which would then only export a
merged bitmask of all netlink sockets.  This way ctnetlink can easily
check whether it makes sense to create a certain event message or not.

This should be useful for other netlink users, too.

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28  7:13   ` Harald Welte
@ 2005-06-28 16:02     ` Patrick McHardy
  2005-06-29 19:13       ` Pablo Neira
  0 siblings, 1 reply; 50+ messages in thread
From: Patrick McHardy @ 2005-06-28 16:02 UTC (permalink / raw)
  To: Harald Welte; +Cc: Netfilter Development Mailinglist, Pablo Neira

Harald Welte wrote:
> On Tue, Jun 28, 2005 at 12:40:46AM +0200, Patrick McHardy wrote:
> 
>>These should be changed not to use internal conntrack structures, it
>>will hurt us when we want to change them. Instead it should replicate
>>the fields interesting for the user. Also please use fixed-size types
>>instead of unions etc. All structures including u64 types should be
>>padded to multiples of 8 so they are equally sized on 32-bit and 64-bit
>>systems.
> 
> agreed.  However, we still don't have some kind of versioning in the
> protocol, too.  I think we've learned by now that we need versioned
> structures ;)

Netlink is easy to extend by adding new fields at the end if users
only check for msgsize >= sizeof(struct). Do you think we should
have versioning anyway?

>>+/* ctnetlink multicast groups: reports any change of ctinfo,
>>+ * ctstatus, or protocol state change.
>>+ */
>>+#define NFGRP_IPV4_CT_TCP      0x01
>>+#define NFGRP_IPV4_CT_UDP      0x02
>>+#define NFGRP_IPV4_CT_ICMP     0x04
>>+#define NFGRP_IPV4_CT_OTHER    0x08
>>
>>I'm not sure how useful these groups are. I think groups for different
>>event-types might be more useful to reduce the noise.
> 
> 
> that was my idea in the beginning (since I didn't think of events at
> that point).
> 
> Still, I think creating messages for any kind of event (even if noone
> listens) is too much overhead.  netlink needs to be extended to deal
> with that issue.
> 
> Maybe the 'which socket is subscribed to which group' accounting should
> be done by the core netlink layer, which would then only export a
> merged bitmask of all netlink sockets.  This way ctnetlink can easily
> check whether it makes sense to create a certain event message or not.
> 
> This should be useful for other netlink users, too.

Sounds like a good idea.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28 16:02     ` Patrick McHardy
@ 2005-06-29 19:13       ` Pablo Neira
  2005-06-29 19:52         ` Patrick McHardy
  0 siblings, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-29 19:13 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Harald Welte wrote:
> 
>>On Tue, Jun 28, 2005 at 12:40:46AM +0200, Patrick McHardy wrote:
>>
>>
>>>These should be changed not to use internal conntrack structures, it
>>>will hurt us when we want to change them. Instead it should replicate
>>>the fields interesting for the user. Also please use fixed-size types
>>>instead of unions etc. All structures including u64 types should be
>>>padded to multiples of 8 so they are equally sized on 32-bit and 64-bit
>>>systems.
>>
>>agreed.  However, we still don't have some kind of versioning in the
>>protocol, too.  I think we've learned by now that we need versioned
>>structures ;)
> 
> Netlink is easy to extend by adding new fields at the end if users
> only check for msgsize >= sizeof(struct). Do you think we should
> have versioning anyway?

I think that we could split the structure into fine grain fields. For 
example, CTA_TUPLE_ORIG would composed of:

CTA_ORIG_IPV4_SRC
CTA_ORIG_IPV4_DST
CTA_L4_PROTONUM
CTA_PROTO_IPV4_SRC
CTA_PROTO_IPV4_DST
CTA_DIR

So, instead of sending a packet that contains a reference to an 
ip_conntrack_tuple (CTA_TUPLE_ORIG), we'll have a set of fields 
(CTA_ORIG_IPV4_SRC + CTA_ORIG_IPV4_DST + ...) that compose such structure.

But I'll need a function to glue all the fields to create a 
ip_conntrack_tuple. Maybe too bloated?

>>>+/* ctnetlink multicast groups: reports any change of ctinfo,
>>>+ * ctstatus, or protocol state change.
>>>+ */
>>>+#define NFGRP_IPV4_CT_TCP      0x01
>>>+#define NFGRP_IPV4_CT_UDP      0x02
>>>+#define NFGRP_IPV4_CT_ICMP     0x04
>>>+#define NFGRP_IPV4_CT_OTHER    0x08
>>>
>>>I'm not sure how useful these groups are. I think groups for different
>>>event-types might be more useful to reduce the noise.
>>
>>
>>that was my idea in the beginning (since I didn't think of events at
>>that point).
>>
>>Still, I think creating messages for any kind of event (even if noone
>>listens) is too much overhead.  netlink needs to be extended to deal
>>with that issue.
>>
>>Maybe the 'which socket is subscribed to which group' accounting should
>>be done by the core netlink layer, which would then only export a
>>merged bitmask of all netlink sockets.  This way ctnetlink can easily
>>check whether it makes sense to create a certain event message or not.
>>
>>This should be useful for other netlink users, too.

Isn't netlink broadcast subscription enough? netlink_broadcast doesn't 
enqueue packets for a socket that isn't subscribed to a group, so the 
process never gets useless packets.

So I think that we can group event, say:

level 1 (weak):
	- IPCT_NEW
	- IPCT_DESTROY

level 2 (normal):
	- IPCT_NEW
	- IPCT_UPDATE
	- IPCT_STATUS
	- ...
	- IPCT_DESTROY

At reserve some groups to let the user define some level whenever he 
wants. Although such level would be unique.

OTOH, there are 10 events currently, why is that bad creating a group 
per event?

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-29 19:13       ` Pablo Neira
@ 2005-06-29 19:52         ` Patrick McHardy
  2005-06-29 20:16           ` Harald Welte
  2005-06-30  0:34           ` Pablo Neira
  0 siblings, 2 replies; 50+ messages in thread
From: Patrick McHardy @ 2005-06-29 19:52 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> I think that we could split the structure into fine grain fields. For
> example, CTA_TUPLE_ORIG would composed of:
> 
> CTA_ORIG_IPV4_SRC
> CTA_ORIG_IPV4_DST
> CTA_L4_PROTONUM
> CTA_PROTO_IPV4_SRC
> CTA_PROTO_IPV4_DST
> CTA_DIR
> 
> So, instead of sending a packet that contains a reference to an
> ip_conntrack_tuple (CTA_TUPLE_ORIG), we'll have a set of fields
> (CTA_ORIG_IPV4_SRC + CTA_ORIG_IPV4_DST + ...) that compose such structure.
> 
> But I'll need a function to glue all the fields to create a
> ip_conntrack_tuple. Maybe too bloated?

I think its fine, but CTA_PROTO_IPV4_{SRC,DST} should be
CTA_PROTO_{SRC,DST}, there's nothing related to IPv4 in them.
Please nest all attributes related to a tuple in CTA_TUPLE
instead of adding them to the top-level.

>>> Still, I think creating messages for any kind of event (even if noone
>>> listens) is too much overhead.  netlink needs to be extended to deal
>>> with that issue.
>>>
>>> Maybe the 'which socket is subscribed to which group' accounting should
>>> be done by the core netlink layer, which would then only export a
>>> merged bitmask of all netlink sockets.  This way ctnetlink can easily
>>> check whether it makes sense to create a certain event message or not.
>>>
>>> This should be useful for other netlink users, too.
> 
> 
> Isn't netlink broadcast subscription enough? netlink_broadcast doesn't
> enqueue packets for a socket that isn't subscribed to a group, so the
> process never gets useless packets.

But we still spent time building them, just to have them thrown away.

> So I think that we can group event, say:
> 
> level 1 (weak):
>     - IPCT_NEW
>     - IPCT_DESTROY
> 
> level 2 (normal):
>     - IPCT_NEW
>     - IPCT_UPDATE
>     - IPCT_STATUS
>     - ...
>     - IPCT_DESTROY
> 
> At reserve some groups to let the user define some level whenever he
> wants. Although such level would be unique.
> 
> OTOH, there are 10 events currently, why is that bad creating a group
> per event?

I don't think its bad, but I fear hitting the 32 group limit some day.
Probably that fear is unfounded and we're never going to reach 32
different types of events. Harald, what do you think?

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-29 19:52         ` Patrick McHardy
@ 2005-06-29 20:16           ` Harald Welte
  2005-06-30  0:27             ` Pablo Neira
  2005-06-30  0:34           ` Pablo Neira
  1 sibling, 1 reply; 50+ messages in thread
From: Harald Welte @ 2005-06-29 20:16 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Pablo Neira

[-- Attachment #1: Type: text/plain, Size: 1537 bytes --]

On Wed, Jun 29, 2005 at 09:52:52PM +0200, Patrick McHardy wrote:
> > So I think that we can group event, say:
> > 
> > level 1 (weak):
> >     - IPCT_NEW
> >     - IPCT_DESTROY
> > 
> > level 2 (normal):
> >     - IPCT_NEW
> >     - IPCT_UPDATE
> >     - IPCT_STATUS
> >     - ...
> >     - IPCT_DESTROY
> > 
> > At reserve some groups to let the user define some level whenever he
> > wants. Although such level would be unique.
> > 
> > OTOH, there are 10 events currently, why is that bad creating a group
> > per event?
> 
> I don't think its bad, but I fear hitting the 32 group limit some day.
> Probably that fear is unfounded and we're never going to reach 32
> different types of events. Harald, what do you think?

I think it's not all that much of a problem.  I doubt that conntrack
will get 22 more events anytime soon.  The other problem is, don't we
overlap with other nfnetlink-based groups?

nfnetlink acts as a a multiplex between netlink and the higher layers,
so I guess we only have 32groups for nfnetlink as a whole... and there
is other stuff like ip_queue and ULOG that is to be ported on top of
nfnetlink...

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-29 20:16           ` Harald Welte
@ 2005-06-30  0:27             ` Pablo Neira
  2005-06-30  0:53               ` Patrick McHardy
  0 siblings, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-30  0:27 UTC (permalink / raw)
  To: Harald Welte; +Cc: Netfilter Development Mailinglist, Patrick McHardy

Harald Welte wrote:
> On Wed, Jun 29, 2005 at 09:52:52PM +0200, Patrick McHardy wrote:
> 
>>>So I think that we can group event, say:
>>>
>>>level 1 (weak):
>>>    - IPCT_NEW
>>>    - IPCT_DESTROY
>>>
>>>level 2 (normal):
>>>    - IPCT_NEW
>>>    - IPCT_UPDATE
>>>    - IPCT_STATUS
>>>    - ...
>>>    - IPCT_DESTROY
>>>
>>>At reserve some groups to let the user define some level whenever he
>>>wants. Although such level would be unique.
>>>
>>>OTOH, there are 10 events currently, why is that bad creating a group
>>>per event?
>>
>>I don't think its bad, but I fear hitting the 32 group limit some day.
>>Probably that fear is unfounded and we're never going to reach 32
>>different types of events. Harald, what do you think?
> 
> 
> I think it's not all that much of a problem.  I doubt that conntrack
> will get 22 more events anytime soon.  The other problem is, don't we
> overlap with other nfnetlink-based groups?
> 
> nfnetlink acts as a a multiplex between netlink and the higher layers,
> so I guess we only have 32groups for nfnetlink as a whole... and there
> is other stuff like ip_queue and ULOG that is to be ported on top of
> nfnetlink...

You're right, we can't do that. In short:

nfnetlink currently relies on the fact that is the user program that 
decides if a packet was sent to it or not. In case that it wasn't, the 
program drops it.

Say we've got ip_queue and ip_conntrack_netlink on top of nfnetlink: 
ip_queue sends a packet to user space and ip_conntrack_netlink sends 
event notifications. If both the ipqueue userspace program and 
libctnetlink are subscribed to the same group, they both will receive 
spam from each other.

To avoid this spamming situation, every subsystem could reserve just one 
group and user programs must get subscribed to the corresponding group 
that is interested in. In conclusion, nfnetlink would support up to 32 
subsystems but user space programs won't get spam from other subsystem.

So, back to ip_conntrack_netlink, we still have the performance issue 
with regards to the events. Userspace programs will decide if the event 
received is interesting for them or not.

a) We could extend netlink code with some kind of ioctl that sets a mask 
to say what kind of events are interesting for a given userspace program.

b) The fact that we can send events and nobody's listening to them is 
more difficult to catch up. Maybe a function that checks if there are 
listeners available before sending an event could help: In that case 
such function would be kind of racy because we can lose some messages 
while a userspace program is binding to a given netlink socket. This 
isn't that annoying, I prefer losing a couple of messages than hurting 
performance generating messages that won't be ever delivered.

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30  0:27             ` Pablo Neira
@ 2005-06-30  0:53               ` Patrick McHardy
  2005-06-30  9:47                 ` Pablo Neira
  0 siblings, 1 reply; 50+ messages in thread
From: Patrick McHardy @ 2005-06-30  0:53 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> Harald Welte wrote:
> 
>> I think it's not all that much of a problem.  I doubt that conntrack
>> will get 22 more events anytime soon.  The other problem is, don't we
>> overlap with other nfnetlink-based groups?
>>
>> nfnetlink acts as a a multiplex between netlink and the higher layers,
>> so I guess we only have 32groups for nfnetlink as a whole... and there
>> is other stuff like ip_queue and ULOG that is to be ported on top of
>> nfnetlink...
>
> You're right, we can't do that. In short:
> 
> nfnetlink currently relies on the fact that is the user program that
> decides if a packet was sent to it or not. In case that it wasn't, the
> program drops it.

That's not true, A programm can do pure unicast communication with the
kernel, but can also subscribe to the multicast-groups it is interested
in. Events are sent to multicast groups and, if they happend in response
to a request, to the unicast address. Same for dumps.

> Say we've got ip_queue and ip_conntrack_netlink on top of nfnetlink:
> ip_queue sends a packet to user space and ip_conntrack_netlink sends
> event notifications. If both the ipqueue userspace program and
> libctnetlink are subscribed to the same group, they both will receive
> spam from each other.

Sure, they both need different groups.

> To avoid this spamming situation, every subsystem could reserve just one
> group and user programs must get subscribed to the corresponding group
> that is interested in. In conclusion, nfnetlink would support up to 32
> subsystems but user space programs won't get spam from other subsystem.

That's how it should be done. Conntrack is an exceptional case in that
it is the only part in the kernel I know of that generates asynchronous
netlink messages at high speed, so I think reserving multiple groups
is the right way.

> So, back to ip_conntrack_netlink, we still have the performance issue
> with regards to the events. Userspace programs will decide if the event
> received is interesting for them or not.
> 
> a) We could extend netlink code with some kind of ioctl that sets a mask
> to say what kind of events are interesting for a given userspace program.

Please no ioctls. Groups are fine, if one group per event is too much we
need to group them somehow. Maybe:

1) creation/destruction
2) status changes
3) timeout
4) protoinfo
5) helper
6) nat
7) expectations

> b) The fact that we can send events and nobody's listening to them is
> more difficult to catch up. Maybe a function that checks if there are
> listeners available before sending an event could help: In that case
> such function would be kind of racy because we can lose some messages
> while a userspace program is binding to a given netlink socket. This
> isn't that annoying, I prefer losing a couple of messages than hurting
> performance generating messages that won't be ever delivered.

That race shouldn't be a problem. Userspace can't know when exactly it
will start receiving events anyway.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30  0:53               ` Patrick McHardy
@ 2005-06-30  9:47                 ` Pablo Neira
  2005-06-30 21:30                   ` Patrick McHardy
  0 siblings, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-30  9:47 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Pablo Neira wrote:
>>To avoid this spamming situation, every subsystem could reserve just one
>>group and user programs must get subscribed to the corresponding group
>>that is interested in. In conclusion, nfnetlink would support up to 32
>>subsystems but user space programs won't get spam from other subsystem.
> 
> 
> That's how it should be done. Conntrack is an exceptional case in that
> it is the only part in the kernel I know of that generates asynchronous
> netlink messages at high speed, so I think reserving multiple groups
> is the right way.

OK, I'm fine with the solution based on groups. Then I'll go that way.

>>So, back to ip_conntrack_netlink, we still have the performance issue
>>with regards to the events. Userspace programs will decide if the event
>>received is interesting for them or not.
>>
>>a) We could extend netlink code with some kind of ioctl that sets a mask
>>to say what kind of events are interesting for a given userspace program.
> 
> 
> Please no ioctls. Groups are fine, if one group per event is too much we
> need to group them somehow. Maybe:
> 
> 1) creation/destruction
> 2) status changes
> 3) timeout
> 4) protoinfo
> 5) helper
> 6) nat
> 7) expectations
> 
> 
>>b) The fact that we can send events and nobody's listening to them is
>>more difficult to catch up. Maybe a function that checks if there are
>>listeners available before sending an event could help: In that case
>>such function would be kind of racy because we can lose some messages
>>while a userspace program is binding to a given netlink socket. This
>>isn't that annoying, I prefer losing a couple of messages than hurting
>>performance generating messages that won't be ever delivered.
> 
> 
> That race shouldn't be a problem. Userspace can't know when exactly it
> will start receiving events anyway.

So, should be a netlink function to check if there are currently 
listeners worth it?

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30  9:47                 ` Pablo Neira
@ 2005-06-30 21:30                   ` Patrick McHardy
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick McHardy @ 2005-06-30 21:30 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> So, should be a netlink function to check if there are currently
> listeners worth it?

That sounds fine. Messages should then only be built if there is
a listener for this group or if it is a reply to a unicast request.
BTW, I noticed nfnetlink_send is always called with pid = 0 and
echo = 0. This needs to be fixed, whenever anything happens in response
to a request the correct pid and echo = 1 need to be used.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-29 19:52         ` Patrick McHardy
  2005-06-29 20:16           ` Harald Welte
@ 2005-06-30  0:34           ` Pablo Neira
  2005-06-30  1:00             ` Patrick McHardy
  1 sibling, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-30  0:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Pablo Neira wrote:
> 
>>I think that we could split the structure into fine grain fields. For
>>example, CTA_TUPLE_ORIG would composed of:
>>
>>CTA_ORIG_IPV4_SRC
>>CTA_ORIG_IPV4_DST
>>CTA_L4_PROTONUM
>>CTA_PROTO_IPV4_SRC
>>CTA_PROTO_IPV4_DST
>>CTA_DIR
>>
>>So, instead of sending a packet that contains a reference to an
>>ip_conntrack_tuple (CTA_TUPLE_ORIG), we'll have a set of fields
>>(CTA_ORIG_IPV4_SRC + CTA_ORIG_IPV4_DST + ...) that compose such structure.
>>
>>But I'll need a function to glue all the fields to create a
>>ip_conntrack_tuple. Maybe too bloated?
> 
> 
> I think its fine, but CTA_PROTO_IPV4_{SRC,DST} should be
> CTA_PROTO_{SRC,DST}, there's nothing related to IPv4 in them.

That could be fine for nf_conntrack to differenciate between conntracks 
that represents an ipv4 and ipv6 connections.

CTA_PROTO_IPV4_SRC and CTA_PROTO_IPV6_SRC

> Please nest all attributes related to a tuple in CTA_TUPLE
> instead of adding them to the top-level.

Sorry, how that will look like?

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30  0:34           ` Pablo Neira
@ 2005-06-30  1:00             ` Patrick McHardy
  2005-06-30  1:49               ` Thomas Graf
  2005-06-30 17:06               ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira
  0 siblings, 2 replies; 50+ messages in thread
From: Patrick McHardy @ 2005-06-30  1:00 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist

Pablo Neira wrote:
> Patrick McHardy wrote:
>>
>> I think its fine, but CTA_PROTO_IPV4_{SRC,DST} should be
>> CTA_PROTO_{SRC,DST}, there's nothing related to IPv4 in them.
> 
> 
> That could be fine for nf_conntrack to differenciate between conntracks
> that represents an ipv4 and ipv6 connections.
> 
> CTA_PROTO_IPV4_SRC and CTA_PROTO_IPV6_SRC

It can do that by looking at the IP attributes (CTA_ORIG_IPV4_SRC and
CTA_IPV6_SRC).

>> Please nest all attributes related to a tuple in CTA_TUPLE
>> instead of adding them to the top-level.
> 
> Sorry, how that will look like?

You add a new nfattr header with the type set to CTA_TUPLE and
note the position of skb->tail. Then you add the nested attributes
as usual. When you're done you set the length of the nfattr header
to skb->tail - old_tail. The RTA_NEST/RTA_NEST_END macros handle
this for rtnetlink (include/linux/rtnetlink.h).

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30  1:00             ` Patrick McHardy
@ 2005-06-30  1:49               ` Thomas Graf
  2005-06-30  1:53                 ` Patrick McHardy
  2005-06-30 17:06               ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Graf @ 2005-06-30  1:49 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira

* Patrick McHardy <42C34445.9020709@trash.net> 2005-06-30 03:00
> You add a new nfattr header with the type set to CTA_TUPLE and
> note the position of skb->tail. Then you add the nested attributes
> as usual. When you're done you set the length of the nfattr header
> to skb->tail - old_tail. The RTA_NEST/RTA_NEST_END macros handle
> this for rtnetlink (include/linux/rtnetlink.h).

At some point we should introduce a "generic" attribute
architecture for all netlink families. The connector
stuff will need it as well, so maybe the time has come
to actually do it.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30  1:49               ` Thomas Graf
@ 2005-06-30  1:53                 ` Patrick McHardy
  2005-06-30 12:03                   ` Thomas Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Patrick McHardy @ 2005-06-30  1:53 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira

Thomas Graf wrote:
> * Patrick McHardy <42C34445.9020709@trash.net> 2005-06-30 03:00
> 
>>You add a new nfattr header with the type set to CTA_TUPLE and
>>note the position of skb->tail. Then you add the nested attributes
>>as usual. When you're done you set the length of the nfattr header
>>to skb->tail - old_tail. The RTA_NEST/RTA_NEST_END macros handle
>>this for rtnetlink (include/linux/rtnetlink.h).
> 
> 
> At some point we should introduce a "generic" attribute
> architecture for all netlink families. The connector
> stuff will need it as well, so maybe the time has come
> to actually do it.

I agree, most of the macros are just copied without modification
anyway. One more thing I would like to change is the excessive
use of RTA_PUT with structures on the stack. Using __RTA_PUT
and putting together these structures in-place would be much
nicer. nfnetlink has copied this part of rtnetlink.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30  1:53                 ` Patrick McHardy
@ 2005-06-30 12:03                   ` Thomas Graf
  2005-06-30 13:27                     ` Patrick McHardy
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Graf @ 2005-06-30 12:03 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Harald Welte, netdev, Jamal Hadi Salim,
	Netfilter Development Mailinglist, Pablo Neira

* Patrick McHardy <42C350A1.1030602@trash.net> 2005-06-30 03:53
> Thomas Graf wrote:
> > At some point we should introduce a "generic" attribute
> > architecture for all netlink families. The connector
> > stuff will need it as well, so maybe the time has come
> > to actually do it.
> 
> I agree, most of the macros are just copied without modification
> anyway. One more thing I would like to change is the excessive
> use of RTA_PUT with structures on the stack. Using __RTA_PUT
> and putting together these structures in-place would be much
> nicer. nfnetlink has copied this part of rtnetlink.

Good point, after all, I think structs have been used too often
and we now suffer from backwards compatbility issues. The
points likely to be argueable are things like trim responsibility
in the error handling, i.e. delegate it down to the function
which also created the header or trim on every level. Other than
that the whole thing should be pretty straight forward.

What do you think about naming it nlattr?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30 12:03                   ` Thomas Graf
@ 2005-06-30 13:27                     ` Patrick McHardy
  2005-06-30 18:02                       ` Thomas Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Patrick McHardy @ 2005-06-30 13:27 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Harald Welte, netdev, Jamal Hadi Salim,
	Netfilter Development Mailinglist, Pablo Neira

Thomas Graf wrote:
> Good point, after all, I think structs have been used too often
> and we now suffer from backwards compatbility issues. The
> points likely to be argueable are things like trim responsibility
> in the error handling, i.e. delegate it down to the function
> which also created the header or trim on every level. Other than
> that the whole thing should be pretty straight forward.

I think for nested attributes error handling should happen on the
outer level. Just trimming on the inner level would leave a
half-finished nested attribute.

> What do you think about naming it nlattr?

Sounds good.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30 13:27                     ` Patrick McHardy
@ 2005-06-30 18:02                       ` Thomas Graf
  2005-06-30 21:26                         ` Patrick McHardy
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Graf @ 2005-06-30 18:02 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Harald Welte, netdev, Jamal Hadi Salim,
	Netfilter Development Mailinglist, Pablo Neira

* Patrick McHardy <42C3F35B.50805@trash.net> 2005-06-30 15:27
> Thomas Graf wrote:
> > Good point, after all, I think structs have been used too often
> > and we now suffer from backwards compatbility issues. The
> > points likely to be argueable are things like trim responsibility
> > in the error handling, i.e. delegate it down to the function
> > which also created the header or trim on every level. Other than
> > that the whole thing should be pretty straight forward.
> 
> I think for nested attributes error handling should happen on the
> outer level. Just trimming on the inner level would leave a
> half-finished nested attribute.

I can agree with this, the question that remains is: do we want
to trim in functions where no nesting is done at all? i.e. things
like the generic network statistics dumping interface.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30 18:02                       ` Thomas Graf
@ 2005-06-30 21:26                         ` Patrick McHardy
  2005-06-30 21:34                           ` Thomas Graf
  0 siblings, 1 reply; 50+ messages in thread
From: Patrick McHardy @ 2005-06-30 21:26 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Harald Welte, netdev, Jamal Hadi Salim,
	Netfilter Development Mailinglist, Pablo Neira

Thomas Graf wrote:
> * Patrick McHardy <42C3F35B.50805@trash.net> 2005-06-30 15:27
>
>>I think for nested attributes error handling should happen on the
>>outer level. Just trimming on the inner level would leave a
>>half-finished nested attribute.
> 
> I can agree with this, the question that remains is: do we want
> to trim in functions where no nesting is done at all? i.e. things
> like the generic network statistics dumping interface.

Trimming isn't required in this case since the length is know and
checked in advance. An error should still be propagated back of
course so potential outer levels can do trimming.

Regards
Patrick

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30 21:26                         ` Patrick McHardy
@ 2005-06-30 21:34                           ` Thomas Graf
  2005-06-30 21:49                             ` David S. Miller
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Graf @ 2005-06-30 21:34 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Harald Welte, netdev, Jamal Hadi Salim,
	Netfilter Development Mailinglist, Pablo Neira

* Patrick McHardy <42C46374.5000407@trash.net> 2005-06-30 23:26
> Thomas Graf wrote:
> > * Patrick McHardy <42C3F35B.50805@trash.net> 2005-06-30 15:27
> >
> >>I think for nested attributes error handling should happen on the
> >>outer level. Just trimming on the inner level would leave a
> >>half-finished nested attribute.
> > 
> > I can agree with this, the question that remains is: do we want
> > to trim in functions where no nesting is done at all? i.e. things
> > like the generic network statistics dumping interface.
> 
> Trimming isn't required in this case since the length is know and
> checked in advance. An error should still be propagated back of
> course so potential outer levels can do trimming.

This is only true if one attribute is added. What I really meant
is something like em_text_dump() in em_text.c. Assuming we rework
areas like this one in order to avoid structs on the stack, we'll
have a lot of such cases where we might hit the limit while not
really knowing about the start of the nested tlv.

Anyways, I think this is a minor issue and we can safely ignore it
and delegate the trimming responsibility to the underlying layer
which "started" the nested tlv.

The gnet_stats interface was a bad example since it doesn't have
such a case.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30 21:34                           ` Thomas Graf
@ 2005-06-30 21:49                             ` David S. Miller
  2005-06-30 22:08                               ` Thomas Graf
  0 siblings, 1 reply; 50+ messages in thread
From: David S. Miller @ 2005-06-30 21:49 UTC (permalink / raw)
  To: tgraf; +Cc: laforge, netdev, netfilter-devel, hadi, pablo, kaber

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 30 Jun 2005 23:34:59 +0200

> This is only true if one attribute is added. What I really meant
> is something like em_text_dump() in em_text.c.

Speaking of em_text.c, when one loads ematch rules into that
module it does a printk().  I assume that's a thinko and should
be removed?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30 21:49                             ` David S. Miller
@ 2005-06-30 22:08                               ` Thomas Graf
  2005-06-30 22:08                                 ` David S. Miller
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Graf @ 2005-06-30 22:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: laforge, netdev, netfilter-devel, hadi, pablo, kaber

* David S. Miller <20050630.144942.74731532.davem@davemloft.net> 2005-06-30 14:49
> From: Thomas Graf <tgraf@suug.ch>
> Date: Thu, 30 Jun 2005 23:34:59 +0200
> 
> > This is only true if one attribute is added. What I really meant
> > is something like em_text_dump() in em_text.c.
> 
> Speaking of em_text.c, when one loads ematch rules into that
> module it does a printk().  I assume that's a thinko and should
> be removed?

Yes, it's a debugging leftover but I'll keep it in until rc3 so if
problems appear I have some debugging information. Given that is acceptable.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-30 22:08                               ` Thomas Graf
@ 2005-06-30 22:08                                 ` David S. Miller
  0 siblings, 0 replies; 50+ messages in thread
From: David S. Miller @ 2005-06-30 22:08 UTC (permalink / raw)
  To: tgraf; +Cc: laforge, netdev, netfilter-devel, hadi, pablo, kaber

From: Thomas Graf <tgraf@suug.ch>
Date: Fri, 1 Jul 2005 00:08:14 +0200

> Yes, it's a debugging leftover but I'll keep it in until rc3 so if
> problems appear I have some debugging information. Given that is acceptable.

No problem.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-06-30  1:00             ` Patrick McHardy
  2005-06-30  1:49               ` Thomas Graf
@ 2005-06-30 17:06               ` Pablo Neira
  2005-07-11 16:30                 ` Amin Azez
  1 sibling, 1 reply; 50+ messages in thread
From: Pablo Neira @ 2005-06-30 17:06 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist

Hi,

Here below a candidate of attributes for ctnetlink. As you could see 
some attributes are still missing, for example, those to modify internal 
helper structures like ip_ct_ftp... but that would be easy to add later.

enum ctattr_type {
	CTA_UNSPEC,
	CTA_TUPLE,
	CTA_STATUS,
	CTA_PROTOINFO,
	CTA_HELP,
	CTA_NAT,
	CTA_TIMEOUT,
	CTA_MARK,
	CTA_COUNTERS,
	CTA_EXPECT,
	__CTA_MAX
};

enum ctattr_tuple {
	CTA_TUPLE_DIR,
	CTA_TUPLE_IP,
	CTA_TUPLE_PROTO,
	__CTA_TUPLE_MAX
};

enum ctattr_ip {
	CTA_IP_V4_SRC,
	CTA_IP_V4_DST,
	CTA_IP_V6_SRC,
	CTA_IP_V6_DST,
	__CTA_IP_MAX
};

enum ctattr_l4proto {
	CTA_PROTO_NUM,
	CTA_PROTO_TCP_SRC,
	CTA_PROTO_TCP_DST,
	CTA_PROTO_UDP_SRC,
	CTA_PROTO_UDP_DST,
	CTA_PROTO_ICMP_TYPE,
	CTA_PROTO_ICMP_CODE,
	CTA_PROTO_SCTP_CODE,
	__CTA_PROTO_MAX
};

enum ctattr_protoinfo {
	CTA_PROTOINFO_TCP,
	__CTA_PROTOINFO_MAX
};

enum ctattr_protoinfo_tcp {
	CTA_TCP_STATE,
	__CTA_TCP_MAX
};

enum ctattr_counters {
	CTA_COUNTERS_PACKETS,
	CTA_COUNTERS_BYTES,
	__CTA_COUNTERS_MAX
};

enum ctattr_nat {
	CTA_NAT_MINIP,
	CTA_NAT_MAXIP,
	CTA_NAT_PROTO,
	__CTA_NAT_MAX
};

enum ctattr_protonat {
	CTA_PROTONAT_MINIP,
	CTA_PROTONAT_MAXIP,
	CTA_PROTONAT_TCP,
	CTA_PROTONAT_UDP,
	__CTA_PROTONAT_MAX
};

enum ctattr_expect {
	CTA_EXPECT_TUPLE,
	CTA_EXPECT_MASK,
	CTA_EXPECT_TIMEOUT,
	__CTA_EXPECT_MAX
};

enum ctattr_help {
	CTA_HELP_NAME,
	__CTA_HELP_MAX
};

Comments welcome.

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-06-30 17:06               ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira
@ 2005-07-11 16:30                 ` Amin Azez
  2005-07-11 16:50                   ` Jan Engelhardt
  2005-07-11 17:10                   ` Harald Welte
  0 siblings, 2 replies; 50+ messages in thread
From: Amin Azez @ 2005-07-11 16:30 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Harald Welte

I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just
because of my fixation with MAC addresses but also to use conntrack to
record flow data for non IP-related protocols.

I realise it is not the current intent of conntrack to do this, but as
much as conntrack solves the high-load problems of ulog, it is desirable
for it to keep counters for non-ip protocols.

As fast as protocol handlers can be devised it will become "conntrack"
again instead of "flowtrack", but the universal identifier between hosts
involved in a connection is the mac address and protocol. The IP
addresses and direction and and even port numbers etc can only be
resolved by some higher level protocol hanlders.

The question is, how do others feel about conntrack tracking non-ip
connections? I intend to provide patches to do this if it is welcome.

Amin

Pablo Neira wrote:
> Hi,
> 
> Here below a candidate of attributes for ctnetlink. As you could see
> some attributes are still missing, for example, those to modify internal
> helper structures like ip_ct_ftp... but that would be easy to add later.
> 
> enum ctattr_type {
>     CTA_UNSPEC,
>     CTA_TUPLE,
>     CTA_STATUS,
>     CTA_PROTOINFO,
>     CTA_HELP,
>     CTA_NAT,
>     CTA_TIMEOUT,
>     CTA_MARK,
>     CTA_COUNTERS,
>     CTA_EXPECT,
>     __CTA_MAX
> };
> 
> enum ctattr_tuple {
>     CTA_TUPLE_DIR,
>     CTA_TUPLE_IP,
>     CTA_TUPLE_PROTO,
>     __CTA_TUPLE_MAX
> };
> 
> enum ctattr_ip {
>     CTA_IP_V4_SRC,
>     CTA_IP_V4_DST,
>     CTA_IP_V6_SRC,
>     CTA_IP_V6_DST,
>     __CTA_IP_MAX
> };
> 
> enum ctattr_l4proto {
>     CTA_PROTO_NUM,
>     CTA_PROTO_TCP_SRC,
>     CTA_PROTO_TCP_DST,
>     CTA_PROTO_UDP_SRC,
>     CTA_PROTO_UDP_DST,
>     CTA_PROTO_ICMP_TYPE,
>     CTA_PROTO_ICMP_CODE,
>     CTA_PROTO_SCTP_CODE,
>     __CTA_PROTO_MAX
> };
> 
> enum ctattr_protoinfo {
>     CTA_PROTOINFO_TCP,
>     __CTA_PROTOINFO_MAX
> };
> 
> enum ctattr_protoinfo_tcp {
>     CTA_TCP_STATE,
>     __CTA_TCP_MAX
> };
> 
> enum ctattr_counters {
>     CTA_COUNTERS_PACKETS,
>     CTA_COUNTERS_BYTES,
>     __CTA_COUNTERS_MAX
> };
> 
> enum ctattr_nat {
>     CTA_NAT_MINIP,
>     CTA_NAT_MAXIP,
>     CTA_NAT_PROTO,
>     __CTA_NAT_MAX
> };
> 
> enum ctattr_protonat {
>     CTA_PROTONAT_MINIP,
>     CTA_PROTONAT_MAXIP,
>     CTA_PROTONAT_TCP,
>     CTA_PROTONAT_UDP,
>     __CTA_PROTONAT_MAX
> };
> 
> enum ctattr_expect {
>     CTA_EXPECT_TUPLE,
>     CTA_EXPECT_MASK,
>     CTA_EXPECT_TIMEOUT,
>     __CTA_EXPECT_MAX
> };
> 
> enum ctattr_help {
>     CTA_HELP_NAME,
>     __CTA_HELP_MAX
> };
> 
> Comments welcome.
> 
> -- 
> Pablo
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 16:30                 ` Amin Azez
@ 2005-07-11 16:50                   ` Jan Engelhardt
  2005-07-11 17:11                     ` Harald Welte
  2005-07-11 17:10                   ` Harald Welte
  1 sibling, 1 reply; 50+ messages in thread
From: Jan Engelhardt @ 2005-07-11 16:50 UTC (permalink / raw)
  To: Amin Azez; +Cc: Harald Welte, netfilter-devel

>I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just
>because of my fixation with MAC addresses but also to use conntrack to
>record flow data for non IP-related protocols.

Can't this be done with ebtables?

On another note, how about bringing ebt and ipt closer together, maybe merge 
them?


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 16:50                   ` Jan Engelhardt
@ 2005-07-11 17:11                     ` Harald Welte
  2005-07-11 17:40                       ` Jan Engelhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Harald Welte @ 2005-07-11 17:11 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, Amin Azez

[-- Attachment #1: Type: text/plain, Size: 933 bytes --]

On Mon, Jul 11, 2005 at 06:50:20PM +0200, Jan Engelhardt wrote:
> >I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just
> >because of my fixation with MAC addresses but also to use conntrack to
> >record flow data for non IP-related protocols.
> 
> Can't this be done with ebtables?
> 
> On another note, how about bringing ebt and ipt closer together, maybe merge 
> them?

yes, the good old pkttables idea.  Unfortunately the ever growing number
of gpl infringing companies keeps me from tackling this rater large
project :(

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 17:11                     ` Harald Welte
@ 2005-07-11 17:40                       ` Jan Engelhardt
  2005-07-12  7:54                         ` Harald Welte
  0 siblings, 1 reply; 50+ messages in thread
From: Jan Engelhardt @ 2005-07-11 17:40 UTC (permalink / raw)
  To: Harald Welte; +Cc: netfilter-devel, Amin Azez


>> On another note, how about bringing ebt and ipt closer together, maybe merge 
>> them?
>
>yes, the good old pkttables idea.  Unfortunately the ever growing number
>of gpl infringing companies keeps me from tackling this rater large
>project :(

Where is the (legal) problem?



Jan Engelhardt
-- 
| Alphagate Systems, http://alphagate.hopto.org/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 17:40                       ` Jan Engelhardt
@ 2005-07-12  7:54                         ` Harald Welte
  0 siblings, 0 replies; 50+ messages in thread
From: Harald Welte @ 2005-07-12  7:54 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel, Amin Azez

[-- Attachment #1: Type: text/plain, Size: 1311 bytes --]

On Mon, Jul 11, 2005 at 07:40:36PM +0200, Jan Engelhardt wrote:
> 
> >> On another note, how about bringing ebt and ipt closer together, maybe merge 
> >> them?
> >
> >yes, the good old pkttables idea.  Unfortunately the ever growing number
> >of gpl infringing companies keeps me from tackling this rater large
> >project :(
> 
> Where is the (legal) problem?

I'm not talking about legal problems. I'm just talking about the
ever-growing percentage of my daily working time that is consumed
with handling those cases.  My initial hope was that a number of
high-profile cases would cause enough publicity to reduce the amount of
GPL violations we're seeing.  At the moment I'd say rather is the
opposite way.  I hope we'll reach break-even at some point...

Please follow-up to private mail, if there is the need to.  This is
off-topic on netfilter-devel, and I merely mentioned it as an excuse for
my lack of time.

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 16:30                 ` Amin Azez
  2005-07-11 16:50                   ` Jan Engelhardt
@ 2005-07-11 17:10                   ` Harald Welte
  2005-07-11 17:45                     ` Jan Engelhardt
  2005-07-12  8:18                     ` Amin Azez
  1 sibling, 2 replies; 50+ messages in thread
From: Harald Welte @ 2005-07-11 17:10 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist, Pablo Neira

[-- Attachment #1: Type: text/plain, Size: 1873 bytes --]

On Mon, Jul 11, 2005 at 05:30:56PM +0100, Amin Azez wrote:
> I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just
> because of my fixation with MAC addresses but also to use conntrack to
> record flow data for non IP-related protocols.

ctnetlink MAC attributes are IMHO nonsense because there is no mac
address tracking in conntrack.
 
> I realise it is not the current intent of conntrack to do this, but as
> much as conntrack solves the high-load problems of ulog, it is desirable
> for it to keep counters for non-ip protocols.

Also, MAC address tracking doesn't really make sense since as an
intermediate router there is no way that the assumption "source and
destination mac never change" is ever valid.

> As fast as protocol handlers can be devised it will become "conntrack"
> again instead of "flowtrack", but the universal identifier between hosts
> involved in a connection is the mac address and protocol. 

no.  you always only see the next hop mac address.  let's say you have
an upsteram and a downstream router, then all your 'connections' would
be between the same two mac addresses.

> The question is, how do others feel about conntrack tracking non-ip
> connections? I intend to provide patches to do this if it is welcome.

nf_conntrack goes the way to layer-3-independent connection tracking.
However, conntrack will remain to happen at layer3 and 4 (with minor
exceptions to higher layers in the case of conntrack helpers).

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 17:10                   ` Harald Welte
@ 2005-07-11 17:45                     ` Jan Engelhardt
  2005-07-12  7:55                       ` Harald Welte
  2005-07-12  8:18                     ` Amin Azez
  1 sibling, 1 reply; 50+ messages in thread
From: Jan Engelhardt @ 2005-07-11 17:45 UTC (permalink / raw)
  To: Harald Welte; +Cc: Netfilter Development Mailinglist


>nf_conntrack goes the way to layer-3-independent connection tracking.
>However, conntrack will remain to happen at layer3 and 4 (with minor
>exceptions to higher layers in the case of conntrack helpers).

Minor? Hehe that's good :p Reminds me of l7-filter.sf.net
(It's not even a helper as in ip_conntrack_irc, but a match module)


Jan Engelhardt
-- 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 17:45                     ` Jan Engelhardt
@ 2005-07-12  7:55                       ` Harald Welte
  0 siblings, 0 replies; 50+ messages in thread
From: Harald Welte @ 2005-07-12  7:55 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 971 bytes --]

On Mon, Jul 11, 2005 at 07:45:17PM +0200, Jan Engelhardt wrote:
> 
> >nf_conntrack goes the way to layer-3-independent connection tracking.
> >However, conntrack will remain to happen at layer3 and 4 (with minor
> >exceptions to higher layers in the case of conntrack helpers).
> 
> Minor? Hehe that's good :p Reminds me of l7-filter.sf.net
> (It's not even a helper as in ip_conntrack_irc, but a match module)

exactly, it's an iptables match module and therefore not related to
nf_conntrack or ip_conntrack.  And there is a reason I don't like the
idea of pushing it into mainline...

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API]
  2005-07-11 17:10                   ` Harald Welte
  2005-07-11 17:45                     ` Jan Engelhardt
@ 2005-07-12  8:18                     ` Amin Azez
  1 sibling, 0 replies; 50+ messages in thread
From: Amin Azez @ 2005-07-12  8:18 UTC (permalink / raw)
  To: Harald Welte; +Cc: Netfilter Development Mailinglist, Pablo Neira

Harald Welte wrote:

>On Mon, Jul 11, 2005 at 05:30:56PM +0100, Amin Azez wrote:
>  
>
>>I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just
>>because of my fixation with MAC addresses but also to use conntrack to
>>record flow data for non IP-related protocols.
>>    
>>
>
>ctnetlink MAC attributes are IMHO nonsense because there is no mac
>address tracking in conntrack.
>  
>
Indeed; I was proposing that it be so, I'm currently maintaining
conntrack patches where mac addresses are part of the conntrack.

>>I realise it is not the current intent of conntrack to do this, but as
>>much as conntrack solves the high-load problems of ulog, it is desirable
>>for it to keep counters for non-ip protocols.
>>    
>>
>
>Also, MAC address tracking doesn't really make sense since as an
>intermediate router there is no way that the assumption "source and
>destination mac never change" is ever valid.
>  
>
This is also true, there are few circumstances that the mac address
could change within the life of a connection and maintain the
connection, and it is uncertain in the scenario I proposed how this
should be handled.

>>As fast as protocol handlers can be devised it will become "conntrack"
>>again instead of "flowtrack", but the universal identifier between hosts
>>involved in a connection is the mac address and protocol. 
>>    
>>
>
>no.  you always only see the next hop mac address.  let's say you have
>an upsteram and a downstream router, then all your 'connections' would
>be between the same two mac addresses.
>  
>
The use would be limited to switched or transparently bridged networks
which is where most non-ip trafffic will be kept. I don't think this
particularly limits the use; many protocols do not employ a global
address space anyway.

SPX protocol uses the MAC address in conjunction with a network number
for node addressing.

>>The question is, how do others feel about conntrack tracking non-ip
>>connections? I intend to provide patches to do this if it is welcome.
>>    
>>
>
>nf_conntrack goes the way to layer-3-independent connection tracking.
>However, conntrack will remain to happen at layer3 and 4 (with minor
>exceptions to higher layers in the case of conntrack helpers).
>  
>
So you are not against tracking non-IP traffic as such?
I agree that it is ideal to break open the layer 3 addressing for use in
the conntrack tuple, I was suggesting using mac address and protocol as
a fallback when this was not available.

Amin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira
                   ` (2 preceding siblings ...)
  2005-06-27 22:40 ` Patrick McHardy
@ 2005-06-28 23:44 ` Josh Samuelson
  2005-06-29 19:14   ` Pablo Neira
  2005-07-11 11:34 ` NETLINK_NETFILTER and NETLINK_FIB_LOOKUP Amin Azez
  2005-07-11 16:32 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Amin Azez
  5 siblings, 1 reply; 50+ messages in thread
From: Josh Samuelson @ 2005-06-28 23:44 UTC (permalink / raw)
  To: Pablo Neira; +Cc: Netfilter Development Mailinglist

On Mon, Jun 27, 2005 at 08:02:22PM +0200, Pablo Neira wrote:
> o conntrack event API

> - IPCT_DELIVERED bit. Loopback reports event are reported twice, this 
> bit is set once event are delivered. I just came up with a better idea, 
> reset nfcache once the events have been delivered, but I'll apply this 
> change in the next patchset.

Hi Pablo,

I noticed this with my flow module, any interface connecting to itself
would create two IPCT_NEW events.  I had made my own patch to your
linux-2.6.11.conntrack-event-api.patch to call
ip_conntrack_event_cache_init() after the events were delivered.
I was gonna send it your way but other things came up and there
were other problems that I was trying to figure out that were not
quite working the way I thought they should.  I investigated
further and think I've found the problem, but I'm not sure
how to move further on it, so now it's time to ask the list:

Can anyone explain why in ip_conntrack_proto_icmp.c
we delete the conntrack of an ICMP packet as soon as all replies
are seen instead of letting ip_conntrack_icmp_timeout take the
conntrack like other protocol types?  As far as I can see, this
breaks the conntrack event API because I get the following
sequence of notifications for an ICMP echo request to the host
(this is with the ip_conntrack_event_cache_init() patch after
notifications have been delivered so I don't get two IPCT_NEW events):

IPCT_NEW
IPCT_DESTROY
IPCT_STATUS

I think nfct->use is still held and the conntrack is never really
destroyed after death_by_timeout(), so later when the reply comes
the conntrack still exists in the hash and the conntrack API
fires a IPCT_STATUS event after the IPCT_DESTROY event was fired,
which I'm assuming is an invalid sequence of events. ;)

So, what would be the harm in the following patch to
ip_conntrack_proto_icmp.c?

diff -u a/ip_conntrack_proto_icmp.c b/ip_conntrack_proto_icmp.c
--- a/ip_conntrack_proto_icmp.c 2005-06-28 16:52:18.000000000 -0500
+++ b/ip_conntrack_proto_icmp.c 2005-06-28 16:52:46.000000000 -0500
@@ -92,15 +92,7 @@
                       const struct sk_buff *skb,
                       enum ip_conntrack_info ctinfo)
 {
-       /* Try to delete connection immediately after all replies:
-           won't actually vanish as we still have skb, and del_timer
-           means this will only run once even if count hits zero twice
-           (theoretically possible with SMP) */
-       if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) {
-               if (atomic_dec_and_test(&ct->proto.icmp.count)
-                   && del_timer(&ct->timeout))
-                       ct->timeout.function((unsigned long)ct);
-       } else {
+       if (CTINFO2DIR(ctinfo) != IP_CT_DIR_REPLY) {
                atomic_inc(&ct->proto.icmp.count);
                ip_ct_refresh_acct(ct, ctinfo, skb, ip_ct_icmp_timeout);
        }

Otherwise you'll need to move the IPCT_DESTROY event out of
death_by_timeout() and into destroy_conntrack().  I like the former
option because then ICMP can actually be utilized in the flow module.
The only reservations I have with it lie in the other ip_conntrack_proto_*
files that may have del_timer() followed by death_by_timeout(), this
may create other gotchas in the connection event API sequence, I'm
not sure.  Perhaps both changes could be made? :)

Cheers,
-Josh

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson
@ 2005-06-29 19:14   ` Pablo Neira
  0 siblings, 0 replies; 50+ messages in thread
From: Pablo Neira @ 2005-06-29 19:14 UTC (permalink / raw)
  To: Josh Samuelson; +Cc: Netfilter Development Mailinglist

Hi Josh,

Josh Samuelson wrote:
> On Mon, Jun 27, 2005 at 08:02:22PM +0200, Pablo Neira wrote:
> 
>>o conntrack event API
> 
> 
>>- IPCT_DELIVERED bit. Loopback reports event are reported twice, this 
>>bit is set once event are delivered. I just came up with a better idea, 
>>reset nfcache once the events have been delivered, but I'll apply this 
>>change in the next patchset.
> 
> I noticed this with my flow module, any interface connecting to itself
> would create two IPCT_NEW events.  I had made my own patch to your
> linux-2.6.11.conntrack-event-api.patch to call
> ip_conntrack_event_cache_init() after the events were delivered.

Indeed, this is what I planned to fix this problem ;). Just killed the 
IPCT_DELIVERED bit, it will be included in next version of the 
ctevent-api. I'll post it soon.

> I was gonna send it your way but other things came up and there
> were other problems that I was trying to figure out that were not
> quite working the way I thought they should.  I investigated
> further and think I've found the problem, but I'm not sure
> how to move further on it, so now it's time to ask the list:
> 
> Can anyone explain why in ip_conntrack_proto_icmp.c
> we delete the conntrack of an ICMP packet as soon as all replies
> are seen instead of letting ip_conntrack_icmp_timeout take the
> conntrack like other protocol types?  As far as I can see, this
> breaks the conntrack event API because I get the following
> sequence of notifications for an ICMP echo request to the host
> (this is with the ip_conntrack_event_cache_init() patch after
> notifications have been delivered so I don't get two IPCT_NEW events):
> 
> IPCT_NEW
> IPCT_DESTROY
> IPCT_STATUS
> 
> I think nfct->use is still held and the conntrack is never really
> destroyed after death_by_timeout(), so later when the reply comes
> the conntrack still exists in the hash and the conntrack API
> fires a IPCT_STATUS event after the IPCT_DESTROY event was fired,
> which I'm assuming is an invalid sequence of events. ;)

OK, I moved the IPCT_DESTROY event to destroy_conntrack instead of 
death_by_timeout, this must fix the problem of out of sequence events.

> So, what would be the harm in the following patch to
> ip_conntrack_proto_icmp.c?
> 
> diff -u a/ip_conntrack_proto_icmp.c b/ip_conntrack_proto_icmp.c
> --- a/ip_conntrack_proto_icmp.c 2005-06-28 16:52:18.000000000 -0500
> +++ b/ip_conntrack_proto_icmp.c 2005-06-28 16:52:46.000000000 -0500
> @@ -92,15 +92,7 @@
>                        const struct sk_buff *skb,
>                        enum ip_conntrack_info ctinfo)
>  {
> -       /* Try to delete connection immediately after all replies:
> -           won't actually vanish as we still have skb, and del_timer
> -           means this will only run once even if count hits zero twice
> -           (theoretically possible with SMP) */
> -       if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) {
> -               if (atomic_dec_and_test(&ct->proto.icmp.count)
> -                   && del_timer(&ct->timeout))
> -                       ct->timeout.function((unsigned long)ct);
> -       } else {
> +       if (CTINFO2DIR(ctinfo) != IP_CT_DIR_REPLY) {
>                 atomic_inc(&ct->proto.icmp.count);
>                 ip_ct_refresh_acct(ct, ctinfo, skb, ip_ct_icmp_timeout);
>         }

I don't want to change the current icmp tracking logic, the connection 
tracking table could get full sooner because of icmp entries that has 
already seen their reply wanting to timeout.

--
Pablo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* NETLINK_NETFILTER and NETLINK_FIB_LOOKUP
  2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira
                   ` (3 preceding siblings ...)
  2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson
@ 2005-07-11 11:34 ` Amin Azez
  2005-07-11 16:32 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Amin Azez
  5 siblings, 0 replies; 50+ messages in thread
From: Amin Azez @ 2005-07-11 11:34 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Harald Welte


2.6-git defines

#define NETLINK_FIB_LOOKUP      10

which conflicts with

#define NETLINK_NETFILTER     10      /* netfilter subsystem */



Are we going to have to change number again?

Should we push this definition upstream just to get a fixed number
reserved? I don't think netfilter is going away any time soon.

Azez

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API
  2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira
                   ` (4 preceding siblings ...)
  2005-07-11 11:34 ` NETLINK_NETFILTER and NETLINK_FIB_LOOKUP Amin Azez
@ 2005-07-11 16:32 ` Amin Azez
  5 siblings, 0 replies; 50+ messages in thread
From: Amin Azez @ 2005-07-11 16:32 UTC (permalink / raw)
  To: netfilter-devel

Pablo, I realise that there was a lot of dicussion on various parts of
your rework of conntrack.

Is the SVN patch at
http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch
still preferred, or has all that you intend been committed to SVN already?

Thanks

Amin

Pablo Neira wrote:
> Hi Harald,
> 
> This patchset introduces tons of updates for the nfnetlink, ctnetlink
> and the conntrack event API. I haven't attached the file since it's that
> big, about 100K.
> 
> You can get an incremental diff against SVN from:
> http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch
> 
> 
> Please apply.
> 
> 
> I've split this big patch above into four pieces to make it easier to
> understand the changes:
> http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/
> 
> So these four patches shouldn't be applied, just they are meant to make
> your life easier to track the changes.
> 
> Summary of changes
> ------------------
> 
> o conntrack event API
> - Don't kill NFC_IP_* stuff, keep it there to ensure for old iptables
> versions compilation.
> - new file ip_conntrack_events.h that contains all event related
> functions to reduce pollution in ip_conntrack.h
> - IPCT_DELIVERED bit. Loopback reports event are reported twice, this
> bit is set once event are delivered. I just came up with a better idea,
> reset nfcache once the events have been delivered, but I'll apply this
> change in the next patchset.
> 
> o nfnetlink
> - kill unused list.
> - kill nfnl_exlock(), not needed anymore.
> - kill duplicated check: NFNL_SUBSYS_ID(type) > NFNL_SUBSYS_COUNT.
> - kill unneeded initialization of subsys_table to NULL, since it's in
> BSS section (already set to zero).
> - kill dead define CONFIG_NF_NETLINK.
> 
> o ctnetlink
> - merge ctnetlink_get_mcgroups and ctnetlink_get_exp_mcgroups
> - implemented NAT handlings
> - kill unused ctnetlink_kill
> - use __u64 id's for conntracks
> - stop using NLMSG_DONE to report the end of a dump, use explicite ACK
> instead (NLM_F_ACK).
> - fixed broken expectation timeout dumping.
> - kill unused ctnetlink_exp_dump_proto
> - kill ctnetlink_exp_dump: fairly small and just used once
> - kill NFNL_SUBSYS_CTNETLINK_EXP, use NFNL_SUBSYS_CTNETLINK instead
> - Fix expectation table dumping
> - Fix expectation creation
> - implemented flushing of the expect table
> 
> TODO
> ----
> 
> - Implement ip_conntrack_stats dumping and reset (accounting)
> - Implement get conntrack and destroy (accounting)
> - Kill event/dump mask based (?). Although it's unique, I think that it
> could be useful for weak conntrack event notification (think of just
> new, established and destroy event notification to reduce performance
> impact).
> 
> Once ip_conntrack_netlink gets fully featured and people don't report
> bugs for quite some time. I'll create a nf_conntrack_netlink tree.
> 
> -- 
> Pablo
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2005-07-12  8:18 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira
2005-06-27 20:26 ` Harald Welte
2005-06-28  2:00   ` Pablo Neira
2005-06-28  2:12     ` Pablo Neira
2005-06-28  2:15       ` Pablo Neira
2005-06-28  3:53     ` Patrick McHardy
2005-06-28  7:07       ` Harald Welte
2005-07-04 12:59       ` Amin Azez
2005-06-28  7:06     ` Harald Welte
2005-06-27 21:31 ` Patrick McHardy
2005-06-28  2:15   ` Pablo Neira
2005-06-28  3:56     ` Patrick McHardy
2005-06-27 22:40 ` Patrick McHardy
2005-06-28  2:16   ` Pablo Neira
2005-06-28  4:03     ` Patrick McHardy
2005-06-28  7:13   ` Harald Welte
2005-06-28 16:02     ` Patrick McHardy
2005-06-29 19:13       ` Pablo Neira
2005-06-29 19:52         ` Patrick McHardy
2005-06-29 20:16           ` Harald Welte
2005-06-30  0:27             ` Pablo Neira
2005-06-30  0:53               ` Patrick McHardy
2005-06-30  9:47                 ` Pablo Neira
2005-06-30 21:30                   ` Patrick McHardy
2005-06-30  0:34           ` Pablo Neira
2005-06-30  1:00             ` Patrick McHardy
2005-06-30  1:49               ` Thomas Graf
2005-06-30  1:53                 ` Patrick McHardy
2005-06-30 12:03                   ` Thomas Graf
2005-06-30 13:27                     ` Patrick McHardy
2005-06-30 18:02                       ` Thomas Graf
2005-06-30 21:26                         ` Patrick McHardy
2005-06-30 21:34                           ` Thomas Graf
2005-06-30 21:49                             ` David S. Miller
2005-06-30 22:08                               ` Thomas Graf
2005-06-30 22:08                                 ` David S. Miller
2005-06-30 17:06               ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira
2005-07-11 16:30                 ` Amin Azez
2005-07-11 16:50                   ` Jan Engelhardt
2005-07-11 17:11                     ` Harald Welte
2005-07-11 17:40                       ` Jan Engelhardt
2005-07-12  7:54                         ` Harald Welte
2005-07-11 17:10                   ` Harald Welte
2005-07-11 17:45                     ` Jan Engelhardt
2005-07-12  7:55                       ` Harald Welte
2005-07-12  8:18                     ` Amin Azez
2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson
2005-06-29 19:14   ` Pablo Neira
2005-07-11 11:34 ` NETLINK_NETFILTER and NETLINK_FIB_LOOKUP Amin Azez
2005-07-11 16:32 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Amin Azez

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.