* [PATCH 1/2] updates for [nf|ct]netlink and event API
@ 2005-06-27 18:02 Pablo Neira
2005-06-27 20:26 ` Harald Welte
` (5 more replies)
0 siblings, 6 replies; 50+ messages in thread
From: Pablo Neira @ 2005-06-27 18:02 UTC (permalink / raw)
To: Netfilter Development Mailinglist; +Cc: Harald Welte
Hi Harald,
This patchset introduces tons of updates for the nfnetlink, ctnetlink
and the conntrack event API. I haven't attached the file since it's that
big, about 100K.
You can get an incremental diff against SVN from:
http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch
Please apply.
I've split this big patch above into four pieces to make it easier to
understand the changes:
http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/
So these four patches shouldn't be applied, just they are meant to make
your life easier to track the changes.
Summary of changes
------------------
o conntrack event API
- Don't kill NFC_IP_* stuff, keep it there to ensure for old iptables
versions compilation.
- new file ip_conntrack_events.h that contains all event related
functions to reduce pollution in ip_conntrack.h
- IPCT_DELIVERED bit. Loopback reports event are reported twice, this
bit is set once event are delivered. I just came up with a better idea,
reset nfcache once the events have been delivered, but I'll apply this
change in the next patchset.
o nfnetlink
- kill unused list.
- kill nfnl_exlock(), not needed anymore.
- kill duplicated check: NFNL_SUBSYS_ID(type) > NFNL_SUBSYS_COUNT.
- kill unneeded initialization of subsys_table to NULL, since it's in
BSS section (already set to zero).
- kill dead define CONFIG_NF_NETLINK.
o ctnetlink
- merge ctnetlink_get_mcgroups and ctnetlink_get_exp_mcgroups
- implemented NAT handlings
- kill unused ctnetlink_kill
- use __u64 id's for conntracks
- stop using NLMSG_DONE to report the end of a dump, use explicite ACK
instead (NLM_F_ACK).
- fixed broken expectation timeout dumping.
- kill unused ctnetlink_exp_dump_proto
- kill ctnetlink_exp_dump: fairly small and just used once
- kill NFNL_SUBSYS_CTNETLINK_EXP, use NFNL_SUBSYS_CTNETLINK instead
- Fix expectation table dumping
- Fix expectation creation
- implemented flushing of the expect table
TODO
----
- Implement ip_conntrack_stats dumping and reset (accounting)
- Implement get conntrack and destroy (accounting)
- Kill event/dump mask based (?). Although it's unique, I think that it
could be useful for weak conntrack event notification (think of just
new, established and destroy event notification to reduce performance
impact).
Once ip_conntrack_netlink gets fully featured and people don't report
bugs for quite some time. I'll create a nf_conntrack_netlink tree.
--
Pablo
^ permalink raw reply [flat|nested] 50+ messages in thread* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira @ 2005-06-27 20:26 ` Harald Welte 2005-06-28 2:00 ` Pablo Neira 2005-06-27 21:31 ` Patrick McHardy ` (4 subsequent siblings) 5 siblings, 1 reply; 50+ messages in thread From: Harald Welte @ 2005-06-27 20:26 UTC (permalink / raw) To: Pablo Neira; +Cc: Netfilter Development Mailinglist [-- Attachment #1: Type: text/plain, Size: 3514 bytes --] On Mon, Jun 27, 2005 at 08:02:22PM +0200, Pablo Neira wrote: > Hi Harald, > > This patchset introduces tons of updates for the nfnetlink, ctnetlink > and the conntrack event API. I haven't attached the file since it's that > big, about 100K. > You can get an incremental diff against SVN from: > http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch the patch removes the whole old ip_conntrack_netlink.c file and replaces it with a new one. I don't really see why. An, now, you changed the directory name from 2.6.11 to 2.6.12. Is this neccessarry? Why would your recent changes to ctnetlink not work with a 2.6.11 kernel? If there is no good reason, I would suggest to just patch the existing files in the 2.6.11 subdirectory. > I've split this big patch above into four pieces to make it easier to > understand the changes: > http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/ please remove stuff like > -#if 0 > +#if 1 > #define DEBUGP printk before releasing a patch, thanks. Also, I would rather not like to have NFNL_SUBSYS_CTNETLINK_EXP deleted. I thought it would be nice to have expect handling separated from conntracks, since they really are two seperate data structures. Apart from that I'm fine with all of your modifications. I don't understand why the order of the expectation list was changed, though. Would you please explain why this was done? > - Implement ip_conntrack_stats dumping and reset (accounting) you want to dump the statistics via netlink? i'm not sure whether that is required. lnstat is the only program using those counters, and using /proc seems fine for that. > - Implement get conntrack and destroy (accounting) sorry, what are you referring to? > - Kill event/dump mask based (?). Although it's unique, I think that it > could be useful for weak conntrack event notification (think of just > new, established and destroy event notification to reduce performance > impact). well, an interesting optimization. My major problem with this whole system (like the other IPCTNL_MSG_CONFIG stuff) is: how to correctly deal with multipel users. Now, I think there is a way to do it right. Basically every application would only request the events that it is interested in, and the kernel would bitwise-or those events to create the event mask that actually is to be dumped into userspace. It's a bit like multicast membership subscription with IGMP... So the only question remaining is: how does the kernel clean up / expire the masks? If apps only subscribe but never unsubscribe, we would never clean up. I'm not sure if we could somehow (cleanly) tie into the scoket close/destroy code. If yes, we could check if we can reduce the mask at socket destroy time. Another approach would be timer-based, so applications would have to re-subscribe to 'their' groups in regular intervals if they are still interested in certain events. Sounds racy and ugly, though :( I'm always open for better solutions, though :) -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 20:26 ` Harald Welte @ 2005-06-28 2:00 ` Pablo Neira 2005-06-28 2:12 ` Pablo Neira ` (2 more replies) 0 siblings, 3 replies; 50+ messages in thread From: Pablo Neira @ 2005-06-28 2:00 UTC (permalink / raw) To: Harald Welte; +Cc: Netfilter Development Mailinglist, Patrick McHardy [-- Attachment #1: Type: text/plain, Size: 4509 bytes --] Harald Welte wrote: > the patch removes the whole old ip_conntrack_netlink.c file and replaces > it with a new one. I don't really see why. An, now, you changed the > directory name from 2.6.11 to 2.6.12. Is this neccessarry? Why would > your recent changes to ctnetlink not work with a 2.6.11 kernel? I did that because of the file linux-2.6.patch, the 2.6.11 version differs from the new 2.6.12 one. Can we have in pom-ng two different file patches for every version? I mean, something like linux-2.6.11.patch and linux-2.6.12.patch living in ctnetlink. If so, please add the patch attached to ctnetlink. >>I've split this big patch above into four pieces to make it easier to >>understand the changes: >>http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/ > > please remove stuff like > >>-#if 0 >>+#if 1 >> #define DEBUGP printk > > before releasing a patch, thanks. OK, I'll do so. > Also, I would rather not like to have NFNL_SUBSYS_CTNETLINK_EXP deleted. > I thought it would be nice to have expect handling separated from > conntracks, since they really are two seperate data structures. > > Apart from that I'm fine with all of your modifications. > > I don't understand why the order of the expectation list was changed, > though. Would you please explain why this was done? Because of the table dumping logic I was using. - if (ct->id <= *id) + if (ct->id >= *id) continue; I needed to add the conntracks at the tail. This isn't really needed so I've just reverted this change in the linux-2.6.12.patch attached. Please see that you have to apply the patch 05ctnetlink.patch to ctnetlink as well. >>- Implement ip_conntrack_stats dumping and reset (accounting) > > you want to dump the statistics via netlink? i'm not sure whether that > is required. lnstat is the only program using those counters, and using > /proc seems fine for that. Could this be useful for accounting purposes? I mean, dump and reset as a single operation, something similar to conntrack -L -z, say conntrack -S -z. >>- Implement get conntrack and destroy (accounting) > > sorry, what are you referring to? An atomical operation to dump conntrack information and then destroy it. For example, if someone does: conntrack -D --orig-src 1.1.1.1 --orig-dst 2.2.2.2 -p tcp --orig-src-port 10 --orig-dst-port 20 the it sends the conntrack info and destroy it. This could be worth it for accounting. >>- Kill event/dump mask based (?). Although it's unique, I think that it >>could be useful for weak conntrack event notification (think of just >>new, established and destroy event notification to reduce performance >>impact). > > > well, an interesting optimization. My major problem with this whole > system (like the other IPCTNL_MSG_CONFIG stuff) is: how to correctly > deal with multipel users. Yes, this is a problem. > Now, I think there is a way to do it right. Basically every application > would only request the events that it is interested in, and the kernel > would bitwise-or those events to create the event mask that actually is > to be dumped into userspace. > > It's a bit like multicast membership subscription with IGMP... > > So the only question remaining is: how does the kernel clean up / expire > the masks? If apps only subscribe but never unsubscribe, we would never > clean up. > > I'm not sure if we could somehow (cleanly) tie into the scoket > close/destroy code. If yes, we could check if we can reduce the mask at > socket destroy time. I think there's no sane way to do this but let me check the netlink code again. I could use multicast netlink subscrition, so the client can get subscribed to those events that the user considers interesting. Initially the purpose of the mask based event notification was reducing the impact of an socket overrun under big stress situations. Say you've got a firewall very loaded, so conntrack-netlink subsys has to send tons of update messages, in that case the buffer can overrun and some messages will be drop. To avoid this I could use an approach similar to what you do in ULOG and just keep the mask in userspace. Anyway, this hurts performance as well, since unneeded messages will be delivered to userspace and dropped if the client doesn't consider them interesting. This solution isn't that bad though since the conntrack-netlink subsys will send burst of messages in once. -- Pablo [-- Attachment #2: linux-2.6.12.patch --] [-- Type: text/x-patch, Size: 26285 bytes --] Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_amanda.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c 2005-06-28 02:28:44.000000000 +0200 @@ -151,6 +151,7 @@ .mask = { .src = { .u = { 0xFFFF } }, .dst = { .protonum = 0xFF }, }, + .change_help = ip_ct_generic_change_help, }; static void __exit fini(void) Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c 2005-06-28 02:44:45.000000000 +0200 @@ -76,6 +76,8 @@ static LIST_HEAD(unconfirmed); static int ip_conntrack_vmalloc; +static unsigned int ip_conntrack_next_id = 1; +static unsigned int ip_conntrack_expect_next_id = 1; #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS struct notifier_block *ip_conntrack_chain; struct notifier_block *ip_conntrack_expect_chain; @@ -83,13 +85,6 @@ DEFINE_PER_CPU(struct ip_conntrack_stat, ip_conntrack_stat); -void -ip_conntrack_put(struct ip_conntrack *ct) -{ - IP_NF_ASSERT(ct); - nf_conntrack_put(&ct->ct_general); -} - static int ip_conntrack_hash_rnd_initted; static unsigned int ip_conntrack_hash_rnd; @@ -146,7 +141,7 @@ { ip_conntrack_put(exp->master); IP_NF_ASSERT(!timer_pending(&exp->timeout)); - kmem_cache_free(ip_conntrack_expect_cachep, exp); + ip_conntrack_expect_free(exp); CONNTRACK_STAT_INC(expect_delete); } @@ -158,6 +153,13 @@ exp->master->expecting--; } +void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp) +{ + MUST_BE_WRITE_LOCKED(&ip_conntrack_lock); + unlink_expect(exp); + destroy_expect(exp); +} + static void expectation_timed_out(unsigned long ul_expect) { struct ip_conntrack_expect *exp = (void *)ul_expect; @@ -169,6 +171,33 @@ destroy_expect(exp); } +struct ip_conntrack_expect * +__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple) +{ + struct ip_conntrack_expect *i; + + list_for_each_entry(i, &ip_conntrack_expect_list, list) { + if (ip_ct_tuple_mask_cmp(tuple, &i->tuple, &i->mask)) { + atomic_inc(&i->use); + return i; + } + } + return NULL; +} + +/* Just find a expectation corresponding to a tuple. */ +struct ip_conntrack_expect * +ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple) +{ + struct ip_conntrack_expect *i; + + READ_LOCK(&ip_conntrack_lock); + i = __ip_conntrack_expect_find(tuple); + READ_UNLOCK(&ip_conntrack_lock); + + return i; +} + /* If an expectation for this connection is found, it gets delete from * global list then returned. */ static struct ip_conntrack_expect * @@ -193,7 +222,7 @@ } /* delete all expectations for this conntrack */ -static void remove_expectations(struct ip_conntrack *ct) +void ip_ct_remove_expectations(struct ip_conntrack *ct) { struct ip_conntrack_expect *i, *tmp; @@ -223,7 +252,7 @@ LIST_DELETE(&ip_conntrack_hash[hr], &ct->tuplehash[IP_CT_DIR_REPLY]); /* Destroy all pending expectations */ - remove_expectations(ct); + ip_ct_remove_expectations(ct); } static void @@ -253,7 +282,7 @@ * except TFTP can create an expectation on the first packet, * before connection is in the list, so we need to clean here, * too. */ - remove_expectations(ct); + ip_ct_remove_expectations(ct); /* We overload first tuple to link into unconfirmed list. */ if (!is_confirmed(ct)) { @@ -268,8 +297,7 @@ ip_conntrack_put(ct->master); DEBUGP("destroy_conntrack: returning ct=%p to slab\n", ct); - kmem_cache_free(ip_conntrack_cachep, ct); - atomic_dec(&ip_conntrack_count); + ip_conntrack_free(ct); } static void death_by_timeout(unsigned long ul_conntrack) @@ -296,7 +324,7 @@ && ip_ct_tuple_equal(tuple, &i->tuple); } -static struct ip_conntrack_tuple_hash * +struct ip_conntrack_tuple_hash * __ip_conntrack_find(const struct ip_conntrack_tuple *tuple, const struct ip_conntrack *ignored_conntrack) { @@ -331,6 +359,27 @@ return h; } +static void __ip_conntrack_hash_insert(struct ip_conntrack *ct) +{ + size_t hash, repl_hash; + + hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); + repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple); + + list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, + &ip_conntrack_hash[hash]); + list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list, + &ip_conntrack_hash[repl_hash]); + ct->id = ++ip_conntrack_next_id; +} + +void ip_conntrack_hash_insert(struct ip_conntrack *ct) +{ + WRITE_LOCK(&ip_conntrack_lock); + __ip_conntrack_hash_insert(ct); + WRITE_UNLOCK(&ip_conntrack_lock); +} + /* Confirm a connection given skb; places it in hash table */ int __ip_conntrack_confirm(struct sk_buff **pskb) @@ -377,10 +426,10 @@ /* Remove from unconfirmed list */ list_del(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list); - list_prepend(&ip_conntrack_hash[hash], - &ct->tuplehash[IP_CT_DIR_ORIGINAL]); - list_prepend(&ip_conntrack_hash[repl_hash], - &ct->tuplehash[IP_CT_DIR_REPLY]); + list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, + &ip_conntrack_hash[hash]); + list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list, + &ip_conntrack_hash[repl_hash]); /* Timer relative to confirmation time, not original setting time, otherwise we'd get timer wrap in weird delay cases. */ @@ -398,6 +447,7 @@ #endif ip_conntrack_event_cache(master_ct(ct) ? IPCT_RELATED : IPCT_NEW, *pskb); + ct->id = ++ip_conntrack_next_id; return NF_ACCEPT; } @@ -432,7 +482,7 @@ static int early_drop(struct list_head *chain) { - /* Traverse forwards: gives us oldest, which is roughly LRU */ + /* Traverse backwards: gives us oldest, which is roughly LRU */ struct ip_conntrack_tuple_hash *h; struct ip_conntrack *ct = NULL; int dropped = 0; @@ -463,7 +513,7 @@ return ip_ct_tuple_mask_cmp(rtuple, &i->tuple, &i->mask); } -static struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple) +struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple) { return LIST_FIND(&helpers, helper_cmp, struct ip_conntrack_helper *, @@ -472,22 +522,18 @@ /* Allocate a new conntrack: we return -ENOMEM if classification failed due to stress. Otherwise it really is unclassifiable. */ -static struct ip_conntrack_tuple_hash * -init_conntrack(const struct ip_conntrack_tuple *tuple, - struct ip_conntrack_protocol *protocol, - struct sk_buff *skb) +struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *orig, + struct ip_conntrack_tuple *repl) { struct ip_conntrack *conntrack; - struct ip_conntrack_tuple repl_tuple; size_t hash; - struct ip_conntrack_expect *exp; if (!ip_conntrack_hash_rnd_initted) { get_random_bytes(&ip_conntrack_hash_rnd, 4); ip_conntrack_hash_rnd_initted = 1; } - hash = hash_conntrack(tuple); + hash = hash_conntrack(orig); if (ip_conntrack_max && atomic_read(&ip_conntrack_count) >= ip_conntrack_max) { @@ -501,11 +547,6 @@ } } - if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) { - DEBUGP("Can't invert tuple.\n"); - return NULL; - } - conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC); if (!conntrack) { DEBUGP("Can't allocate conntrack.\n"); @@ -515,17 +556,51 @@ memset(conntrack, 0, sizeof(*conntrack)); atomic_set(&conntrack->ct_general.use, 1); conntrack->ct_general.destroy = destroy_conntrack; - conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *tuple; - conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = repl_tuple; - if (!protocol->new(conntrack, skb)) { - kmem_cache_free(ip_conntrack_cachep, conntrack); - return NULL; - } + conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig; + conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *repl; + /* Don't set timer yet: wait for confirmation */ init_timer(&conntrack->timeout); conntrack->timeout.data = (unsigned long)conntrack; conntrack->timeout.function = death_by_timeout; + atomic_inc(&ip_conntrack_count); + + return conntrack; +} + +void +ip_conntrack_free(struct ip_conntrack *conntrack) +{ + atomic_dec(&ip_conntrack_count); + kmem_cache_free(ip_conntrack_cachep, conntrack); +} + +static struct ip_conntrack_tuple_hash * +init_conntrack(struct ip_conntrack_tuple *tuple, + struct ip_conntrack_protocol *protocol, + struct sk_buff *skb) +{ + struct ip_conntrack *conntrack; + struct ip_conntrack_tuple repl_tuple; + struct ip_conntrack_expect *exp; + + if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) { + DEBUGP("Can't invert tuple.\n"); + return NULL; + } + + if (!(conntrack = ip_conntrack_alloc(tuple, &repl_tuple))) + return NULL; + + if (IS_ERR(conntrack)) + return (void *) conntrack; + + if (!protocol->new(conntrack, skb)) { + ip_conntrack_free(conntrack); + return NULL; + } + WRITE_LOCK(&ip_conntrack_lock); exp = find_expectation(tuple); @@ -549,7 +624,6 @@ /* Overload tuple linked list to put us in unconfirmed list. */ list_add(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL].list, &unconfirmed); - atomic_inc(&ip_conntrack_count); WRITE_UNLOCK(&ip_conntrack_lock); if (exp) { @@ -765,13 +839,15 @@ DEBUGP("expect_related: OOM allocating expect\n"); return NULL; } + atomic_set(&new->use, 0); new->master = NULL; return new; } void ip_conntrack_expect_free(struct ip_conntrack_expect *expect) { - kmem_cache_free(ip_conntrack_expect_cachep, expect); + if (atomic_dec_and_test(&expect->use)) + kmem_cache_free(ip_conntrack_expect_cachep, expect); } static void ip_conntrack_expect_insert(struct ip_conntrack_expect *exp) @@ -790,6 +866,9 @@ } else exp->timeout.function = NULL; + exp->id = ++ip_conntrack_expect_next_id; + + atomic_inc(&exp->use); CONNTRACK_STAT_INC(expect_create); } @@ -1018,6 +1097,32 @@ nf_conntrack_get(nskb->nfct); } +void ip_ct_generic_change_proto(struct ip_conntrack *ct, + union ip_conntrack_proto *p) +{ + struct ip_conntrack_protocol *proto; + struct ip_conntrack_tuple_hash *th = &ct->tuplehash[IP_CT_DIR_REPLY]; + + proto = ip_ct_find_proto(th->tuple.dst.protonum); + if (proto->lock != NULL) { + write_lock_bh(proto->lock); + memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto)); + write_unlock_bh(proto->lock); + } else + memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto)); +} + +void ip_ct_generic_change_help(struct ip_conntrack *ct, + union ip_conntrack_help *h) +{ + if (ct->helper->lock != NULL) { + spin_lock_bh(ct->helper->lock); + memcpy(&ct->help, h, sizeof(ct->help)); + spin_unlock_bh(ct->helper->lock); + } else + memcpy(&ct->help, h, sizeof(ct->help)); +} + static inline int do_iter(const struct ip_conntrack_tuple_hash *i, int (*iter)(struct ip_conntrack *i, void *data), @@ -1144,23 +1249,27 @@ * ip_conntrack_htable_size)); } -/* Mishearing the voices in his head, our hero wonders how he's - supposed to kill the mall. */ -void ip_conntrack_cleanup(void) +void ip_conntrack_flush() { - ip_ct_attach = NULL; /* This makes sure all current packets have passed through netfilter framework. Roll on, two-stage module delete... */ synchronize_net(); - + i_see_dead_people: ip_ct_iterate_cleanup(kill_all, NULL); if (atomic_read(&ip_conntrack_count) != 0) { schedule(); goto i_see_dead_people; } +} +/* Mishearing the voices in his head, our hero wonders how he's + supposed to kill the mall. */ +void ip_conntrack_cleanup(void) +{ + ip_ct_attach = NULL; + ip_conntrack_flush(); kmem_cache_destroy(ip_conntrack_cachep); kmem_cache_destroy(ip_conntrack_expect_cachep); free_conntrack_hash(); @@ -1258,6 +1367,8 @@ /* - and look it like as a confirmed connection */ set_bit(IPS_CONFIRMED_BIT, &ip_conntrack_untracked.status); + printk("untracked: %p\n", &ip_conntrack_untracked); + return ret; err_free_conntrack_slab: Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_ftp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c 2005-06-28 02:28:44.000000000 +0200 @@ -481,6 +481,8 @@ ftp[i].timeout = 5 * 60; /* 5 minutes */ ftp[i].me = THIS_MODULE; ftp[i].help = help; + ftp[i].lock = &ip_ftp_lock; + ftp[i].change_help = ip_ct_generic_change_help; tmpname = &ftp_names[i][0]; if (ports[i] == FTP_PORT) Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_irc.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c 2005-06-28 02:28:44.000000000 +0200 @@ -275,6 +275,8 @@ hlpr->timeout = dcc_timeout; hlpr->me = THIS_MODULE; hlpr->help = help; + hlpr->lock = &irc_buffer_lock; + hlpr->change_help = ip_ct_generic_change_help; tmpname = &irc_names[i][0]; if (ports[i] == IRC_PORT) Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-06-28 02:28:44.000000000 +0200 @@ -109,16 +109,17 @@ return NF_ACCEPT; } +static u_int8_t valid_new[] = { + [ICMP_ECHO] = 1, + [ICMP_TIMESTAMP] = 1, + [ICMP_INFO_REQUEST] = 1, + [ICMP_ADDRESS] = 1 +}; + /* Called when a new connection for this protocol found. */ static int icmp_new(struct ip_conntrack *conntrack, const struct sk_buff *skb) { - static u_int8_t valid_new[] - = { [ICMP_ECHO] = 1, - [ICMP_TIMESTAMP] = 1, - [ICMP_INFO_REQUEST] = 1, - [ICMP_ADDRESS] = 1 }; - if (conntrack->tuplehash[0].tuple.dst.u.icmp.type >= sizeof(valid_new) || !valid_new[conntrack->tuplehash[0].tuple.dst.u.icmp.type]) { /* Can't create a new ICMP `conn' with this. */ @@ -266,6 +267,17 @@ return icmp_error_message(skb, ctinfo, hooknum); } +static int icmp_change_check_tuples(struct ip_conntrack_tuple *orig, + struct ip_conntrack_tuple *reply) +{ + unsigned int type = orig->dst.u.icmp.type; + + if (type >= sizeof(valid_new) || !valid_new[type]) + return -EINVAL; + + return 0; +} + struct ip_conntrack_protocol ip_conntrack_protocol_icmp = { .proto = IPPROTO_ICMP, @@ -277,4 +289,6 @@ .packet = icmp_packet, .new = icmp_new, .error = icmp_error, + .change_check_tuples = icmp_change_check_tuples, + .change_proto = ip_ct_generic_change_proto, }; Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-06-28 02:28:44.000000000 +0200 @@ -499,6 +499,7 @@ static struct ip_conntrack_protocol ip_conntrack_protocol_sctp = { .proto = IPPROTO_SCTP, .name = "sctp", + .lock = &sctp_lock, .pkt_to_tuple = sctp_pkt_to_tuple, .invert_tuple = sctp_invert_tuple, .print_tuple = sctp_print_tuple, @@ -506,7 +507,8 @@ .packet = sctp_packet, .new = sctp_new, .destroy = NULL, - .me = THIS_MODULE + .me = THIS_MODULE, + .change_proto = ip_ct_generic_change_proto, }; #ifdef CONFIG_SYSCTL Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-06-28 02:28:44.000000000 +0200 @@ -1094,6 +1094,7 @@ { .proto = IPPROTO_TCP, .name = "tcp", + .lock = &tcp_lock, .pkt_to_tuple = tcp_pkt_to_tuple, .invert_tuple = tcp_invert_tuple, .print_tuple = tcp_print_tuple, @@ -1101,4 +1102,5 @@ .packet = tcp_packet, .new = tcp_new, .error = tcp_error, + .change_proto = ip_ct_generic_change_proto }; Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-06-28 02:28:44.000000000 +0200 @@ -144,4 +144,5 @@ .packet = udp_packet, .new = udp_new, .error = udp_error, + .change_proto = ip_ct_generic_change_proto }; Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-06-28 02:28:44.000000000 +0200 @@ -963,6 +963,19 @@ { } +EXPORT_SYMBOL(ip_ct_unlink_destroy_expect); +EXPORT_SYMBOL(ip_conntrack_flush); +EXPORT_SYMBOL(__ip_conntrack_expect_find); +EXPORT_SYMBOL(__ip_conntrack_find); +EXPORT_SYMBOL(ip_conntrack_expect_list); +EXPORT_SYMBOL(ip_ct_generic_change_proto); +EXPORT_SYMBOL(ip_ct_generic_change_help); +EXPORT_SYMBOL(ip_conntrack_expect_find_get); +EXPORT_SYMBOL(ip_conntrack_alloc); +EXPORT_SYMBOL(ip_conntrack_free); +EXPORT_SYMBOL(ip_conntrack_hash_insert); +EXPORT_SYMBOL(ip_ct_remove_expectations); +EXPORT_SYMBOL(ip_ct_find_helper); #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS EXPORT_SYMBOL(ip_conntrack_chain); EXPORT_SYMBOL(ip_conntrack_expect_chain); @@ -993,7 +1006,6 @@ EXPORT_SYMBOL(ip_conntrack_hash); EXPORT_SYMBOL(ip_conntrack_untracked); EXPORT_SYMBOL_GPL(ip_conntrack_find_get); -EXPORT_SYMBOL_GPL(ip_conntrack_put); #ifdef CONFIG_IP_NF_NAT_NEEDED EXPORT_SYMBOL(ip_conntrack_tcp_update); #endif Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_tftp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c 2005-06-28 02:28:44.000000000 +0200 @@ -134,6 +134,7 @@ tftp[i].timeout = 5 * 60; /* 5 minutes */ tftp[i].me = THIS_MODULE; tftp[i].help = tftp_help; + tftp[i].change_help = ip_ct_generic_change_help; tmpname = &tftp_names[i][0]; if (ports[i] == TFTP_PORT) Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_amanda.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h 2005-06-28 02:28:44.000000000 +0200 @@ -3,9 +3,11 @@ /* AMANDA tracking. */ struct ip_conntrack_expect; +#ifdef __KERNEL__ extern unsigned int (*ip_nat_amanda_hook)(struct sk_buff **pskb, enum ip_conntrack_info ctinfo, unsigned int matchoff, unsigned int matchlen, struct ip_conntrack_expect *exp); +#endif #endif /* _IP_CONNTRACK_AMANDA_H */ Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_ftp.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h 2005-06-28 02:28:44.000000000 +0200 @@ -31,6 +31,7 @@ struct ip_conntrack_expect; +#ifdef __KERNEL__ /* For NAT to hook in when we find a packet which describes what other * connection we should expect. */ extern unsigned int (*ip_nat_ftp_hook)(struct sk_buff **pskb, @@ -40,4 +41,5 @@ unsigned int matchlen, struct ip_conntrack_expect *exp, u32 *seq); +#endif #endif /* _IP_CONNTRACK_FTP_H */ Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h 2005-06-28 02:28:44.000000000 +0200 @@ -71,13 +71,13 @@ IPS_DESTROYED = (1 << IPS_DESTROYED_BIT), }; -#ifdef __KERNEL__ -#include <linux/config.h> -#include <linux/netfilter_ipv4/ip_conntrack_tuple.h> -#include <linux/bitops.h> -#include <linux/compiler.h> -#include <asm/atomic.h> +struct ip_conntrack_counter +{ + u_int64_t packets; + u_int64_t bytes; +}; +#include <linux/netfilter_ipv4/ip_conntrack_tuple.h> #include <linux/netfilter_ipv4/ip_conntrack_tcp.h> #include <linux/netfilter_ipv4/ip_conntrack_icmp.h> #include <linux/netfilter_ipv4/ip_conntrack_sctp.h> @@ -106,6 +106,7 @@ struct ip_ct_irc_master ct_irc_info; }; +#ifdef __KERNEL__ #ifdef CONFIG_IP_NF_NAT_NEEDED #include <linux/netfilter_ipv4/ip_nat.h> #endif @@ -126,12 +127,6 @@ #define IP_NF_ASSERT(x) #endif -struct ip_conntrack_counter -{ - u_int64_t packets; - u_int64_t bytes; -}; - struct ip_conntrack_helper; struct ip_conntrack @@ -140,6 +135,9 @@ plus 1 for any connection(s) we are `master' for */ struct nf_conntrack ct_general; + /* Unique ID that identifies this conntrack*/ + u_int64_t id; + /* Have we seen traffic both ways yet? (bitset) */ unsigned long status; @@ -188,6 +186,9 @@ /* Internal linked list (global expectation list) */ struct list_head list; + /* Expectation ID */ + __u64 id; + /* We expect this tuple, with the following mask */ struct ip_conntrack_tuple tuple, mask; @@ -201,6 +202,8 @@ /* Timer function; deletes the expectation. */ struct timer_list timeout; + atomic_t use; + #ifdef CONFIG_IP_NF_NAT_NEEDED /* This is the original per-proto part, used to map the * expected connection the way the recipient expects. */ @@ -240,7 +243,12 @@ } /* decrement reference count on a conntrack */ -extern inline void ip_conntrack_put(struct ip_conntrack *ct); +static inline void +ip_conntrack_put(struct ip_conntrack *ct) +{ + IP_NF_ASSERT(ct); + nf_conntrack_put(&ct->ct_general); +} /* call to create an explicit dependency on ip_conntrack. */ extern void need_ip_conntrack(void); @@ -275,6 +283,34 @@ ip_ct_iterate_cleanup(int (*iter)(struct ip_conntrack *i, void *data), void *data); +extern struct ip_conntrack_helper * +ip_ct_find_helper(const struct ip_conntrack_tuple *tuple); + +extern void ip_ct_remove_expectations(struct ip_conntrack *ct); + +extern struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *, + struct ip_conntrack_tuple *); + +extern void ip_conntrack_free(struct ip_conntrack *ct); + +extern void ip_conntrack_hash_insert(struct ip_conntrack *ct); + +extern struct ip_conntrack_expect * +__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple); + +extern struct ip_conntrack_expect * +ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple); + +extern struct ip_conntrack_tuple_hash * +__ip_conntrack_find(const struct ip_conntrack_tuple *tuple, + const struct ip_conntrack *ignored_conntrack); + +extern inline void ip_conntrack_expect_put(struct ip_conntrack_expect *exp); + +extern void ip_conntrack_flush(void); + +extern void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp); + /* It's confirmed if it is, or has been in the hash table. */ static inline int is_confirmed(struct ip_conntrack *ct) { Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_helper.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h 2005-06-28 02:28:44.000000000 +0200 @@ -9,6 +9,8 @@ { struct list_head list; /* Internal use. */ + spinlock_t *lock; /* protect private info and buffer */ + const char *name; /* name of the module */ struct module *me; /* pointer to self */ unsigned int max_expected; /* Maximum number of concurrent @@ -24,6 +26,8 @@ int (*help)(struct sk_buff **pskb, struct ip_conntrack *ct, enum ip_conntrack_info conntrackinfo); + + void (*change_help)(struct ip_conntrack *, union ip_conntrack_help *); }; extern int ip_conntrack_helper_register(struct ip_conntrack_helper *); @@ -38,4 +42,7 @@ extern int ip_conntrack_expect_related(struct ip_conntrack_expect *exp); extern void ip_conntrack_unexpect_related(struct ip_conntrack_expect *exp); +extern void ip_ct_generic_change_help(struct ip_conntrack *ct, + union ip_conntrack_help *h); + #endif /*_IP_CONNTRACK_HELPER_H*/ Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-06-28 02:28:44.000000000 +0200 @@ -10,6 +10,8 @@ /* Protocol number. */ u_int8_t proto; + rwlock_t *lock; + /* Protocol name */ const char *name; @@ -47,6 +49,17 @@ int (*error)(struct sk_buff *skb, enum ip_conntrack_info *ctinfo, unsigned int hooknum); + /* check if tuples are valid for a new connection */ + int (*change_check_tuples)(struct ip_conntrack_tuple *orig, + struct ip_conntrack_tuple *reply); + + /* check protocol data is valid */ + int (*change_check_proto)(union ip_conntrack_proto *p); + + /* change protocol info on behalf of ctnetlink */ + void (*change_proto)(struct ip_conntrack *ct, + union ip_conntrack_proto *p); + /* Module (if any) which this is connected to. */ struct module *me; }; @@ -57,6 +70,8 @@ /* Protocol registration. */ extern int ip_conntrack_protocol_register(struct ip_conntrack_protocol *proto); extern void ip_conntrack_protocol_unregister(struct ip_conntrack_protocol *proto); +extern void ip_ct_generic_change_proto(struct ip_conntrack *conntrack, + union ip_conntrack_proto *p); static inline struct ip_conntrack_protocol *ip_ct_find_proto(u_int8_t protocol) { [-- Attachment #3: 05ctnetlink.patch --] [-- Type: text/x-patch, Size: 1171 bytes --] Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_netlink.c 2005-06-28 02:40:30.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c 2005-06-28 02:41:10.000000000 +0200 @@ -455,7 +455,7 @@ if (DIRECTION(h) != IP_CT_DIR_ORIGINAL) continue; ct = tuplehash_to_ctrack(h); - if (ct->id <= *id) + if (ct->id >= *id) continue; if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, @@ -491,7 +491,7 @@ if (DIRECTION(h) != IP_CT_DIR_ORIGINAL) continue; ct = tuplehash_to_ctrack(h); - if (ct->id <= *id) + if (ct->id >= *id) continue; if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, @@ -1060,7 +1060,7 @@ READ_LOCK(&ip_conntrack_lock); list_for_each(i, &ip_conntrack_expect_list) { exp = (struct ip_conntrack_expect *) i; - if (exp->id <= *id) + if (exp->id >= *id) continue; if (ctnetlink_exp_fill_info(skb, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 2:00 ` Pablo Neira @ 2005-06-28 2:12 ` Pablo Neira 2005-06-28 2:15 ` Pablo Neira 2005-06-28 3:53 ` Patrick McHardy 2005-06-28 7:06 ` Harald Welte 2 siblings, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-28 2:12 UTC (permalink / raw) To: Pablo Neira Cc: Harald Welte, Netfilter Development Mailinglist, Patrick McHardy [-- Attachment #1: Type: text/plain, Size: 758 bytes --] Pablo Neira wrote: > Harald Welte wrote: >> I don't understand why the order of the expectation list was changed, >> though. Would you please explain why this was done? > > > Because of the table dumping logic I was using. > > - if (ct->id <= *id) > + if (ct->id >= *id) > continue; Please ignore the patches I've sent you. I need to insert conntracks and expectations at the tail because of the nature of netlink dumping. *id (that points to cb->args[1]) will be initialized to zero, so I need to insert conntracks ordered from lesser to bigger id's, otherwise I'd need to set *id to ~0UL and that's not possible. Please, add the following patch to ctnetlink. -- Pablo [-- Attachment #2: linux-2.6.12.patch --] [-- Type: text/x-patch, Size: 26285 bytes --] Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_amanda.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_amanda.c 2005-06-28 02:28:44.000000000 +0200 @@ -151,6 +151,7 @@ .mask = { .src = { .u = { 0xFFFF } }, .dst = { .protonum = 0xFF }, }, + .change_help = ip_ct_generic_change_help, }; static void __exit fini(void) Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_core.c 2005-06-28 02:44:45.000000000 +0200 @@ -76,6 +76,8 @@ static LIST_HEAD(unconfirmed); static int ip_conntrack_vmalloc; +static unsigned int ip_conntrack_next_id = 1; +static unsigned int ip_conntrack_expect_next_id = 1; #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS struct notifier_block *ip_conntrack_chain; struct notifier_block *ip_conntrack_expect_chain; @@ -83,13 +85,6 @@ DEFINE_PER_CPU(struct ip_conntrack_stat, ip_conntrack_stat); -void -ip_conntrack_put(struct ip_conntrack *ct) -{ - IP_NF_ASSERT(ct); - nf_conntrack_put(&ct->ct_general); -} - static int ip_conntrack_hash_rnd_initted; static unsigned int ip_conntrack_hash_rnd; @@ -146,7 +141,7 @@ { ip_conntrack_put(exp->master); IP_NF_ASSERT(!timer_pending(&exp->timeout)); - kmem_cache_free(ip_conntrack_expect_cachep, exp); + ip_conntrack_expect_free(exp); CONNTRACK_STAT_INC(expect_delete); } @@ -158,6 +153,13 @@ exp->master->expecting--; } +void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp) +{ + MUST_BE_WRITE_LOCKED(&ip_conntrack_lock); + unlink_expect(exp); + destroy_expect(exp); +} + static void expectation_timed_out(unsigned long ul_expect) { struct ip_conntrack_expect *exp = (void *)ul_expect; @@ -169,6 +171,33 @@ destroy_expect(exp); } +struct ip_conntrack_expect * +__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple) +{ + struct ip_conntrack_expect *i; + + list_for_each_entry(i, &ip_conntrack_expect_list, list) { + if (ip_ct_tuple_mask_cmp(tuple, &i->tuple, &i->mask)) { + atomic_inc(&i->use); + return i; + } + } + return NULL; +} + +/* Just find a expectation corresponding to a tuple. */ +struct ip_conntrack_expect * +ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple) +{ + struct ip_conntrack_expect *i; + + READ_LOCK(&ip_conntrack_lock); + i = __ip_conntrack_expect_find(tuple); + READ_UNLOCK(&ip_conntrack_lock); + + return i; +} + /* If an expectation for this connection is found, it gets delete from * global list then returned. */ static struct ip_conntrack_expect * @@ -193,7 +222,7 @@ } /* delete all expectations for this conntrack */ -static void remove_expectations(struct ip_conntrack *ct) +void ip_ct_remove_expectations(struct ip_conntrack *ct) { struct ip_conntrack_expect *i, *tmp; @@ -223,7 +252,7 @@ LIST_DELETE(&ip_conntrack_hash[hr], &ct->tuplehash[IP_CT_DIR_REPLY]); /* Destroy all pending expectations */ - remove_expectations(ct); + ip_ct_remove_expectations(ct); } static void @@ -253,7 +282,7 @@ * except TFTP can create an expectation on the first packet, * before connection is in the list, so we need to clean here, * too. */ - remove_expectations(ct); + ip_ct_remove_expectations(ct); /* We overload first tuple to link into unconfirmed list. */ if (!is_confirmed(ct)) { @@ -268,8 +297,7 @@ ip_conntrack_put(ct->master); DEBUGP("destroy_conntrack: returning ct=%p to slab\n", ct); - kmem_cache_free(ip_conntrack_cachep, ct); - atomic_dec(&ip_conntrack_count); + ip_conntrack_free(ct); } static void death_by_timeout(unsigned long ul_conntrack) @@ -296,7 +324,7 @@ && ip_ct_tuple_equal(tuple, &i->tuple); } -static struct ip_conntrack_tuple_hash * +struct ip_conntrack_tuple_hash * __ip_conntrack_find(const struct ip_conntrack_tuple *tuple, const struct ip_conntrack *ignored_conntrack) { @@ -331,6 +359,27 @@ return h; } +static void __ip_conntrack_hash_insert(struct ip_conntrack *ct) +{ + size_t hash, repl_hash; + + hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple); + repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple); + + list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, + &ip_conntrack_hash[hash]); + list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list, + &ip_conntrack_hash[repl_hash]); + ct->id = ++ip_conntrack_next_id; +} + +void ip_conntrack_hash_insert(struct ip_conntrack *ct) +{ + WRITE_LOCK(&ip_conntrack_lock); + __ip_conntrack_hash_insert(ct); + WRITE_UNLOCK(&ip_conntrack_lock); +} + /* Confirm a connection given skb; places it in hash table */ int __ip_conntrack_confirm(struct sk_buff **pskb) @@ -377,10 +426,10 @@ /* Remove from unconfirmed list */ list_del(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list); - list_prepend(&ip_conntrack_hash[hash], - &ct->tuplehash[IP_CT_DIR_ORIGINAL]); - list_prepend(&ip_conntrack_hash[repl_hash], - &ct->tuplehash[IP_CT_DIR_REPLY]); + list_add(&ct->tuplehash[IP_CT_DIR_ORIGINAL].list, + &ip_conntrack_hash[hash]); + list_add(&ct->tuplehash[IP_CT_DIR_REPLY].list, + &ip_conntrack_hash[repl_hash]); /* Timer relative to confirmation time, not original setting time, otherwise we'd get timer wrap in weird delay cases. */ @@ -398,6 +447,7 @@ #endif ip_conntrack_event_cache(master_ct(ct) ? IPCT_RELATED : IPCT_NEW, *pskb); + ct->id = ++ip_conntrack_next_id; return NF_ACCEPT; } @@ -432,7 +482,7 @@ static int early_drop(struct list_head *chain) { - /* Traverse forwards: gives us oldest, which is roughly LRU */ + /* Traverse backwards: gives us oldest, which is roughly LRU */ struct ip_conntrack_tuple_hash *h; struct ip_conntrack *ct = NULL; int dropped = 0; @@ -463,7 +513,7 @@ return ip_ct_tuple_mask_cmp(rtuple, &i->tuple, &i->mask); } -static struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple) +struct ip_conntrack_helper *ip_ct_find_helper(const struct ip_conntrack_tuple *tuple) { return LIST_FIND(&helpers, helper_cmp, struct ip_conntrack_helper *, @@ -472,22 +522,18 @@ /* Allocate a new conntrack: we return -ENOMEM if classification failed due to stress. Otherwise it really is unclassifiable. */ -static struct ip_conntrack_tuple_hash * -init_conntrack(const struct ip_conntrack_tuple *tuple, - struct ip_conntrack_protocol *protocol, - struct sk_buff *skb) +struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *orig, + struct ip_conntrack_tuple *repl) { struct ip_conntrack *conntrack; - struct ip_conntrack_tuple repl_tuple; size_t hash; - struct ip_conntrack_expect *exp; if (!ip_conntrack_hash_rnd_initted) { get_random_bytes(&ip_conntrack_hash_rnd, 4); ip_conntrack_hash_rnd_initted = 1; } - hash = hash_conntrack(tuple); + hash = hash_conntrack(orig); if (ip_conntrack_max && atomic_read(&ip_conntrack_count) >= ip_conntrack_max) { @@ -501,11 +547,6 @@ } } - if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) { - DEBUGP("Can't invert tuple.\n"); - return NULL; - } - conntrack = kmem_cache_alloc(ip_conntrack_cachep, GFP_ATOMIC); if (!conntrack) { DEBUGP("Can't allocate conntrack.\n"); @@ -515,17 +556,51 @@ memset(conntrack, 0, sizeof(*conntrack)); atomic_set(&conntrack->ct_general.use, 1); conntrack->ct_general.destroy = destroy_conntrack; - conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *tuple; - conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = repl_tuple; - if (!protocol->new(conntrack, skb)) { - kmem_cache_free(ip_conntrack_cachep, conntrack); - return NULL; - } + conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig; + conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *repl; + /* Don't set timer yet: wait for confirmation */ init_timer(&conntrack->timeout); conntrack->timeout.data = (unsigned long)conntrack; conntrack->timeout.function = death_by_timeout; + atomic_inc(&ip_conntrack_count); + + return conntrack; +} + +void +ip_conntrack_free(struct ip_conntrack *conntrack) +{ + atomic_dec(&ip_conntrack_count); + kmem_cache_free(ip_conntrack_cachep, conntrack); +} + +static struct ip_conntrack_tuple_hash * +init_conntrack(struct ip_conntrack_tuple *tuple, + struct ip_conntrack_protocol *protocol, + struct sk_buff *skb) +{ + struct ip_conntrack *conntrack; + struct ip_conntrack_tuple repl_tuple; + struct ip_conntrack_expect *exp; + + if (!ip_ct_invert_tuple(&repl_tuple, tuple, protocol)) { + DEBUGP("Can't invert tuple.\n"); + return NULL; + } + + if (!(conntrack = ip_conntrack_alloc(tuple, &repl_tuple))) + return NULL; + + if (IS_ERR(conntrack)) + return (void *) conntrack; + + if (!protocol->new(conntrack, skb)) { + ip_conntrack_free(conntrack); + return NULL; + } + WRITE_LOCK(&ip_conntrack_lock); exp = find_expectation(tuple); @@ -549,7 +624,6 @@ /* Overload tuple linked list to put us in unconfirmed list. */ list_add(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL].list, &unconfirmed); - atomic_inc(&ip_conntrack_count); WRITE_UNLOCK(&ip_conntrack_lock); if (exp) { @@ -765,13 +839,15 @@ DEBUGP("expect_related: OOM allocating expect\n"); return NULL; } + atomic_set(&new->use, 0); new->master = NULL; return new; } void ip_conntrack_expect_free(struct ip_conntrack_expect *expect) { - kmem_cache_free(ip_conntrack_expect_cachep, expect); + if (atomic_dec_and_test(&expect->use)) + kmem_cache_free(ip_conntrack_expect_cachep, expect); } static void ip_conntrack_expect_insert(struct ip_conntrack_expect *exp) @@ -790,6 +866,9 @@ } else exp->timeout.function = NULL; + exp->id = ++ip_conntrack_expect_next_id; + + atomic_inc(&exp->use); CONNTRACK_STAT_INC(expect_create); } @@ -1018,6 +1097,32 @@ nf_conntrack_get(nskb->nfct); } +void ip_ct_generic_change_proto(struct ip_conntrack *ct, + union ip_conntrack_proto *p) +{ + struct ip_conntrack_protocol *proto; + struct ip_conntrack_tuple_hash *th = &ct->tuplehash[IP_CT_DIR_REPLY]; + + proto = ip_ct_find_proto(th->tuple.dst.protonum); + if (proto->lock != NULL) { + write_lock_bh(proto->lock); + memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto)); + write_unlock_bh(proto->lock); + } else + memcpy(&ct->proto, p, sizeof(union ip_conntrack_proto)); +} + +void ip_ct_generic_change_help(struct ip_conntrack *ct, + union ip_conntrack_help *h) +{ + if (ct->helper->lock != NULL) { + spin_lock_bh(ct->helper->lock); + memcpy(&ct->help, h, sizeof(ct->help)); + spin_unlock_bh(ct->helper->lock); + } else + memcpy(&ct->help, h, sizeof(ct->help)); +} + static inline int do_iter(const struct ip_conntrack_tuple_hash *i, int (*iter)(struct ip_conntrack *i, void *data), @@ -1144,23 +1249,27 @@ * ip_conntrack_htable_size)); } -/* Mishearing the voices in his head, our hero wonders how he's - supposed to kill the mall. */ -void ip_conntrack_cleanup(void) +void ip_conntrack_flush() { - ip_ct_attach = NULL; /* This makes sure all current packets have passed through netfilter framework. Roll on, two-stage module delete... */ synchronize_net(); - + i_see_dead_people: ip_ct_iterate_cleanup(kill_all, NULL); if (atomic_read(&ip_conntrack_count) != 0) { schedule(); goto i_see_dead_people; } +} +/* Mishearing the voices in his head, our hero wonders how he's + supposed to kill the mall. */ +void ip_conntrack_cleanup(void) +{ + ip_ct_attach = NULL; + ip_conntrack_flush(); kmem_cache_destroy(ip_conntrack_cachep); kmem_cache_destroy(ip_conntrack_expect_cachep); free_conntrack_hash(); @@ -1258,6 +1367,8 @@ /* - and look it like as a confirmed connection */ set_bit(IPS_CONFIRMED_BIT, &ip_conntrack_untracked.status); + printk("untracked: %p\n", &ip_conntrack_untracked); + return ret; err_free_conntrack_slab: Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_ftp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_ftp.c 2005-06-28 02:28:44.000000000 +0200 @@ -481,6 +481,8 @@ ftp[i].timeout = 5 * 60; /* 5 minutes */ ftp[i].me = THIS_MODULE; ftp[i].help = help; + ftp[i].lock = &ip_ftp_lock; + ftp[i].change_help = ip_ct_generic_change_help; tmpname = &ftp_names[i][0]; if (ports[i] == FTP_PORT) Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_irc.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_irc.c 2005-06-28 02:28:44.000000000 +0200 @@ -275,6 +275,8 @@ hlpr->timeout = dcc_timeout; hlpr->me = THIS_MODULE; hlpr->help = help; + hlpr->lock = &irc_buffer_lock; + hlpr->change_help = ip_ct_generic_change_help; tmpname = &irc_names[i][0]; if (ports[i] == IRC_PORT) Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2005-06-28 02:28:44.000000000 +0200 @@ -109,16 +109,17 @@ return NF_ACCEPT; } +static u_int8_t valid_new[] = { + [ICMP_ECHO] = 1, + [ICMP_TIMESTAMP] = 1, + [ICMP_INFO_REQUEST] = 1, + [ICMP_ADDRESS] = 1 +}; + /* Called when a new connection for this protocol found. */ static int icmp_new(struct ip_conntrack *conntrack, const struct sk_buff *skb) { - static u_int8_t valid_new[] - = { [ICMP_ECHO] = 1, - [ICMP_TIMESTAMP] = 1, - [ICMP_INFO_REQUEST] = 1, - [ICMP_ADDRESS] = 1 }; - if (conntrack->tuplehash[0].tuple.dst.u.icmp.type >= sizeof(valid_new) || !valid_new[conntrack->tuplehash[0].tuple.dst.u.icmp.type]) { /* Can't create a new ICMP `conn' with this. */ @@ -266,6 +267,17 @@ return icmp_error_message(skb, ctinfo, hooknum); } +static int icmp_change_check_tuples(struct ip_conntrack_tuple *orig, + struct ip_conntrack_tuple *reply) +{ + unsigned int type = orig->dst.u.icmp.type; + + if (type >= sizeof(valid_new) || !valid_new[type]) + return -EINVAL; + + return 0; +} + struct ip_conntrack_protocol ip_conntrack_protocol_icmp = { .proto = IPPROTO_ICMP, @@ -277,4 +289,6 @@ .packet = icmp_packet, .new = icmp_new, .error = icmp_error, + .change_check_tuples = icmp_change_check_tuples, + .change_proto = ip_ct_generic_change_proto, }; Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_sctp.c 2005-06-28 02:28:44.000000000 +0200 @@ -499,6 +499,7 @@ static struct ip_conntrack_protocol ip_conntrack_protocol_sctp = { .proto = IPPROTO_SCTP, .name = "sctp", + .lock = &sctp_lock, .pkt_to_tuple = sctp_pkt_to_tuple, .invert_tuple = sctp_invert_tuple, .print_tuple = sctp_print_tuple, @@ -506,7 +507,8 @@ .packet = sctp_packet, .new = sctp_new, .destroy = NULL, - .me = THIS_MODULE + .me = THIS_MODULE, + .change_proto = ip_ct_generic_change_proto, }; #ifdef CONFIG_SYSCTL Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2005-06-28 02:28:44.000000000 +0200 @@ -1094,6 +1094,7 @@ { .proto = IPPROTO_TCP, .name = "tcp", + .lock = &tcp_lock, .pkt_to_tuple = tcp_pkt_to_tuple, .invert_tuple = tcp_invert_tuple, .print_tuple = tcp_print_tuple, @@ -1101,4 +1102,5 @@ .packet = tcp_packet, .new = tcp_new, .error = tcp_error, + .change_proto = ip_ct_generic_change_proto }; Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2005-06-28 02:28:44.000000000 +0200 @@ -144,4 +144,5 @@ .packet = udp_packet, .new = udp_new, .error = udp_error, + .change_proto = ip_ct_generic_change_proto }; Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c 2005-06-28 02:28:44.000000000 +0200 @@ -963,6 +963,19 @@ { } +EXPORT_SYMBOL(ip_ct_unlink_destroy_expect); +EXPORT_SYMBOL(ip_conntrack_flush); +EXPORT_SYMBOL(__ip_conntrack_expect_find); +EXPORT_SYMBOL(__ip_conntrack_find); +EXPORT_SYMBOL(ip_conntrack_expect_list); +EXPORT_SYMBOL(ip_ct_generic_change_proto); +EXPORT_SYMBOL(ip_ct_generic_change_help); +EXPORT_SYMBOL(ip_conntrack_expect_find_get); +EXPORT_SYMBOL(ip_conntrack_alloc); +EXPORT_SYMBOL(ip_conntrack_free); +EXPORT_SYMBOL(ip_conntrack_hash_insert); +EXPORT_SYMBOL(ip_ct_remove_expectations); +EXPORT_SYMBOL(ip_ct_find_helper); #ifdef CONFIG_IP_NF_CONNTRACK_EVENTS EXPORT_SYMBOL(ip_conntrack_chain); EXPORT_SYMBOL(ip_conntrack_expect_chain); @@ -993,7 +1006,6 @@ EXPORT_SYMBOL(ip_conntrack_hash); EXPORT_SYMBOL(ip_conntrack_untracked); EXPORT_SYMBOL_GPL(ip_conntrack_find_get); -EXPORT_SYMBOL_GPL(ip_conntrack_put); #ifdef CONFIG_IP_NF_NAT_NEEDED EXPORT_SYMBOL(ip_conntrack_tcp_update); #endif Index: davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c =================================================================== --- davem-2.6.orig/net/ipv4/netfilter/ip_conntrack_tftp.c 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/net/ipv4/netfilter/ip_conntrack_tftp.c 2005-06-28 02:28:44.000000000 +0200 @@ -134,6 +134,7 @@ tftp[i].timeout = 5 * 60; /* 5 minutes */ tftp[i].me = THIS_MODULE; tftp[i].help = tftp_help; + tftp[i].change_help = ip_ct_generic_change_help; tmpname = &tftp_names[i][0]; if (ports[i] == TFTP_PORT) Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_amanda.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_amanda.h 2005-06-28 02:28:44.000000000 +0200 @@ -3,9 +3,11 @@ /* AMANDA tracking. */ struct ip_conntrack_expect; +#ifdef __KERNEL__ extern unsigned int (*ip_nat_amanda_hook)(struct sk_buff **pskb, enum ip_conntrack_info ctinfo, unsigned int matchoff, unsigned int matchlen, struct ip_conntrack_expect *exp); +#endif #endif /* _IP_CONNTRACK_AMANDA_H */ Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_ftp.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_ftp.h 2005-06-28 02:28:44.000000000 +0200 @@ -31,6 +31,7 @@ struct ip_conntrack_expect; +#ifdef __KERNEL__ /* For NAT to hook in when we find a packet which describes what other * connection we should expect. */ extern unsigned int (*ip_nat_ftp_hook)(struct sk_buff **pskb, @@ -40,4 +41,5 @@ unsigned int matchlen, struct ip_conntrack_expect *exp, u32 *seq); +#endif #endif /* _IP_CONNTRACK_FTP_H */ Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack.h 2005-06-28 02:28:44.000000000 +0200 @@ -71,13 +71,13 @@ IPS_DESTROYED = (1 << IPS_DESTROYED_BIT), }; -#ifdef __KERNEL__ -#include <linux/config.h> -#include <linux/netfilter_ipv4/ip_conntrack_tuple.h> -#include <linux/bitops.h> -#include <linux/compiler.h> -#include <asm/atomic.h> +struct ip_conntrack_counter +{ + u_int64_t packets; + u_int64_t bytes; +}; +#include <linux/netfilter_ipv4/ip_conntrack_tuple.h> #include <linux/netfilter_ipv4/ip_conntrack_tcp.h> #include <linux/netfilter_ipv4/ip_conntrack_icmp.h> #include <linux/netfilter_ipv4/ip_conntrack_sctp.h> @@ -106,6 +106,7 @@ struct ip_ct_irc_master ct_irc_info; }; +#ifdef __KERNEL__ #ifdef CONFIG_IP_NF_NAT_NEEDED #include <linux/netfilter_ipv4/ip_nat.h> #endif @@ -126,12 +127,6 @@ #define IP_NF_ASSERT(x) #endif -struct ip_conntrack_counter -{ - u_int64_t packets; - u_int64_t bytes; -}; - struct ip_conntrack_helper; struct ip_conntrack @@ -140,6 +135,9 @@ plus 1 for any connection(s) we are `master' for */ struct nf_conntrack ct_general; + /* Unique ID that identifies this conntrack*/ + u_int64_t id; + /* Have we seen traffic both ways yet? (bitset) */ unsigned long status; @@ -188,6 +186,9 @@ /* Internal linked list (global expectation list) */ struct list_head list; + /* Expectation ID */ + __u64 id; + /* We expect this tuple, with the following mask */ struct ip_conntrack_tuple tuple, mask; @@ -201,6 +202,8 @@ /* Timer function; deletes the expectation. */ struct timer_list timeout; + atomic_t use; + #ifdef CONFIG_IP_NF_NAT_NEEDED /* This is the original per-proto part, used to map the * expected connection the way the recipient expects. */ @@ -240,7 +243,12 @@ } /* decrement reference count on a conntrack */ -extern inline void ip_conntrack_put(struct ip_conntrack *ct); +static inline void +ip_conntrack_put(struct ip_conntrack *ct) +{ + IP_NF_ASSERT(ct); + nf_conntrack_put(&ct->ct_general); +} /* call to create an explicit dependency on ip_conntrack. */ extern void need_ip_conntrack(void); @@ -275,6 +283,34 @@ ip_ct_iterate_cleanup(int (*iter)(struct ip_conntrack *i, void *data), void *data); +extern struct ip_conntrack_helper * +ip_ct_find_helper(const struct ip_conntrack_tuple *tuple); + +extern void ip_ct_remove_expectations(struct ip_conntrack *ct); + +extern struct ip_conntrack *ip_conntrack_alloc(struct ip_conntrack_tuple *, + struct ip_conntrack_tuple *); + +extern void ip_conntrack_free(struct ip_conntrack *ct); + +extern void ip_conntrack_hash_insert(struct ip_conntrack *ct); + +extern struct ip_conntrack_expect * +__ip_conntrack_expect_find(const struct ip_conntrack_tuple *tuple); + +extern struct ip_conntrack_expect * +ip_conntrack_expect_find_get(const struct ip_conntrack_tuple *tuple); + +extern struct ip_conntrack_tuple_hash * +__ip_conntrack_find(const struct ip_conntrack_tuple *tuple, + const struct ip_conntrack *ignored_conntrack); + +extern inline void ip_conntrack_expect_put(struct ip_conntrack_expect *exp); + +extern void ip_conntrack_flush(void); + +extern void ip_ct_unlink_destroy_expect(struct ip_conntrack_expect *exp); + /* It's confirmed if it is, or has been in the hash table. */ static inline int is_confirmed(struct ip_conntrack *ct) { Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_helper.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_helper.h 2005-06-28 02:28:44.000000000 +0200 @@ -9,6 +9,8 @@ { struct list_head list; /* Internal use. */ + spinlock_t *lock; /* protect private info and buffer */ + const char *name; /* name of the module */ struct module *me; /* pointer to self */ unsigned int max_expected; /* Maximum number of concurrent @@ -24,6 +26,8 @@ int (*help)(struct sk_buff **pskb, struct ip_conntrack *ct, enum ip_conntrack_info conntrackinfo); + + void (*change_help)(struct ip_conntrack *, union ip_conntrack_help *); }; extern int ip_conntrack_helper_register(struct ip_conntrack_helper *); @@ -38,4 +42,7 @@ extern int ip_conntrack_expect_related(struct ip_conntrack_expect *exp); extern void ip_conntrack_unexpect_related(struct ip_conntrack_expect *exp); +extern void ip_ct_generic_change_help(struct ip_conntrack *ct, + union ip_conntrack_help *h); + #endif /*_IP_CONNTRACK_HELPER_H*/ Index: davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h =================================================================== --- davem-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-06-28 02:28:38.000000000 +0200 +++ davem-2.6/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2005-06-28 02:28:44.000000000 +0200 @@ -10,6 +10,8 @@ /* Protocol number. */ u_int8_t proto; + rwlock_t *lock; + /* Protocol name */ const char *name; @@ -47,6 +49,17 @@ int (*error)(struct sk_buff *skb, enum ip_conntrack_info *ctinfo, unsigned int hooknum); + /* check if tuples are valid for a new connection */ + int (*change_check_tuples)(struct ip_conntrack_tuple *orig, + struct ip_conntrack_tuple *reply); + + /* check protocol data is valid */ + int (*change_check_proto)(union ip_conntrack_proto *p); + + /* change protocol info on behalf of ctnetlink */ + void (*change_proto)(struct ip_conntrack *ct, + union ip_conntrack_proto *p); + /* Module (if any) which this is connected to. */ struct module *me; }; @@ -57,6 +70,8 @@ /* Protocol registration. */ extern int ip_conntrack_protocol_register(struct ip_conntrack_protocol *proto); extern void ip_conntrack_protocol_unregister(struct ip_conntrack_protocol *proto); +extern void ip_ct_generic_change_proto(struct ip_conntrack *conntrack, + union ip_conntrack_proto *p); static inline struct ip_conntrack_protocol *ip_ct_find_proto(u_int8_t protocol) { ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 2:12 ` Pablo Neira @ 2005-06-28 2:15 ` Pablo Neira 0 siblings, 0 replies; 50+ messages in thread From: Pablo Neira @ 2005-06-28 2:15 UTC (permalink / raw) To: Pablo Neira Cc: Harald Welte, Netfilter Development Mailinglist, Patrick McHardy Pablo Neira wrote: > Pablo Neira wrote: > >> Harald Welte wrote: >> >>> I don't understand why the order of the expectation list was changed, >>> though. Would you please explain why this was done? >> >> >> >> Because of the table dumping logic I was using. >> >> - if (ct->id <= *id) >> + if (ct->id >= *id) >> continue; > > > Please ignore the patches I've sent you. Damn, I'm tired, wrong patch again :(. Ignore it, I'll send a new digest that applies to SVN tomorrow morning. -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 2:00 ` Pablo Neira 2005-06-28 2:12 ` Pablo Neira @ 2005-06-28 3:53 ` Patrick McHardy 2005-06-28 7:07 ` Harald Welte 2005-07-04 12:59 ` Amin Azez 2005-06-28 7:06 ` Harald Welte 2 siblings, 2 replies; 50+ messages in thread From: Patrick McHardy @ 2005-06-28 3:53 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > Harald Welte wrote: >>> - Implement ip_conntrack_stats dumping and reset (accounting) >> >> you want to dump the statistics via netlink? i'm not sure whether that >> is required. lnstat is the only program using those counters, and using >> /proc seems fine for that. > > Could this be useful for accounting purposes? I mean, dump and reset as > a single operation, something similar to conntrack -L -z, say conntrack > -S -z. I think it is convenient for anyone wanting to access these statistics without parsing /proc to have everything accessible through a single interface. >>> - Implement get conntrack and destroy (accounting) >> >> sorry, what are you referring to? > > An atomical operation to dump conntrack information and then destroy it. > For example, if someone does: > > conntrack -D --orig-src 1.1.1.1 --orig-dst 2.2.2.2 -p tcp > --orig-src-port 10 --orig-dst-port 20 > > the it sends the conntrack info and destroy it. This could be worth it > for accounting. destroy notifications are sent anyway, so we get the stats. What's the difference to normal destruction? > I could use multicast netlink subscrition, so the client can get > subscribed to those events that the user considers interesting. That sounds like the right thing to do. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 3:53 ` Patrick McHardy @ 2005-06-28 7:07 ` Harald Welte 2005-07-04 12:59 ` Amin Azez 1 sibling, 0 replies; 50+ messages in thread From: Harald Welte @ 2005-06-28 7:07 UTC (permalink / raw) To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Pablo Neira [-- Attachment #1: Type: text/plain, Size: 1558 bytes --] On Tue, Jun 28, 2005 at 05:53:22AM +0200, Patrick McHardy wrote: > Pablo Neira wrote: > > Harald Welte wrote: > >>> - Implement ip_conntrack_stats dumping and reset (accounting) > >> > >> you want to dump the statistics via netlink? i'm not sure whether that > >> is required. lnstat is the only program using those counters, and using > >> /proc seems fine for that. > > > > Could this be useful for accounting purposes? I mean, dump and reset as > > a single operation, something similar to conntrack -L -z, say conntrack > > -S -z. > > I think it is convenient for anyone wanting to access these statistics > without parsing /proc to have everything accessible through a single > interface. ... and it is already implemented in current svn ;) > destroy notifications are sent anyway, so we get the stats. What's the > difference to normal destruction? I agree. > > I could use multicast netlink subscrition, so the client can get > > subscribed to those events that the user considers interesting. > > That sounds like the right thing to do. well, but we still create a horrible amount of event messages, let's say in case of a syn flood... -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 3:53 ` Patrick McHardy 2005-06-28 7:07 ` Harald Welte @ 2005-07-04 12:59 ` Amin Azez 1 sibling, 0 replies; 50+ messages in thread From: Amin Azez @ 2005-07-04 12:59 UTC (permalink / raw) To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist Patrick McHardy wrote: > Pablo Neira wrote: > >>Harald Welte wrote: >> >>>>- Implement ip_conntrack_stats dumping and reset (accounting) >>> >>>you want to dump the statistics via netlink? i'm not sure whether that >>>is required. lnstat is the only program using those counters, and using >>>/proc seems fine for that. >> >>Could this be useful for accounting purposes? I mean, dump and reset as >>a single operation, something similar to conntrack -L -z, say conntrack >>-S -z. I use this, reading and zeroing conntracks at the same time. > I think it is convenient for anyone wanting to access these statistics > without parsing /proc to have everything accessible through a single > interface. Yes; it also preserves synchronicity. In other words it is nice to be sure if an event-update comes chronologically before or after the stats for the same conntrack were read. Over netlink this is easy, over netlink AND /proc is impossible. Azez ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 2:00 ` Pablo Neira 2005-06-28 2:12 ` Pablo Neira 2005-06-28 3:53 ` Patrick McHardy @ 2005-06-28 7:06 ` Harald Welte 2 siblings, 0 replies; 50+ messages in thread From: Harald Welte @ 2005-06-28 7:06 UTC (permalink / raw) To: Pablo Neira; +Cc: Netfilter Development Mailinglist, Patrick McHardy [-- Attachment #1: Type: text/plain, Size: 2310 bytes --] On Tue, Jun 28, 2005 at 04:00:06AM +0200, Pablo Neira wrote: > >I don't understand why the order of the expectation list was changed, > >though. Would you please explain why this was done? > > Because of the table dumping logic I was using. I see. > >>- Implement ip_conntrack_stats dumping and reset (accounting) > > > >you want to dump the statistics via netlink? i'm not sure whether that > >is required. lnstat is the only program using those counters, and using > >/proc seems fine for that. > > Could this be useful for accounting purposes? I mean, dump and reset as > a single operation, something similar to conntrack -L -z, say conntrack > -S -z. oh, you're referring to ip_conntrack_acct (the accounting), not ip_conntrack_stat (the statistics) ? /me is confused. Isn't that what the CTRZERO message type already does? > An atomical operation to dump conntrack information and then destroy it. > For example, if someone does: > > conntrack -D --orig-src 1.1.1.1 --orig-dst 2.2.2.2 -p tcp > --orig-src-port 10 --orig-dst-port 20 > > the it sends the conntrack info and destroy it. This could be worth it > for accounting. I think it could be useful, indeed. > >I'm not sure if we could somehow (cleanly) tie into the scoket > >close/destroy code. If yes, we could check if we can reduce the mask at > >socket destroy time. > > I think there's no sane way to do this but let me check the netlink code > again. I could use multicast netlink subscrition, so the client can get > subscribed to those events that the user considers interesting. Well, maybe we can extend the netlink code to achieve what we're interested in. Multicast netlink groups are already used for my clumsy tcp/udp/icmp/... distinction. I think it is not a good solution for ctnetlink, since the kernel then sends all events, independent of any process interested in them or not. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira 2005-06-27 20:26 ` Harald Welte @ 2005-06-27 21:31 ` Patrick McHardy 2005-06-28 2:15 ` Pablo Neira 2005-06-27 22:40 ` Patrick McHardy ` (3 subsequent siblings) 5 siblings, 1 reply; 50+ messages in thread From: Patrick McHardy @ 2005-06-27 21:31 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > Hi Harald, > > This patchset introduces tons of updates for the nfnetlink, ctnetlink > and the conntrack event API. I haven't attached the file since it's that > big, about 100K. > > You can get an incremental diff against SVN from: > http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch I haven't looked at the patch for a long time, and it removes and adds the whole file, so not sure if it is new .. + /* This is tricky but it works. ip_nat_setup_info needs the + * hook number as parameter, so let's do the correct + * conversion and run away */ + if (*status & IPS_SRC_NAT_DONE) + hooknum = NF_IP_POST_ROUTING; /* IP_NAT_MANIP_SRC */ + else if (*status & IPS_DST_NAT_DONE) + hooknum = NF_IP_PRE_ROUTING; /* IP_NAT_MANIP_DST */ + else + return -EINVAL; /* Missing NAT flags */ This doesn't work reliably, locally generated packets never enter PRE_ROUTING but can be DNATed. I think the hook should be supplied by the user. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 21:31 ` Patrick McHardy @ 2005-06-28 2:15 ` Pablo Neira 2005-06-28 3:56 ` Patrick McHardy 0 siblings, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-28 2:15 UTC (permalink / raw) To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist Patrick McHardy wrote: > Pablo Neira wrote: > > I haven't looked at the patch for a long time, and it removes > and adds the whole file, so not sure if it is new .. Yes, that's new. Ick, it seems I didn't express fine myself. Together with the big and confusing patch I uploaded a splitted version that makes easier the review as well: http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/ > + /* This is tricky but it works. ip_nat_setup_info needs the > + * hook number as parameter, so let's do the correct > + * conversion and run away */ > + if (*status & IPS_SRC_NAT_DONE) > + hooknum = NF_IP_POST_ROUTING; /* IP_NAT_MANIP_SRC */ > + else if (*status & IPS_DST_NAT_DONE) > + hooknum = NF_IP_PRE_ROUTING; /* IP_NAT_MANIP_DST */ > + else > + return -EINVAL; /* Missing NAT flags */ > > This doesn't work reliably, locally generated packets never enter > PRE_ROUTING but can be DNATed. I think the hook should be supplied > by the user. The macro HOOK2MANIP used is ip_nat_setup_info returns the same value (maniptype) for NF_IP_PRE_ROUTING and NF_IP_LOCAL_IN, so the same manipulation (DNAT) will be applied to such conntrack. Since we works with conntracks, I don't mind where the packets came from, just want to apply the NAT handling that the user has requested. -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 2:15 ` Pablo Neira @ 2005-06-28 3:56 ` Patrick McHardy 0 siblings, 0 replies; 50+ messages in thread From: Patrick McHardy @ 2005-06-28 3:56 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > Patrick McHardy wrote: >> This doesn't work reliably, locally generated packets never enter >> PRE_ROUTING but can be DNATed. I think the hook should be supplied >> by the user. > > The macro HOOK2MANIP used is ip_nat_setup_info returns the same value > (maniptype) for NF_IP_PRE_ROUTING and NF_IP_LOCAL_IN, so the same > manipulation (DNAT) will be applied to such conntrack. Since we works > with conntracks, I don't mind where the packets came from, just want to > apply the NAT handling that the user has requested. You're right, I'm still not familiar with the new NAT code .. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira 2005-06-27 20:26 ` Harald Welte 2005-06-27 21:31 ` Patrick McHardy @ 2005-06-27 22:40 ` Patrick McHardy 2005-06-28 2:16 ` Pablo Neira 2005-06-28 7:13 ` Harald Welte 2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson ` (2 subsequent siblings) 5 siblings, 2 replies; 50+ messages in thread From: Patrick McHardy @ 2005-06-27 22:40 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > This patchset introduces tons of updates for the nfnetlink, ctnetlink > and the conntrack event API. I haven't attached the file since it's that > big, about 100K. Looks good. A couple more comments on the patch, most are probably not related to recent changes. I remember beeing responsible for at least some of this :) +struct cta_proto { + unsigned char num_proto; /* Protocol number IPPROTO_X */ + union ip_conntrack_proto proto; +}; These should be changed not to use internal conntrack structures, it will hurt us when we want to change them. Instead it should replicate the fields interesting for the user. Also please use fixed-size types instead of unions etc. All structures including u64 types should be padded to multiples of 8 so they are equally sized on 32-bit and 64-bit systems. +#define CTA_HELP_MAXNAMESZ 31 + +struct cta_help { + char name[CTA_HELP_MAXNAMESZ]; /* name of conntrack helper */ + union ip_conntrack_help help; +}; I suggest to use 32, the length in the kernel is not fixed AFAICS and it will be padded to 32 in netlink messages anyway. +/* ctnetlink multicast groups: reports any change of ctinfo, + * ctstatus, or protocol state change. + */ +#define NFGRP_IPV4_CT_TCP 0x01 +#define NFGRP_IPV4_CT_UDP 0x02 +#define NFGRP_IPV4_CT_ICMP 0x04 +#define NFGRP_IPV4_CT_OTHER 0x08 I'm not sure how useful these groups are. I think groups for different event-types might be more useful to reduce the noise. + h->name[CTA_HELP_MAXNAMESZ - 1] = '\0'; + if (strcmp(helper->name, h->name)) + return -EINVAL; We changed ipt_helper to accept a null-byte string as wildcard string, probably a good idea to do the same here. + ct = ip_conntrack_alloc(otuple, rtuple); + if (ct == NULL || IS_ERR(ct)) + return -ENOMEM; ip_conntrack_alloc doesn't return ERR_PTR() + exp = ip_conntrack_expect_find_get(tuple); + if (!exp) + return -ENOENT; I couldn't find this function, but in 2.6.12 expectations aren't refcounted anymore. If they are again by this patch, the refcnt would be leaked in the following lines: + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb2) + return -ENOMEM; ++ ++ /* Events were delivered: loopback mustn't notify events twice */ ++ IPCT_DELIVERED_BIT = 11, ++ IPCT_DELIVERED = (1 << IPCT_DELIVERED_BIT), I couldn't find any users, probably since this can't work with a per-conntrack flag since events are generated by packets. ++ IPEXP_TIMEOUT_BIT = 1, ++ IPEXP_TIMEOUT = (1 << IPEXP_TIMEOUT_BIT), ++}; ++ No user here either. It would be great if you could send a patch without renaming the file, that makes reviewing the changes a lot easier. Renaming can then be done in a seperate patch that doesn't change anything. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 22:40 ` Patrick McHardy @ 2005-06-28 2:16 ` Pablo Neira 2005-06-28 4:03 ` Patrick McHardy 2005-06-28 7:13 ` Harald Welte 1 sibling, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-28 2:16 UTC (permalink / raw) To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist Patrick McHardy wrote: > Pablo Neira wrote: > >>This patchset introduces tons of updates for the nfnetlink, ctnetlink >>and the conntrack event API. I haven't attached the file since it's that >>big, about 100K. > > > Looks good. A couple more comments on the patch, most are probably not > related to recent changes. I remember beeing responsible for at least > some of this :) Fine. BTW, thanks a lot for the feedback you both, appreciated :) > +struct cta_proto { > + unsigned char num_proto; /* Protocol number IPPROTO_X */ > + union ip_conntrack_proto proto; > +}; > > These should be changed not to use internal conntrack structures, it > will hurt us when we want to change them. Instead it should replicate > the fields interesting for the user. Also please use fixed-size types > instead of unions etc. All structures including u64 types should be > padded to multiples of 8 so they are equally sized on 32-bit and 64-bit > systems. OK, I'll kill those in the next digest, added to my TODO. > +#define CTA_HELP_MAXNAMESZ 31 > + > +struct cta_help { > + char name[CTA_HELP_MAXNAMESZ]; /* name of conntrack helper */ > + union ip_conntrack_help help; > +}; > > I suggest to use 32, the length in the kernel is not fixed AFAICS and > it will be padded to 32 in netlink messages anyway. OK added. > +/* ctnetlink multicast groups: reports any change of ctinfo, > + * ctstatus, or protocol state change. > + */ > +#define NFGRP_IPV4_CT_TCP 0x01 > +#define NFGRP_IPV4_CT_UDP 0x02 > +#define NFGRP_IPV4_CT_ICMP 0x04 > +#define NFGRP_IPV4_CT_OTHER 0x08 > > I'm not sure how useful these groups are. I think groups for different > event-types might be more useful to reduce the noise. Yes, this looks fine. So we could kill those and use an event subscription. > > + h->name[CTA_HELP_MAXNAMESZ - 1] = '\0'; > + if (strcmp(helper->name, h->name)) > + return -EINVAL; > > We changed ipt_helper to accept a null-byte string as wildcard string, > probably a good idea to do the same here. > > > + ct = ip_conntrack_alloc(otuple, rtuple); > + if (ct == NULL || IS_ERR(ct)) > + return -ENOMEM; > > ip_conntrack_alloc doesn't return ERR_PTR() It didn't use to, but now it does in the updated patch: http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/02modifs.patch > + exp = ip_conntrack_expect_find_get(tuple); > + if (!exp) > + return -ENOENT; > > I couldn't find this function, but in 2.6.12 expectations aren't > refcounted anymore. If they are again by this patch, the refcnt would > be leaked in the following lines: > > + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); > + if (!skb2) > + return -ENOMEM; Thanks for catching up this, I'll fix it. I recovered the refcounting for expectations to avoid any possible race condition since I could be working with an expectation whose timeout has expired. > ++ > ++ /* Events were delivered: loopback mustn't notify events twice */ > ++ IPCT_DELIVERED_BIT = 11, > ++ IPCT_DELIVERED = (1 << IPCT_DELIVERED_BIT), > > I couldn't find any users, probably since this can't work with a > per-conntrack flag since events are generated by packets. This must dissapear, I introduced this to fix a loopback event notification (was done twice). This is ugly, so I'll kill it and reset nfcache once the events has been delivered to avoid such event duplication. > ++ IPEXP_TIMEOUT_BIT = 1, > ++ IPEXP_TIMEOUT = (1 << IPEXP_TIMEOUT_BIT), > ++}; > ++ > > No user here either. Right, not yet. I have problems to complete the expectation part of the conntrack event API since most events should be delivered holding ip_conntrack_lock and that's really ugly. > It would be great if you could send a patch without renaming the file, > that makes reviewing the changes a lot easier. Renaming can then be done > in a seperate patch that doesn't change anything. Does this help? Hope so. http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/04ctnetlink.patch I'll send another patch that includes all your suggestion. Thanks. -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 2:16 ` Pablo Neira @ 2005-06-28 4:03 ` Patrick McHardy 0 siblings, 0 replies; 50+ messages in thread From: Patrick McHardy @ 2005-06-28 4:03 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > Patrick McHardy wrote: > >> +/* ctnetlink multicast groups: reports any change of ctinfo, >> + * ctstatus, or protocol state change. >> + */ >> +#define NFGRP_IPV4_CT_TCP 0x01 >> +#define NFGRP_IPV4_CT_UDP 0x02 >> +#define NFGRP_IPV4_CT_ICMP 0x04 >> +#define NFGRP_IPV4_CT_OTHER 0x08 >> >> I'm not sure how useful these groups are. I think groups for different >> event-types might be more useful to reduce the noise. > > Yes, this looks fine. So we could kill those and use an event > subscription. Yes. The question is how to define the groups. One group per event seems wasteful, we probably need some sensible grouping. What events types are there currently? >> I couldn't find this function, but in 2.6.12 expectations aren't >> refcounted anymore. If they are again by this patch, the refcnt would >> be leaked in the following lines: >> >> + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); >> + if (!skb2) >> + return -ENOMEM; > > > Thanks for catching up this, I'll fix it. I recovered the refcounting > for expectations to avoid any possible race condition since I could be > working with an expectation whose timeout has expired. Great. I wanted to restore it anyway for "permanent expectations" for the H.323 helper. > Does this help? Hope so. > http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/04ctnetlink.patch I'll continue tomorrow, its getting late here .. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 22:40 ` Patrick McHardy 2005-06-28 2:16 ` Pablo Neira @ 2005-06-28 7:13 ` Harald Welte 2005-06-28 16:02 ` Patrick McHardy 1 sibling, 1 reply; 50+ messages in thread From: Harald Welte @ 2005-06-28 7:13 UTC (permalink / raw) To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Pablo Neira [-- Attachment #1: Type: text/plain, Size: 2123 bytes --] On Tue, Jun 28, 2005 at 12:40:46AM +0200, Patrick McHardy wrote: > +struct cta_proto { > + unsigned char num_proto; /* Protocol number IPPROTO_X */ > + union ip_conntrack_proto proto; > +}; > > These should be changed not to use internal conntrack structures, it > will hurt us when we want to change them. Instead it should replicate > the fields interesting for the user. Also please use fixed-size types > instead of unions etc. All structures including u64 types should be > padded to multiples of 8 so they are equally sized on 32-bit and 64-bit > systems. agreed. However, we still don't have some kind of versioning in the protocol, too. I think we've learned by now that we need versioned structures ;) > +/* ctnetlink multicast groups: reports any change of ctinfo, > + * ctstatus, or protocol state change. > + */ > +#define NFGRP_IPV4_CT_TCP 0x01 > +#define NFGRP_IPV4_CT_UDP 0x02 > +#define NFGRP_IPV4_CT_ICMP 0x04 > +#define NFGRP_IPV4_CT_OTHER 0x08 > > I'm not sure how useful these groups are. I think groups for different > event-types might be more useful to reduce the noise. that was my idea in the beginning (since I didn't think of events at that point). Still, I think creating messages for any kind of event (even if noone listens) is too much overhead. netlink needs to be extended to deal with that issue. Maybe the 'which socket is subscribed to which group' accounting should be done by the core netlink layer, which would then only export a merged bitmask of all netlink sockets. This way ctnetlink can easily check whether it makes sense to create a certain event message or not. This should be useful for other netlink users, too. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 7:13 ` Harald Welte @ 2005-06-28 16:02 ` Patrick McHardy 2005-06-29 19:13 ` Pablo Neira 0 siblings, 1 reply; 50+ messages in thread From: Patrick McHardy @ 2005-06-28 16:02 UTC (permalink / raw) To: Harald Welte; +Cc: Netfilter Development Mailinglist, Pablo Neira Harald Welte wrote: > On Tue, Jun 28, 2005 at 12:40:46AM +0200, Patrick McHardy wrote: > >>These should be changed not to use internal conntrack structures, it >>will hurt us when we want to change them. Instead it should replicate >>the fields interesting for the user. Also please use fixed-size types >>instead of unions etc. All structures including u64 types should be >>padded to multiples of 8 so they are equally sized on 32-bit and 64-bit >>systems. > > agreed. However, we still don't have some kind of versioning in the > protocol, too. I think we've learned by now that we need versioned > structures ;) Netlink is easy to extend by adding new fields at the end if users only check for msgsize >= sizeof(struct). Do you think we should have versioning anyway? >>+/* ctnetlink multicast groups: reports any change of ctinfo, >>+ * ctstatus, or protocol state change. >>+ */ >>+#define NFGRP_IPV4_CT_TCP 0x01 >>+#define NFGRP_IPV4_CT_UDP 0x02 >>+#define NFGRP_IPV4_CT_ICMP 0x04 >>+#define NFGRP_IPV4_CT_OTHER 0x08 >> >>I'm not sure how useful these groups are. I think groups for different >>event-types might be more useful to reduce the noise. > > > that was my idea in the beginning (since I didn't think of events at > that point). > > Still, I think creating messages for any kind of event (even if noone > listens) is too much overhead. netlink needs to be extended to deal > with that issue. > > Maybe the 'which socket is subscribed to which group' accounting should > be done by the core netlink layer, which would then only export a > merged bitmask of all netlink sockets. This way ctnetlink can easily > check whether it makes sense to create a certain event message or not. > > This should be useful for other netlink users, too. Sounds like a good idea. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 16:02 ` Patrick McHardy @ 2005-06-29 19:13 ` Pablo Neira 2005-06-29 19:52 ` Patrick McHardy 0 siblings, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-29 19:13 UTC (permalink / raw) To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist Patrick McHardy wrote: > Harald Welte wrote: > >>On Tue, Jun 28, 2005 at 12:40:46AM +0200, Patrick McHardy wrote: >> >> >>>These should be changed not to use internal conntrack structures, it >>>will hurt us when we want to change them. Instead it should replicate >>>the fields interesting for the user. Also please use fixed-size types >>>instead of unions etc. All structures including u64 types should be >>>padded to multiples of 8 so they are equally sized on 32-bit and 64-bit >>>systems. >> >>agreed. However, we still don't have some kind of versioning in the >>protocol, too. I think we've learned by now that we need versioned >>structures ;) > > Netlink is easy to extend by adding new fields at the end if users > only check for msgsize >= sizeof(struct). Do you think we should > have versioning anyway? I think that we could split the structure into fine grain fields. For example, CTA_TUPLE_ORIG would composed of: CTA_ORIG_IPV4_SRC CTA_ORIG_IPV4_DST CTA_L4_PROTONUM CTA_PROTO_IPV4_SRC CTA_PROTO_IPV4_DST CTA_DIR So, instead of sending a packet that contains a reference to an ip_conntrack_tuple (CTA_TUPLE_ORIG), we'll have a set of fields (CTA_ORIG_IPV4_SRC + CTA_ORIG_IPV4_DST + ...) that compose such structure. But I'll need a function to glue all the fields to create a ip_conntrack_tuple. Maybe too bloated? >>>+/* ctnetlink multicast groups: reports any change of ctinfo, >>>+ * ctstatus, or protocol state change. >>>+ */ >>>+#define NFGRP_IPV4_CT_TCP 0x01 >>>+#define NFGRP_IPV4_CT_UDP 0x02 >>>+#define NFGRP_IPV4_CT_ICMP 0x04 >>>+#define NFGRP_IPV4_CT_OTHER 0x08 >>> >>>I'm not sure how useful these groups are. I think groups for different >>>event-types might be more useful to reduce the noise. >> >> >>that was my idea in the beginning (since I didn't think of events at >>that point). >> >>Still, I think creating messages for any kind of event (even if noone >>listens) is too much overhead. netlink needs to be extended to deal >>with that issue. >> >>Maybe the 'which socket is subscribed to which group' accounting should >>be done by the core netlink layer, which would then only export a >>merged bitmask of all netlink sockets. This way ctnetlink can easily >>check whether it makes sense to create a certain event message or not. >> >>This should be useful for other netlink users, too. Isn't netlink broadcast subscription enough? netlink_broadcast doesn't enqueue packets for a socket that isn't subscribed to a group, so the process never gets useless packets. So I think that we can group event, say: level 1 (weak): - IPCT_NEW - IPCT_DESTROY level 2 (normal): - IPCT_NEW - IPCT_UPDATE - IPCT_STATUS - ... - IPCT_DESTROY At reserve some groups to let the user define some level whenever he wants. Although such level would be unique. OTOH, there are 10 events currently, why is that bad creating a group per event? -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-29 19:13 ` Pablo Neira @ 2005-06-29 19:52 ` Patrick McHardy 2005-06-29 20:16 ` Harald Welte 2005-06-30 0:34 ` Pablo Neira 0 siblings, 2 replies; 50+ messages in thread From: Patrick McHardy @ 2005-06-29 19:52 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > I think that we could split the structure into fine grain fields. For > example, CTA_TUPLE_ORIG would composed of: > > CTA_ORIG_IPV4_SRC > CTA_ORIG_IPV4_DST > CTA_L4_PROTONUM > CTA_PROTO_IPV4_SRC > CTA_PROTO_IPV4_DST > CTA_DIR > > So, instead of sending a packet that contains a reference to an > ip_conntrack_tuple (CTA_TUPLE_ORIG), we'll have a set of fields > (CTA_ORIG_IPV4_SRC + CTA_ORIG_IPV4_DST + ...) that compose such structure. > > But I'll need a function to glue all the fields to create a > ip_conntrack_tuple. Maybe too bloated? I think its fine, but CTA_PROTO_IPV4_{SRC,DST} should be CTA_PROTO_{SRC,DST}, there's nothing related to IPv4 in them. Please nest all attributes related to a tuple in CTA_TUPLE instead of adding them to the top-level. >>> Still, I think creating messages for any kind of event (even if noone >>> listens) is too much overhead. netlink needs to be extended to deal >>> with that issue. >>> >>> Maybe the 'which socket is subscribed to which group' accounting should >>> be done by the core netlink layer, which would then only export a >>> merged bitmask of all netlink sockets. This way ctnetlink can easily >>> check whether it makes sense to create a certain event message or not. >>> >>> This should be useful for other netlink users, too. > > > Isn't netlink broadcast subscription enough? netlink_broadcast doesn't > enqueue packets for a socket that isn't subscribed to a group, so the > process never gets useless packets. But we still spent time building them, just to have them thrown away. > So I think that we can group event, say: > > level 1 (weak): > - IPCT_NEW > - IPCT_DESTROY > > level 2 (normal): > - IPCT_NEW > - IPCT_UPDATE > - IPCT_STATUS > - ... > - IPCT_DESTROY > > At reserve some groups to let the user define some level whenever he > wants. Although such level would be unique. > > OTOH, there are 10 events currently, why is that bad creating a group > per event? I don't think its bad, but I fear hitting the 32 group limit some day. Probably that fear is unfounded and we're never going to reach 32 different types of events. Harald, what do you think? Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-29 19:52 ` Patrick McHardy @ 2005-06-29 20:16 ` Harald Welte 2005-06-30 0:27 ` Pablo Neira 2005-06-30 0:34 ` Pablo Neira 1 sibling, 1 reply; 50+ messages in thread From: Harald Welte @ 2005-06-29 20:16 UTC (permalink / raw) To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Pablo Neira [-- Attachment #1: Type: text/plain, Size: 1537 bytes --] On Wed, Jun 29, 2005 at 09:52:52PM +0200, Patrick McHardy wrote: > > So I think that we can group event, say: > > > > level 1 (weak): > > - IPCT_NEW > > - IPCT_DESTROY > > > > level 2 (normal): > > - IPCT_NEW > > - IPCT_UPDATE > > - IPCT_STATUS > > - ... > > - IPCT_DESTROY > > > > At reserve some groups to let the user define some level whenever he > > wants. Although such level would be unique. > > > > OTOH, there are 10 events currently, why is that bad creating a group > > per event? > > I don't think its bad, but I fear hitting the 32 group limit some day. > Probably that fear is unfounded and we're never going to reach 32 > different types of events. Harald, what do you think? I think it's not all that much of a problem. I doubt that conntrack will get 22 more events anytime soon. The other problem is, don't we overlap with other nfnetlink-based groups? nfnetlink acts as a a multiplex between netlink and the higher layers, so I guess we only have 32groups for nfnetlink as a whole... and there is other stuff like ip_queue and ULOG that is to be ported on top of nfnetlink... -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-29 20:16 ` Harald Welte @ 2005-06-30 0:27 ` Pablo Neira 2005-06-30 0:53 ` Patrick McHardy 0 siblings, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-30 0:27 UTC (permalink / raw) To: Harald Welte; +Cc: Netfilter Development Mailinglist, Patrick McHardy Harald Welte wrote: > On Wed, Jun 29, 2005 at 09:52:52PM +0200, Patrick McHardy wrote: > >>>So I think that we can group event, say: >>> >>>level 1 (weak): >>> - IPCT_NEW >>> - IPCT_DESTROY >>> >>>level 2 (normal): >>> - IPCT_NEW >>> - IPCT_UPDATE >>> - IPCT_STATUS >>> - ... >>> - IPCT_DESTROY >>> >>>At reserve some groups to let the user define some level whenever he >>>wants. Although such level would be unique. >>> >>>OTOH, there are 10 events currently, why is that bad creating a group >>>per event? >> >>I don't think its bad, but I fear hitting the 32 group limit some day. >>Probably that fear is unfounded and we're never going to reach 32 >>different types of events. Harald, what do you think? > > > I think it's not all that much of a problem. I doubt that conntrack > will get 22 more events anytime soon. The other problem is, don't we > overlap with other nfnetlink-based groups? > > nfnetlink acts as a a multiplex between netlink and the higher layers, > so I guess we only have 32groups for nfnetlink as a whole... and there > is other stuff like ip_queue and ULOG that is to be ported on top of > nfnetlink... You're right, we can't do that. In short: nfnetlink currently relies on the fact that is the user program that decides if a packet was sent to it or not. In case that it wasn't, the program drops it. Say we've got ip_queue and ip_conntrack_netlink on top of nfnetlink: ip_queue sends a packet to user space and ip_conntrack_netlink sends event notifications. If both the ipqueue userspace program and libctnetlink are subscribed to the same group, they both will receive spam from each other. To avoid this spamming situation, every subsystem could reserve just one group and user programs must get subscribed to the corresponding group that is interested in. In conclusion, nfnetlink would support up to 32 subsystems but user space programs won't get spam from other subsystem. So, back to ip_conntrack_netlink, we still have the performance issue with regards to the events. Userspace programs will decide if the event received is interesting for them or not. a) We could extend netlink code with some kind of ioctl that sets a mask to say what kind of events are interesting for a given userspace program. b) The fact that we can send events and nobody's listening to them is more difficult to catch up. Maybe a function that checks if there are listeners available before sending an event could help: In that case such function would be kind of racy because we can lose some messages while a userspace program is binding to a given netlink socket. This isn't that annoying, I prefer losing a couple of messages than hurting performance generating messages that won't be ever delivered. -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 0:27 ` Pablo Neira @ 2005-06-30 0:53 ` Patrick McHardy 2005-06-30 9:47 ` Pablo Neira 0 siblings, 1 reply; 50+ messages in thread From: Patrick McHardy @ 2005-06-30 0:53 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > Harald Welte wrote: > >> I think it's not all that much of a problem. I doubt that conntrack >> will get 22 more events anytime soon. The other problem is, don't we >> overlap with other nfnetlink-based groups? >> >> nfnetlink acts as a a multiplex between netlink and the higher layers, >> so I guess we only have 32groups for nfnetlink as a whole... and there >> is other stuff like ip_queue and ULOG that is to be ported on top of >> nfnetlink... > > You're right, we can't do that. In short: > > nfnetlink currently relies on the fact that is the user program that > decides if a packet was sent to it or not. In case that it wasn't, the > program drops it. That's not true, A programm can do pure unicast communication with the kernel, but can also subscribe to the multicast-groups it is interested in. Events are sent to multicast groups and, if they happend in response to a request, to the unicast address. Same for dumps. > Say we've got ip_queue and ip_conntrack_netlink on top of nfnetlink: > ip_queue sends a packet to user space and ip_conntrack_netlink sends > event notifications. If both the ipqueue userspace program and > libctnetlink are subscribed to the same group, they both will receive > spam from each other. Sure, they both need different groups. > To avoid this spamming situation, every subsystem could reserve just one > group and user programs must get subscribed to the corresponding group > that is interested in. In conclusion, nfnetlink would support up to 32 > subsystems but user space programs won't get spam from other subsystem. That's how it should be done. Conntrack is an exceptional case in that it is the only part in the kernel I know of that generates asynchronous netlink messages at high speed, so I think reserving multiple groups is the right way. > So, back to ip_conntrack_netlink, we still have the performance issue > with regards to the events. Userspace programs will decide if the event > received is interesting for them or not. > > a) We could extend netlink code with some kind of ioctl that sets a mask > to say what kind of events are interesting for a given userspace program. Please no ioctls. Groups are fine, if one group per event is too much we need to group them somehow. Maybe: 1) creation/destruction 2) status changes 3) timeout 4) protoinfo 5) helper 6) nat 7) expectations > b) The fact that we can send events and nobody's listening to them is > more difficult to catch up. Maybe a function that checks if there are > listeners available before sending an event could help: In that case > such function would be kind of racy because we can lose some messages > while a userspace program is binding to a given netlink socket. This > isn't that annoying, I prefer losing a couple of messages than hurting > performance generating messages that won't be ever delivered. That race shouldn't be a problem. Userspace can't know when exactly it will start receiving events anyway. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 0:53 ` Patrick McHardy @ 2005-06-30 9:47 ` Pablo Neira 2005-06-30 21:30 ` Patrick McHardy 0 siblings, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-30 9:47 UTC (permalink / raw) To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist Patrick McHardy wrote: > Pablo Neira wrote: >>To avoid this spamming situation, every subsystem could reserve just one >>group and user programs must get subscribed to the corresponding group >>that is interested in. In conclusion, nfnetlink would support up to 32 >>subsystems but user space programs won't get spam from other subsystem. > > > That's how it should be done. Conntrack is an exceptional case in that > it is the only part in the kernel I know of that generates asynchronous > netlink messages at high speed, so I think reserving multiple groups > is the right way. OK, I'm fine with the solution based on groups. Then I'll go that way. >>So, back to ip_conntrack_netlink, we still have the performance issue >>with regards to the events. Userspace programs will decide if the event >>received is interesting for them or not. >> >>a) We could extend netlink code with some kind of ioctl that sets a mask >>to say what kind of events are interesting for a given userspace program. > > > Please no ioctls. Groups are fine, if one group per event is too much we > need to group them somehow. Maybe: > > 1) creation/destruction > 2) status changes > 3) timeout > 4) protoinfo > 5) helper > 6) nat > 7) expectations > > >>b) The fact that we can send events and nobody's listening to them is >>more difficult to catch up. Maybe a function that checks if there are >>listeners available before sending an event could help: In that case >>such function would be kind of racy because we can lose some messages >>while a userspace program is binding to a given netlink socket. This >>isn't that annoying, I prefer losing a couple of messages than hurting >>performance generating messages that won't be ever delivered. > > > That race shouldn't be a problem. Userspace can't know when exactly it > will start receiving events anyway. So, should be a netlink function to check if there are currently listeners worth it? -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 9:47 ` Pablo Neira @ 2005-06-30 21:30 ` Patrick McHardy 0 siblings, 0 replies; 50+ messages in thread From: Patrick McHardy @ 2005-06-30 21:30 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > So, should be a netlink function to check if there are currently > listeners worth it? That sounds fine. Messages should then only be built if there is a listener for this group or if it is a reply to a unicast request. BTW, I noticed nfnetlink_send is always called with pid = 0 and echo = 0. This needs to be fixed, whenever anything happens in response to a request the correct pid and echo = 1 need to be used. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-29 19:52 ` Patrick McHardy 2005-06-29 20:16 ` Harald Welte @ 2005-06-30 0:34 ` Pablo Neira 2005-06-30 1:00 ` Patrick McHardy 1 sibling, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-30 0:34 UTC (permalink / raw) To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist Patrick McHardy wrote: > Pablo Neira wrote: > >>I think that we could split the structure into fine grain fields. For >>example, CTA_TUPLE_ORIG would composed of: >> >>CTA_ORIG_IPV4_SRC >>CTA_ORIG_IPV4_DST >>CTA_L4_PROTONUM >>CTA_PROTO_IPV4_SRC >>CTA_PROTO_IPV4_DST >>CTA_DIR >> >>So, instead of sending a packet that contains a reference to an >>ip_conntrack_tuple (CTA_TUPLE_ORIG), we'll have a set of fields >>(CTA_ORIG_IPV4_SRC + CTA_ORIG_IPV4_DST + ...) that compose such structure. >> >>But I'll need a function to glue all the fields to create a >>ip_conntrack_tuple. Maybe too bloated? > > > I think its fine, but CTA_PROTO_IPV4_{SRC,DST} should be > CTA_PROTO_{SRC,DST}, there's nothing related to IPv4 in them. That could be fine for nf_conntrack to differenciate between conntracks that represents an ipv4 and ipv6 connections. CTA_PROTO_IPV4_SRC and CTA_PROTO_IPV6_SRC > Please nest all attributes related to a tuple in CTA_TUPLE > instead of adding them to the top-level. Sorry, how that will look like? -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 0:34 ` Pablo Neira @ 2005-06-30 1:00 ` Patrick McHardy 2005-06-30 1:49 ` Thomas Graf 2005-06-30 17:06 ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira 0 siblings, 2 replies; 50+ messages in thread From: Patrick McHardy @ 2005-06-30 1:00 UTC (permalink / raw) To: Pablo Neira; +Cc: Harald Welte, Netfilter Development Mailinglist Pablo Neira wrote: > Patrick McHardy wrote: >> >> I think its fine, but CTA_PROTO_IPV4_{SRC,DST} should be >> CTA_PROTO_{SRC,DST}, there's nothing related to IPv4 in them. > > > That could be fine for nf_conntrack to differenciate between conntracks > that represents an ipv4 and ipv6 connections. > > CTA_PROTO_IPV4_SRC and CTA_PROTO_IPV6_SRC It can do that by looking at the IP attributes (CTA_ORIG_IPV4_SRC and CTA_IPV6_SRC). >> Please nest all attributes related to a tuple in CTA_TUPLE >> instead of adding them to the top-level. > > Sorry, how that will look like? You add a new nfattr header with the type set to CTA_TUPLE and note the position of skb->tail. Then you add the nested attributes as usual. When you're done you set the length of the nfattr header to skb->tail - old_tail. The RTA_NEST/RTA_NEST_END macros handle this for rtnetlink (include/linux/rtnetlink.h). Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 1:00 ` Patrick McHardy @ 2005-06-30 1:49 ` Thomas Graf 2005-06-30 1:53 ` Patrick McHardy 2005-06-30 17:06 ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira 1 sibling, 1 reply; 50+ messages in thread From: Thomas Graf @ 2005-06-30 1:49 UTC (permalink / raw) To: Patrick McHardy Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira * Patrick McHardy <42C34445.9020709@trash.net> 2005-06-30 03:00 > You add a new nfattr header with the type set to CTA_TUPLE and > note the position of skb->tail. Then you add the nested attributes > as usual. When you're done you set the length of the nfattr header > to skb->tail - old_tail. The RTA_NEST/RTA_NEST_END macros handle > this for rtnetlink (include/linux/rtnetlink.h). At some point we should introduce a "generic" attribute architecture for all netlink families. The connector stuff will need it as well, so maybe the time has come to actually do it. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 1:49 ` Thomas Graf @ 2005-06-30 1:53 ` Patrick McHardy 2005-06-30 12:03 ` Thomas Graf 0 siblings, 1 reply; 50+ messages in thread From: Patrick McHardy @ 2005-06-30 1:53 UTC (permalink / raw) To: Thomas Graf; +Cc: Harald Welte, Netfilter Development Mailinglist, Pablo Neira Thomas Graf wrote: > * Patrick McHardy <42C34445.9020709@trash.net> 2005-06-30 03:00 > >>You add a new nfattr header with the type set to CTA_TUPLE and >>note the position of skb->tail. Then you add the nested attributes >>as usual. When you're done you set the length of the nfattr header >>to skb->tail - old_tail. The RTA_NEST/RTA_NEST_END macros handle >>this for rtnetlink (include/linux/rtnetlink.h). > > > At some point we should introduce a "generic" attribute > architecture for all netlink families. The connector > stuff will need it as well, so maybe the time has come > to actually do it. I agree, most of the macros are just copied without modification anyway. One more thing I would like to change is the excessive use of RTA_PUT with structures on the stack. Using __RTA_PUT and putting together these structures in-place would be much nicer. nfnetlink has copied this part of rtnetlink. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 1:53 ` Patrick McHardy @ 2005-06-30 12:03 ` Thomas Graf 2005-06-30 13:27 ` Patrick McHardy 0 siblings, 1 reply; 50+ messages in thread From: Thomas Graf @ 2005-06-30 12:03 UTC (permalink / raw) To: Patrick McHardy Cc: Harald Welte, netdev, Jamal Hadi Salim, Netfilter Development Mailinglist, Pablo Neira * Patrick McHardy <42C350A1.1030602@trash.net> 2005-06-30 03:53 > Thomas Graf wrote: > > At some point we should introduce a "generic" attribute > > architecture for all netlink families. The connector > > stuff will need it as well, so maybe the time has come > > to actually do it. > > I agree, most of the macros are just copied without modification > anyway. One more thing I would like to change is the excessive > use of RTA_PUT with structures on the stack. Using __RTA_PUT > and putting together these structures in-place would be much > nicer. nfnetlink has copied this part of rtnetlink. Good point, after all, I think structs have been used too often and we now suffer from backwards compatbility issues. The points likely to be argueable are things like trim responsibility in the error handling, i.e. delegate it down to the function which also created the header or trim on every level. Other than that the whole thing should be pretty straight forward. What do you think about naming it nlattr? ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 12:03 ` Thomas Graf @ 2005-06-30 13:27 ` Patrick McHardy 2005-06-30 18:02 ` Thomas Graf 0 siblings, 1 reply; 50+ messages in thread From: Patrick McHardy @ 2005-06-30 13:27 UTC (permalink / raw) To: Thomas Graf Cc: Harald Welte, netdev, Jamal Hadi Salim, Netfilter Development Mailinglist, Pablo Neira Thomas Graf wrote: > Good point, after all, I think structs have been used too often > and we now suffer from backwards compatbility issues. The > points likely to be argueable are things like trim responsibility > in the error handling, i.e. delegate it down to the function > which also created the header or trim on every level. Other than > that the whole thing should be pretty straight forward. I think for nested attributes error handling should happen on the outer level. Just trimming on the inner level would leave a half-finished nested attribute. > What do you think about naming it nlattr? Sounds good. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 13:27 ` Patrick McHardy @ 2005-06-30 18:02 ` Thomas Graf 2005-06-30 21:26 ` Patrick McHardy 0 siblings, 1 reply; 50+ messages in thread From: Thomas Graf @ 2005-06-30 18:02 UTC (permalink / raw) To: Patrick McHardy Cc: Harald Welte, netdev, Jamal Hadi Salim, Netfilter Development Mailinglist, Pablo Neira * Patrick McHardy <42C3F35B.50805@trash.net> 2005-06-30 15:27 > Thomas Graf wrote: > > Good point, after all, I think structs have been used too often > > and we now suffer from backwards compatbility issues. The > > points likely to be argueable are things like trim responsibility > > in the error handling, i.e. delegate it down to the function > > which also created the header or trim on every level. Other than > > that the whole thing should be pretty straight forward. > > I think for nested attributes error handling should happen on the > outer level. Just trimming on the inner level would leave a > half-finished nested attribute. I can agree with this, the question that remains is: do we want to trim in functions where no nesting is done at all? i.e. things like the generic network statistics dumping interface. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 18:02 ` Thomas Graf @ 2005-06-30 21:26 ` Patrick McHardy 2005-06-30 21:34 ` Thomas Graf 0 siblings, 1 reply; 50+ messages in thread From: Patrick McHardy @ 2005-06-30 21:26 UTC (permalink / raw) To: Thomas Graf Cc: Harald Welte, netdev, Jamal Hadi Salim, Netfilter Development Mailinglist, Pablo Neira Thomas Graf wrote: > * Patrick McHardy <42C3F35B.50805@trash.net> 2005-06-30 15:27 > >>I think for nested attributes error handling should happen on the >>outer level. Just trimming on the inner level would leave a >>half-finished nested attribute. > > I can agree with this, the question that remains is: do we want > to trim in functions where no nesting is done at all? i.e. things > like the generic network statistics dumping interface. Trimming isn't required in this case since the length is know and checked in advance. An error should still be propagated back of course so potential outer levels can do trimming. Regards Patrick ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 21:26 ` Patrick McHardy @ 2005-06-30 21:34 ` Thomas Graf 2005-06-30 21:49 ` David S. Miller 0 siblings, 1 reply; 50+ messages in thread From: Thomas Graf @ 2005-06-30 21:34 UTC (permalink / raw) To: Patrick McHardy Cc: Harald Welte, netdev, Jamal Hadi Salim, Netfilter Development Mailinglist, Pablo Neira * Patrick McHardy <42C46374.5000407@trash.net> 2005-06-30 23:26 > Thomas Graf wrote: > > * Patrick McHardy <42C3F35B.50805@trash.net> 2005-06-30 15:27 > > > >>I think for nested attributes error handling should happen on the > >>outer level. Just trimming on the inner level would leave a > >>half-finished nested attribute. > > > > I can agree with this, the question that remains is: do we want > > to trim in functions where no nesting is done at all? i.e. things > > like the generic network statistics dumping interface. > > Trimming isn't required in this case since the length is know and > checked in advance. An error should still be propagated back of > course so potential outer levels can do trimming. This is only true if one attribute is added. What I really meant is something like em_text_dump() in em_text.c. Assuming we rework areas like this one in order to avoid structs on the stack, we'll have a lot of such cases where we might hit the limit while not really knowing about the start of the nested tlv. Anyways, I think this is a minor issue and we can safely ignore it and delegate the trimming responsibility to the underlying layer which "started" the nested tlv. The gnet_stats interface was a bad example since it doesn't have such a case. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 21:34 ` Thomas Graf @ 2005-06-30 21:49 ` David S. Miller 2005-06-30 22:08 ` Thomas Graf 0 siblings, 1 reply; 50+ messages in thread From: David S. Miller @ 2005-06-30 21:49 UTC (permalink / raw) To: tgraf; +Cc: laforge, netdev, netfilter-devel, hadi, pablo, kaber From: Thomas Graf <tgraf@suug.ch> Date: Thu, 30 Jun 2005 23:34:59 +0200 > This is only true if one attribute is added. What I really meant > is something like em_text_dump() in em_text.c. Speaking of em_text.c, when one loads ematch rules into that module it does a printk(). I assume that's a thinko and should be removed? ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 21:49 ` David S. Miller @ 2005-06-30 22:08 ` Thomas Graf 2005-06-30 22:08 ` David S. Miller 0 siblings, 1 reply; 50+ messages in thread From: Thomas Graf @ 2005-06-30 22:08 UTC (permalink / raw) To: David S. Miller; +Cc: laforge, netdev, netfilter-devel, hadi, pablo, kaber * David S. Miller <20050630.144942.74731532.davem@davemloft.net> 2005-06-30 14:49 > From: Thomas Graf <tgraf@suug.ch> > Date: Thu, 30 Jun 2005 23:34:59 +0200 > > > This is only true if one attribute is added. What I really meant > > is something like em_text_dump() in em_text.c. > > Speaking of em_text.c, when one loads ematch rules into that > module it does a printk(). I assume that's a thinko and should > be removed? Yes, it's a debugging leftover but I'll keep it in until rc3 so if problems appear I have some debugging information. Given that is acceptable. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-30 22:08 ` Thomas Graf @ 2005-06-30 22:08 ` David S. Miller 0 siblings, 0 replies; 50+ messages in thread From: David S. Miller @ 2005-06-30 22:08 UTC (permalink / raw) To: tgraf; +Cc: laforge, netdev, netfilter-devel, hadi, pablo, kaber From: Thomas Graf <tgraf@suug.ch> Date: Fri, 1 Jul 2005 00:08:14 +0200 > Yes, it's a debugging leftover but I'll keep it in until rc3 so if > problems appear I have some debugging information. Given that is acceptable. No problem. ^ permalink raw reply [flat|nested] 50+ messages in thread
* ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-06-30 1:00 ` Patrick McHardy 2005-06-30 1:49 ` Thomas Graf @ 2005-06-30 17:06 ` Pablo Neira 2005-07-11 16:30 ` Amin Azez 1 sibling, 1 reply; 50+ messages in thread From: Pablo Neira @ 2005-06-30 17:06 UTC (permalink / raw) To: Patrick McHardy; +Cc: Harald Welte, Netfilter Development Mailinglist Hi, Here below a candidate of attributes for ctnetlink. As you could see some attributes are still missing, for example, those to modify internal helper structures like ip_ct_ftp... but that would be easy to add later. enum ctattr_type { CTA_UNSPEC, CTA_TUPLE, CTA_STATUS, CTA_PROTOINFO, CTA_HELP, CTA_NAT, CTA_TIMEOUT, CTA_MARK, CTA_COUNTERS, CTA_EXPECT, __CTA_MAX }; enum ctattr_tuple { CTA_TUPLE_DIR, CTA_TUPLE_IP, CTA_TUPLE_PROTO, __CTA_TUPLE_MAX }; enum ctattr_ip { CTA_IP_V4_SRC, CTA_IP_V4_DST, CTA_IP_V6_SRC, CTA_IP_V6_DST, __CTA_IP_MAX }; enum ctattr_l4proto { CTA_PROTO_NUM, CTA_PROTO_TCP_SRC, CTA_PROTO_TCP_DST, CTA_PROTO_UDP_SRC, CTA_PROTO_UDP_DST, CTA_PROTO_ICMP_TYPE, CTA_PROTO_ICMP_CODE, CTA_PROTO_SCTP_CODE, __CTA_PROTO_MAX }; enum ctattr_protoinfo { CTA_PROTOINFO_TCP, __CTA_PROTOINFO_MAX }; enum ctattr_protoinfo_tcp { CTA_TCP_STATE, __CTA_TCP_MAX }; enum ctattr_counters { CTA_COUNTERS_PACKETS, CTA_COUNTERS_BYTES, __CTA_COUNTERS_MAX }; enum ctattr_nat { CTA_NAT_MINIP, CTA_NAT_MAXIP, CTA_NAT_PROTO, __CTA_NAT_MAX }; enum ctattr_protonat { CTA_PROTONAT_MINIP, CTA_PROTONAT_MAXIP, CTA_PROTONAT_TCP, CTA_PROTONAT_UDP, __CTA_PROTONAT_MAX }; enum ctattr_expect { CTA_EXPECT_TUPLE, CTA_EXPECT_MASK, CTA_EXPECT_TIMEOUT, __CTA_EXPECT_MAX }; enum ctattr_help { CTA_HELP_NAME, __CTA_HELP_MAX }; Comments welcome. -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-06-30 17:06 ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira @ 2005-07-11 16:30 ` Amin Azez 2005-07-11 16:50 ` Jan Engelhardt 2005-07-11 17:10 ` Harald Welte 0 siblings, 2 replies; 50+ messages in thread From: Amin Azez @ 2005-07-11 16:30 UTC (permalink / raw) To: netfilter-devel; +Cc: Harald Welte I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just because of my fixation with MAC addresses but also to use conntrack to record flow data for non IP-related protocols. I realise it is not the current intent of conntrack to do this, but as much as conntrack solves the high-load problems of ulog, it is desirable for it to keep counters for non-ip protocols. As fast as protocol handlers can be devised it will become "conntrack" again instead of "flowtrack", but the universal identifier between hosts involved in a connection is the mac address and protocol. The IP addresses and direction and and even port numbers etc can only be resolved by some higher level protocol hanlders. The question is, how do others feel about conntrack tracking non-ip connections? I intend to provide patches to do this if it is welcome. Amin Pablo Neira wrote: > Hi, > > Here below a candidate of attributes for ctnetlink. As you could see > some attributes are still missing, for example, those to modify internal > helper structures like ip_ct_ftp... but that would be easy to add later. > > enum ctattr_type { > CTA_UNSPEC, > CTA_TUPLE, > CTA_STATUS, > CTA_PROTOINFO, > CTA_HELP, > CTA_NAT, > CTA_TIMEOUT, > CTA_MARK, > CTA_COUNTERS, > CTA_EXPECT, > __CTA_MAX > }; > > enum ctattr_tuple { > CTA_TUPLE_DIR, > CTA_TUPLE_IP, > CTA_TUPLE_PROTO, > __CTA_TUPLE_MAX > }; > > enum ctattr_ip { > CTA_IP_V4_SRC, > CTA_IP_V4_DST, > CTA_IP_V6_SRC, > CTA_IP_V6_DST, > __CTA_IP_MAX > }; > > enum ctattr_l4proto { > CTA_PROTO_NUM, > CTA_PROTO_TCP_SRC, > CTA_PROTO_TCP_DST, > CTA_PROTO_UDP_SRC, > CTA_PROTO_UDP_DST, > CTA_PROTO_ICMP_TYPE, > CTA_PROTO_ICMP_CODE, > CTA_PROTO_SCTP_CODE, > __CTA_PROTO_MAX > }; > > enum ctattr_protoinfo { > CTA_PROTOINFO_TCP, > __CTA_PROTOINFO_MAX > }; > > enum ctattr_protoinfo_tcp { > CTA_TCP_STATE, > __CTA_TCP_MAX > }; > > enum ctattr_counters { > CTA_COUNTERS_PACKETS, > CTA_COUNTERS_BYTES, > __CTA_COUNTERS_MAX > }; > > enum ctattr_nat { > CTA_NAT_MINIP, > CTA_NAT_MAXIP, > CTA_NAT_PROTO, > __CTA_NAT_MAX > }; > > enum ctattr_protonat { > CTA_PROTONAT_MINIP, > CTA_PROTONAT_MAXIP, > CTA_PROTONAT_TCP, > CTA_PROTONAT_UDP, > __CTA_PROTONAT_MAX > }; > > enum ctattr_expect { > CTA_EXPECT_TUPLE, > CTA_EXPECT_MASK, > CTA_EXPECT_TIMEOUT, > __CTA_EXPECT_MAX > }; > > enum ctattr_help { > CTA_HELP_NAME, > __CTA_HELP_MAX > }; > > Comments welcome. > > -- > Pablo > > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 16:30 ` Amin Azez @ 2005-07-11 16:50 ` Jan Engelhardt 2005-07-11 17:11 ` Harald Welte 2005-07-11 17:10 ` Harald Welte 1 sibling, 1 reply; 50+ messages in thread From: Jan Engelhardt @ 2005-07-11 16:50 UTC (permalink / raw) To: Amin Azez; +Cc: Harald Welte, netfilter-devel >I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just >because of my fixation with MAC addresses but also to use conntrack to >record flow data for non IP-related protocols. Can't this be done with ebtables? On another note, how about bringing ebt and ipt closer together, maybe merge them? Jan Engelhardt -- ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 16:50 ` Jan Engelhardt @ 2005-07-11 17:11 ` Harald Welte 2005-07-11 17:40 ` Jan Engelhardt 0 siblings, 1 reply; 50+ messages in thread From: Harald Welte @ 2005-07-11 17:11 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel, Amin Azez [-- Attachment #1: Type: text/plain, Size: 933 bytes --] On Mon, Jul 11, 2005 at 06:50:20PM +0200, Jan Engelhardt wrote: > >I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just > >because of my fixation with MAC addresses but also to use conntrack to > >record flow data for non IP-related protocols. > > Can't this be done with ebtables? > > On another note, how about bringing ebt and ipt closer together, maybe merge > them? yes, the good old pkttables idea. Unfortunately the ever growing number of gpl infringing companies keeps me from tackling this rater large project :( -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 17:11 ` Harald Welte @ 2005-07-11 17:40 ` Jan Engelhardt 2005-07-12 7:54 ` Harald Welte 0 siblings, 1 reply; 50+ messages in thread From: Jan Engelhardt @ 2005-07-11 17:40 UTC (permalink / raw) To: Harald Welte; +Cc: netfilter-devel, Amin Azez >> On another note, how about bringing ebt and ipt closer together, maybe merge >> them? > >yes, the good old pkttables idea. Unfortunately the ever growing number >of gpl infringing companies keeps me from tackling this rater large >project :( Where is the (legal) problem? Jan Engelhardt -- | Alphagate Systems, http://alphagate.hopto.org/ ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 17:40 ` Jan Engelhardt @ 2005-07-12 7:54 ` Harald Welte 0 siblings, 0 replies; 50+ messages in thread From: Harald Welte @ 2005-07-12 7:54 UTC (permalink / raw) To: Jan Engelhardt; +Cc: netfilter-devel, Amin Azez [-- Attachment #1: Type: text/plain, Size: 1311 bytes --] On Mon, Jul 11, 2005 at 07:40:36PM +0200, Jan Engelhardt wrote: > > >> On another note, how about bringing ebt and ipt closer together, maybe merge > >> them? > > > >yes, the good old pkttables idea. Unfortunately the ever growing number > >of gpl infringing companies keeps me from tackling this rater large > >project :( > > Where is the (legal) problem? I'm not talking about legal problems. I'm just talking about the ever-growing percentage of my daily working time that is consumed with handling those cases. My initial hope was that a number of high-profile cases would cause enough publicity to reduce the amount of GPL violations we're seeing. At the moment I'd say rather is the opposite way. I hope we'll reach break-even at some point... Please follow-up to private mail, if there is the need to. This is off-topic on netfilter-devel, and I merely mentioned it as an excuse for my lack of time. -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 16:30 ` Amin Azez 2005-07-11 16:50 ` Jan Engelhardt @ 2005-07-11 17:10 ` Harald Welte 2005-07-11 17:45 ` Jan Engelhardt 2005-07-12 8:18 ` Amin Azez 1 sibling, 2 replies; 50+ messages in thread From: Harald Welte @ 2005-07-11 17:10 UTC (permalink / raw) To: Amin Azez; +Cc: Netfilter Development Mailinglist, Pablo Neira [-- Attachment #1: Type: text/plain, Size: 1873 bytes --] On Mon, Jul 11, 2005 at 05:30:56PM +0100, Amin Azez wrote: > I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just > because of my fixation with MAC addresses but also to use conntrack to > record flow data for non IP-related protocols. ctnetlink MAC attributes are IMHO nonsense because there is no mac address tracking in conntrack. > I realise it is not the current intent of conntrack to do this, but as > much as conntrack solves the high-load problems of ulog, it is desirable > for it to keep counters for non-ip protocols. Also, MAC address tracking doesn't really make sense since as an intermediate router there is no way that the assumption "source and destination mac never change" is ever valid. > As fast as protocol handlers can be devised it will become "conntrack" > again instead of "flowtrack", but the universal identifier between hosts > involved in a connection is the mac address and protocol. no. you always only see the next hop mac address. let's say you have an upsteram and a downstream router, then all your 'connections' would be between the same two mac addresses. > The question is, how do others feel about conntrack tracking non-ip > connections? I intend to provide patches to do this if it is welcome. nf_conntrack goes the way to layer-3-independent connection tracking. However, conntrack will remain to happen at layer3 and 4 (with minor exceptions to higher layers in the case of conntrack helpers). -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 17:10 ` Harald Welte @ 2005-07-11 17:45 ` Jan Engelhardt 2005-07-12 7:55 ` Harald Welte 2005-07-12 8:18 ` Amin Azez 1 sibling, 1 reply; 50+ messages in thread From: Jan Engelhardt @ 2005-07-11 17:45 UTC (permalink / raw) To: Harald Welte; +Cc: Netfilter Development Mailinglist >nf_conntrack goes the way to layer-3-independent connection tracking. >However, conntrack will remain to happen at layer3 and 4 (with minor >exceptions to higher layers in the case of conntrack helpers). Minor? Hehe that's good :p Reminds me of l7-filter.sf.net (It's not even a helper as in ip_conntrack_irc, but a match module) Jan Engelhardt -- ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 17:45 ` Jan Engelhardt @ 2005-07-12 7:55 ` Harald Welte 0 siblings, 0 replies; 50+ messages in thread From: Harald Welte @ 2005-07-12 7:55 UTC (permalink / raw) To: Jan Engelhardt; +Cc: Netfilter Development Mailinglist [-- Attachment #1: Type: text/plain, Size: 971 bytes --] On Mon, Jul 11, 2005 at 07:45:17PM +0200, Jan Engelhardt wrote: > > >nf_conntrack goes the way to layer-3-independent connection tracking. > >However, conntrack will remain to happen at layer3 and 4 (with minor > >exceptions to higher layers in the case of conntrack helpers). > > Minor? Hehe that's good :p Reminds me of l7-filter.sf.net > (It's not even a helper as in ip_conntrack_irc, but a match module) exactly, it's an iptables match module and therefore not related to nf_conntrack or ip_conntrack. And there is a reason I don't like the idea of pushing it into mainline... -- - Harald Welte <laforge@netfilter.org> http://netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] 2005-07-11 17:10 ` Harald Welte 2005-07-11 17:45 ` Jan Engelhardt @ 2005-07-12 8:18 ` Amin Azez 1 sibling, 0 replies; 50+ messages in thread From: Amin Azez @ 2005-07-12 8:18 UTC (permalink / raw) To: Harald Welte; +Cc: Netfilter Development Mailinglist, Pablo Neira Harald Welte wrote: >On Mon, Jul 11, 2005 at 05:30:56PM +0100, Amin Azez wrote: > > >>I'm interested in adding MAC addr attributes and CTA_TUPLE_MAC; not just >>because of my fixation with MAC addresses but also to use conntrack to >>record flow data for non IP-related protocols. >> >> > >ctnetlink MAC attributes are IMHO nonsense because there is no mac >address tracking in conntrack. > > Indeed; I was proposing that it be so, I'm currently maintaining conntrack patches where mac addresses are part of the conntrack. >>I realise it is not the current intent of conntrack to do this, but as >>much as conntrack solves the high-load problems of ulog, it is desirable >>for it to keep counters for non-ip protocols. >> >> > >Also, MAC address tracking doesn't really make sense since as an >intermediate router there is no way that the assumption "source and >destination mac never change" is ever valid. > > This is also true, there are few circumstances that the mac address could change within the life of a connection and maintain the connection, and it is uncertain in the scenario I proposed how this should be handled. >>As fast as protocol handlers can be devised it will become "conntrack" >>again instead of "flowtrack", but the universal identifier between hosts >>involved in a connection is the mac address and protocol. >> >> > >no. you always only see the next hop mac address. let's say you have >an upsteram and a downstream router, then all your 'connections' would >be between the same two mac addresses. > > The use would be limited to switched or transparently bridged networks which is where most non-ip trafffic will be kept. I don't think this particularly limits the use; many protocols do not employ a global address space anyway. SPX protocol uses the MAC address in conjunction with a network number for node addressing. >>The question is, how do others feel about conntrack tracking non-ip >>connections? I intend to provide patches to do this if it is welcome. >> >> > >nf_conntrack goes the way to layer-3-independent connection tracking. >However, conntrack will remain to happen at layer3 and 4 (with minor >exceptions to higher layers in the case of conntrack helpers). > > So you are not against tracking non-IP traffic as such? I agree that it is ideal to break open the layer 3 addressing for use in the conntrack tuple, I was suggesting using mac address and protocol as a fallback when this was not available. Amin ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira ` (2 preceding siblings ...) 2005-06-27 22:40 ` Patrick McHardy @ 2005-06-28 23:44 ` Josh Samuelson 2005-06-29 19:14 ` Pablo Neira 2005-07-11 11:34 ` NETLINK_NETFILTER and NETLINK_FIB_LOOKUP Amin Azez 2005-07-11 16:32 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Amin Azez 5 siblings, 1 reply; 50+ messages in thread From: Josh Samuelson @ 2005-06-28 23:44 UTC (permalink / raw) To: Pablo Neira; +Cc: Netfilter Development Mailinglist On Mon, Jun 27, 2005 at 08:02:22PM +0200, Pablo Neira wrote: > o conntrack event API > - IPCT_DELIVERED bit. Loopback reports event are reported twice, this > bit is set once event are delivered. I just came up with a better idea, > reset nfcache once the events have been delivered, but I'll apply this > change in the next patchset. Hi Pablo, I noticed this with my flow module, any interface connecting to itself would create two IPCT_NEW events. I had made my own patch to your linux-2.6.11.conntrack-event-api.patch to call ip_conntrack_event_cache_init() after the events were delivered. I was gonna send it your way but other things came up and there were other problems that I was trying to figure out that were not quite working the way I thought they should. I investigated further and think I've found the problem, but I'm not sure how to move further on it, so now it's time to ask the list: Can anyone explain why in ip_conntrack_proto_icmp.c we delete the conntrack of an ICMP packet as soon as all replies are seen instead of letting ip_conntrack_icmp_timeout take the conntrack like other protocol types? As far as I can see, this breaks the conntrack event API because I get the following sequence of notifications for an ICMP echo request to the host (this is with the ip_conntrack_event_cache_init() patch after notifications have been delivered so I don't get two IPCT_NEW events): IPCT_NEW IPCT_DESTROY IPCT_STATUS I think nfct->use is still held and the conntrack is never really destroyed after death_by_timeout(), so later when the reply comes the conntrack still exists in the hash and the conntrack API fires a IPCT_STATUS event after the IPCT_DESTROY event was fired, which I'm assuming is an invalid sequence of events. ;) So, what would be the harm in the following patch to ip_conntrack_proto_icmp.c? diff -u a/ip_conntrack_proto_icmp.c b/ip_conntrack_proto_icmp.c --- a/ip_conntrack_proto_icmp.c 2005-06-28 16:52:18.000000000 -0500 +++ b/ip_conntrack_proto_icmp.c 2005-06-28 16:52:46.000000000 -0500 @@ -92,15 +92,7 @@ const struct sk_buff *skb, enum ip_conntrack_info ctinfo) { - /* Try to delete connection immediately after all replies: - won't actually vanish as we still have skb, and del_timer - means this will only run once even if count hits zero twice - (theoretically possible with SMP) */ - if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) { - if (atomic_dec_and_test(&ct->proto.icmp.count) - && del_timer(&ct->timeout)) - ct->timeout.function((unsigned long)ct); - } else { + if (CTINFO2DIR(ctinfo) != IP_CT_DIR_REPLY) { atomic_inc(&ct->proto.icmp.count); ip_ct_refresh_acct(ct, ctinfo, skb, ip_ct_icmp_timeout); } Otherwise you'll need to move the IPCT_DESTROY event out of death_by_timeout() and into destroy_conntrack(). I like the former option because then ICMP can actually be utilized in the flow module. The only reservations I have with it lie in the other ip_conntrack_proto_* files that may have del_timer() followed by death_by_timeout(), this may create other gotchas in the connection event API sequence, I'm not sure. Perhaps both changes could be made? :) Cheers, -Josh ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson @ 2005-06-29 19:14 ` Pablo Neira 0 siblings, 0 replies; 50+ messages in thread From: Pablo Neira @ 2005-06-29 19:14 UTC (permalink / raw) To: Josh Samuelson; +Cc: Netfilter Development Mailinglist Hi Josh, Josh Samuelson wrote: > On Mon, Jun 27, 2005 at 08:02:22PM +0200, Pablo Neira wrote: > >>o conntrack event API > > >>- IPCT_DELIVERED bit. Loopback reports event are reported twice, this >>bit is set once event are delivered. I just came up with a better idea, >>reset nfcache once the events have been delivered, but I'll apply this >>change in the next patchset. > > I noticed this with my flow module, any interface connecting to itself > would create two IPCT_NEW events. I had made my own patch to your > linux-2.6.11.conntrack-event-api.patch to call > ip_conntrack_event_cache_init() after the events were delivered. Indeed, this is what I planned to fix this problem ;). Just killed the IPCT_DELIVERED bit, it will be included in next version of the ctevent-api. I'll post it soon. > I was gonna send it your way but other things came up and there > were other problems that I was trying to figure out that were not > quite working the way I thought they should. I investigated > further and think I've found the problem, but I'm not sure > how to move further on it, so now it's time to ask the list: > > Can anyone explain why in ip_conntrack_proto_icmp.c > we delete the conntrack of an ICMP packet as soon as all replies > are seen instead of letting ip_conntrack_icmp_timeout take the > conntrack like other protocol types? As far as I can see, this > breaks the conntrack event API because I get the following > sequence of notifications for an ICMP echo request to the host > (this is with the ip_conntrack_event_cache_init() patch after > notifications have been delivered so I don't get two IPCT_NEW events): > > IPCT_NEW > IPCT_DESTROY > IPCT_STATUS > > I think nfct->use is still held and the conntrack is never really > destroyed after death_by_timeout(), so later when the reply comes > the conntrack still exists in the hash and the conntrack API > fires a IPCT_STATUS event after the IPCT_DESTROY event was fired, > which I'm assuming is an invalid sequence of events. ;) OK, I moved the IPCT_DESTROY event to destroy_conntrack instead of death_by_timeout, this must fix the problem of out of sequence events. > So, what would be the harm in the following patch to > ip_conntrack_proto_icmp.c? > > diff -u a/ip_conntrack_proto_icmp.c b/ip_conntrack_proto_icmp.c > --- a/ip_conntrack_proto_icmp.c 2005-06-28 16:52:18.000000000 -0500 > +++ b/ip_conntrack_proto_icmp.c 2005-06-28 16:52:46.000000000 -0500 > @@ -92,15 +92,7 @@ > const struct sk_buff *skb, > enum ip_conntrack_info ctinfo) > { > - /* Try to delete connection immediately after all replies: > - won't actually vanish as we still have skb, and del_timer > - means this will only run once even if count hits zero twice > - (theoretically possible with SMP) */ > - if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY) { > - if (atomic_dec_and_test(&ct->proto.icmp.count) > - && del_timer(&ct->timeout)) > - ct->timeout.function((unsigned long)ct); > - } else { > + if (CTINFO2DIR(ctinfo) != IP_CT_DIR_REPLY) { > atomic_inc(&ct->proto.icmp.count); > ip_ct_refresh_acct(ct, ctinfo, skb, ip_ct_icmp_timeout); > } I don't want to change the current icmp tracking logic, the connection tracking table could get full sooner because of icmp entries that has already seen their reply wanting to timeout. -- Pablo ^ permalink raw reply [flat|nested] 50+ messages in thread
* NETLINK_NETFILTER and NETLINK_FIB_LOOKUP 2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira ` (3 preceding siblings ...) 2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson @ 2005-07-11 11:34 ` Amin Azez 2005-07-11 16:32 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Amin Azez 5 siblings, 0 replies; 50+ messages in thread From: Amin Azez @ 2005-07-11 11:34 UTC (permalink / raw) To: netfilter-devel; +Cc: Harald Welte 2.6-git defines #define NETLINK_FIB_LOOKUP 10 which conflicts with #define NETLINK_NETFILTER 10 /* netfilter subsystem */ Are we going to have to change number again? Should we push this definition upstream just to get a fixed number reserved? I don't think netfilter is going away any time soon. Azez ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] updates for [nf|ct]netlink and event API 2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira ` (4 preceding siblings ...) 2005-07-11 11:34 ` NETLINK_NETFILTER and NETLINK_FIB_LOOKUP Amin Azez @ 2005-07-11 16:32 ` Amin Azez 5 siblings, 0 replies; 50+ messages in thread From: Amin Azez @ 2005-07-11 16:32 UTC (permalink / raw) To: netfilter-devel Pablo, I realise that there was a lot of dicussion on various parts of your rework of conntrack. Is the SVN patch at http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch still preferred, or has all that you intend been committed to SVN already? Thanks Amin Pablo Neira wrote: > Hi Harald, > > This patchset introduces tons of updates for the nfnetlink, ctnetlink > and the conntrack event API. I haven't attached the file since it's that > big, about 100K. > > You can get an incremental diff against SVN from: > http://people.netfilter.org/~pablo/ctnetlink-2.6.12/SVN-patches/ctnetlink-ctevent-nfnetlink-update-2.6.12.patch > > > Please apply. > > > I've split this big patch above into four pieces to make it easier to > understand the changes: > http://people.netfilter.org/~pablo/ctnetlink-2.6.12/2.6.11-vs-2.6.12/ > > So these four patches shouldn't be applied, just they are meant to make > your life easier to track the changes. > > Summary of changes > ------------------ > > o conntrack event API > - Don't kill NFC_IP_* stuff, keep it there to ensure for old iptables > versions compilation. > - new file ip_conntrack_events.h that contains all event related > functions to reduce pollution in ip_conntrack.h > - IPCT_DELIVERED bit. Loopback reports event are reported twice, this > bit is set once event are delivered. I just came up with a better idea, > reset nfcache once the events have been delivered, but I'll apply this > change in the next patchset. > > o nfnetlink > - kill unused list. > - kill nfnl_exlock(), not needed anymore. > - kill duplicated check: NFNL_SUBSYS_ID(type) > NFNL_SUBSYS_COUNT. > - kill unneeded initialization of subsys_table to NULL, since it's in > BSS section (already set to zero). > - kill dead define CONFIG_NF_NETLINK. > > o ctnetlink > - merge ctnetlink_get_mcgroups and ctnetlink_get_exp_mcgroups > - implemented NAT handlings > - kill unused ctnetlink_kill > - use __u64 id's for conntracks > - stop using NLMSG_DONE to report the end of a dump, use explicite ACK > instead (NLM_F_ACK). > - fixed broken expectation timeout dumping. > - kill unused ctnetlink_exp_dump_proto > - kill ctnetlink_exp_dump: fairly small and just used once > - kill NFNL_SUBSYS_CTNETLINK_EXP, use NFNL_SUBSYS_CTNETLINK instead > - Fix expectation table dumping > - Fix expectation creation > - implemented flushing of the expect table > > TODO > ---- > > - Implement ip_conntrack_stats dumping and reset (accounting) > - Implement get conntrack and destroy (accounting) > - Kill event/dump mask based (?). Although it's unique, I think that it > could be useful for weak conntrack event notification (think of just > new, established and destroy event notification to reduce performance > impact). > > Once ip_conntrack_netlink gets fully featured and people don't report > bugs for quite some time. I'll create a nf_conntrack_netlink tree. > > -- > Pablo > > ^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2005-07-12 8:18 UTC | newest] Thread overview: 50+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-06-27 18:02 [PATCH 1/2] updates for [nf|ct]netlink and event API Pablo Neira 2005-06-27 20:26 ` Harald Welte 2005-06-28 2:00 ` Pablo Neira 2005-06-28 2:12 ` Pablo Neira 2005-06-28 2:15 ` Pablo Neira 2005-06-28 3:53 ` Patrick McHardy 2005-06-28 7:07 ` Harald Welte 2005-07-04 12:59 ` Amin Azez 2005-06-28 7:06 ` Harald Welte 2005-06-27 21:31 ` Patrick McHardy 2005-06-28 2:15 ` Pablo Neira 2005-06-28 3:56 ` Patrick McHardy 2005-06-27 22:40 ` Patrick McHardy 2005-06-28 2:16 ` Pablo Neira 2005-06-28 4:03 ` Patrick McHardy 2005-06-28 7:13 ` Harald Welte 2005-06-28 16:02 ` Patrick McHardy 2005-06-29 19:13 ` Pablo Neira 2005-06-29 19:52 ` Patrick McHardy 2005-06-29 20:16 ` Harald Welte 2005-06-30 0:27 ` Pablo Neira 2005-06-30 0:53 ` Patrick McHardy 2005-06-30 9:47 ` Pablo Neira 2005-06-30 21:30 ` Patrick McHardy 2005-06-30 0:34 ` Pablo Neira 2005-06-30 1:00 ` Patrick McHardy 2005-06-30 1:49 ` Thomas Graf 2005-06-30 1:53 ` Patrick McHardy 2005-06-30 12:03 ` Thomas Graf 2005-06-30 13:27 ` Patrick McHardy 2005-06-30 18:02 ` Thomas Graf 2005-06-30 21:26 ` Patrick McHardy 2005-06-30 21:34 ` Thomas Graf 2005-06-30 21:49 ` David S. Miller 2005-06-30 22:08 ` Thomas Graf 2005-06-30 22:08 ` David S. Miller 2005-06-30 17:06 ` ctnetlink attributes [was: Re: [PATCH 1/2] updates for [nf|ct]netlink and event API] Pablo Neira 2005-07-11 16:30 ` Amin Azez 2005-07-11 16:50 ` Jan Engelhardt 2005-07-11 17:11 ` Harald Welte 2005-07-11 17:40 ` Jan Engelhardt 2005-07-12 7:54 ` Harald Welte 2005-07-11 17:10 ` Harald Welte 2005-07-11 17:45 ` Jan Engelhardt 2005-07-12 7:55 ` Harald Welte 2005-07-12 8:18 ` Amin Azez 2005-06-28 23:44 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Josh Samuelson 2005-06-29 19:14 ` Pablo Neira 2005-07-11 11:34 ` NETLINK_NETFILTER and NETLINK_FIB_LOOKUP Amin Azez 2005-07-11 16:32 ` [PATCH 1/2] updates for [nf|ct]netlink and event API Amin Azez
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.