* TODO list before feature freeze
From: Rusty Russell @ 2002-07-18  9:34 UTC
To: netfilter-devel; +Cc: netdev, netfilter-core

Hi all,
With four months to go before the feature freeze, it's
important to compile a feature list for netfilter-related things. I
see the following coming up:
Connection tracking:
o TCP window tracking finally goes in.
o Fix the extremely low TCP RST timeout.
o Fix the UDP timeout calculations to be per-port.
o Improve hashing.
o Fix the massive timer performance problem.
o Zero-copy-safe the connection tracking framework.
o ctnetlink support.
iptables:
o Change over to a netlink interface.
o Back to add/delete/replace interface + commit.
o Rewrite libiptc to use netlink (to port iptables).
o Write a new ip extension for iptables.
o Zero-copy-safe the iptables framework.
NAT:
o Zero-copy-safe the NAT framework.
Please add feature requests: note that I have not been following the
lists, so "obvious" things may not be obvious to me.
Thanks for your patience,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: TODO list before feature freeze
From: Balazs Scheidler @ 2002-07-19  7:39 UTC
To: Rusty Russell; +Cc: netfilter-devel, netdev, netfilter-core

On Thu, Jul 18, 2002 at 07:34:53PM +1000, Rusty Russell wrote:
> Hi all,
>
> 	With four months to go before the feature freeze, it's
> important to compile a feature list for netfilter-related things. I
> see the following coming up:
>
> Connection tracking:
> o TCP window tracking finally goes in.
> o Fix the extremely low TCP RST timeout.
> o Fix the UDP timeout calculations to be per-port.
> o Improve hashing.
> o Fix the massive timer performance problem.
> o Zero-copy-safe the connection tracking framework.
> o ctnetlink support.
>
> iptables:
> o Change over to a netlink interface.
> o Back to add/delete/replace interface + commit.
> o Rewrite libiptc to use netlink (to port iptables).
> o Write a new ip extension for iptables.
> o Zero-copy-safe the iptables framework.
>
> NAT:
> o Zero-copy-safe the NAT framework.
>
> Please add feature requests: note that I have not been following the
> lists, so "obvious" things may not be obvious to me.

I think conntrack exemptions and transparent proxy support should be
added to the list. The latter is working for me in production, at least
for TCP connections. UDP support depends on conntrack exemptions, so it
is not yet implemented (at least the sendmsg side; the recvmsg side
should be working).

--
Bazsi
PGP info: KeyID 9AF8D0A9 Fingerprint CD27 CFB0 802C 0944 9CFD 804E C82C 8EB1
* Re: TODO list before feature freeze
From: Michael Richardson @ 2002-07-19 17:43 UTC
To: Rusty Russell; +Cc: netfilter-devel, netdev, netfilter-core

>>>>> "Rusty" == Rusty Russell <rusty@rustcorp.com.au> writes:

 Rusty> With four months to go before the feature freeze, it's
 Rusty> important to compile a feature list for netfilter-related things. I
 Rusty> see the following coming up:

 Rusty> Connection tracking:
 Rusty> o TCP window tracking finally goes in.
 Rusty> o Fix the extremely low TCP RST timeout.
 Rusty> o Fix the UDP timeout calculations to be per-port.
 Rusty> o Improve hashing.
 Rusty> o Fix the massive timer performance problem.
 Rusty> o Zero-copy-safe the connection tracking framework.
 Rusty> o ctnetlink support.

Permit CT to deal with multihomed hosts.
* Re: TODO list before feature freeze
From: jamal @ 2002-07-29 10:57 UTC
To: Rusty Russell; +Cc: netfilter-devel, netdev, netfilter-core

On Thu, 18 Jul 2002, Rusty Russell wrote:

> Hi all,
>
> 	With four months to go before the feature freeze,

Really? ;->

> Connection tracking:

Fix the performance problems with this thing. You may have seen reports
of the performance degradation it introduces. I was hoping to take a
look at some point, but time hasn't been visiting this side.

> iptables:
> o Change over to a netlink interface.
> o Back to add/delete/replace interface + commit.
> o Rewrite libiptc to use netlink (to port iptables).

I hope this resolves the current scheme where the whole
add/delete/replace interface + commit happens in user space?
If you use netlink it would make sense to do incremental updates to the
kernel.

cheers,
jamal
* Re: TODO list before feature freeze
From: Andi Kleen @ 2002-07-29 11:12 UTC
To: jamal; +Cc: Rusty Russell, netfilter-devel, netdev, netfilter-core

> > Connection tracking:
>
> Fix the performance problems with this thing. You may have seen reports
> of the performance degradation it introduces. I was hoping to take a
> look at some point, but time hasn't been visiting this side.

One obvious problem is that it uses vmalloc to allocate its big hash
table. This will likely lead to TLB thrashing on a busy box. It should
try to allocate the hash table with get_free_pages() first and only
fall back to vmalloc if that fails. This way it would run with large
pages.

(Case in point: we have at least one report that routing performance
breaks down with ip_conntrack when memory size is increased over 1GB on
P3s. The hash table size depends on the memory size. The problem does
not occur on P4s; P4s have larger TLBs than P3s.)

-Andi
* Re: TODO list before feature freeze
From: jamal @ 2002-07-29 11:23 UTC
To: Andi Kleen; +Cc: Rusty Russell, netfilter-devel, netdev, netfilter-core

On Mon, 29 Jul 2002, Andi Kleen wrote:

> One obvious problem is that it uses vmalloc to allocate its big hash
> table. This will likely lead to TLB thrashing on a busy box. It should
> try to allocate the hash table with get_free_pages() first and only
> fall back to vmalloc if that fails. This way it would run with large
> pages.
>
> (Case in point: we have at least one report that routing performance
> breaks down with ip_conntrack when memory size is increased over 1GB on
> P3s. The hash table size depends on the memory size. The problem does
> not occur on P4s; P4s have larger TLBs than P3s.)

They also have a lot of problems with their per-packet computations.
Robert and I spent a short time looking at "this thing that is making
us look bad" (performance-wise) and talked to Harald. Something that
looked like it needs improvement at first glance was the aging and
hashing schemes.

cheers,
jamal
* Re: TODO list before feature freeze
From: Andi Kleen @ 2002-07-29 11:56 UTC
To: jamal; +Cc: Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

On Mon, Jul 29, 2002 at 07:23:49AM -0400, jamal wrote:
>
> On Mon, 29 Jul 2002, Andi Kleen wrote:
>
> > One obvious problem is that it uses vmalloc to allocate its big hash
> > table. This will likely lead to TLB thrashing on a busy box. It should
> > try to allocate the hash table with get_free_pages() first and only
> > fall back to vmalloc if that fails. This way it would run with large
> > pages.
> >
> > (Case in point: we have at least one report that routing performance
> > breaks down with ip_conntrack when memory size is increased over 1GB on
> > P3s. The hash table size depends on the memory size. The problem does
> > not occur on P4s; P4s have larger TLBs than P3s.)
>
> They also have a lot of problems with their per-packet computations.
> Robert and I spent a short time looking at "this thing that is making
> us look bad" (performance-wise) and talked to Harald. Something that
> looked like it needs improvement at first glance was the aging and
> hashing schemes.

Yes, some more tuning is probably needed. Here is a patch for 2.4 that
just makes it use get_free_pages to test the TLB theory. Another obvious
improvement would be to not use list_heads for the hash table buckets -
a single pointer would likely suffice, and it would cut the hash table
in half, saving cache, TLB and memory.

-Andi

--- linux-work/net/ipv4/netfilter/ip_conntrack_core.c-CONNTRACK	Thu Jul 25 13:36:42 2002
+++ linux-work/net/ipv4/netfilter/ip_conntrack_core.c	Mon Jul 29 13:48:33 2002
@@ -50,6 +50,7 @@
 LIST_HEAD(protocol_list);
 static LIST_HEAD(helpers);
 unsigned int ip_conntrack_htable_size = 0;
+static int ip_conntrack_vmalloc;
 static int ip_conntrack_max = 0;
 static atomic_t ip_conntrack_count = ATOMIC_INIT(0);
 struct list_head *ip_conntrack_hash;
@@ -1053,6 +1054,15 @@
 	return 1;
 }
 
+static void free_conntrack_hash(void)
+{
+	if (ip_conntrack_vmalloc)
+		vfree(ip_conntrack_hash);
+	else
+		free_pages((unsigned long)ip_conntrack_hash,
+			   get_order(sizeof(struct list_head) * ip_conntrack_htable_size));
+}
+
 /* Mishearing the voices in his head, our hero wonders how he's
    supposed to kill the mall. */
 void ip_conntrack_cleanup(void)
@@ -1075,7 +1085,7 @@
 	}
 	kmem_cache_destroy(ip_conntrack_cachep);
-	vfree(ip_conntrack_hash);
+	free_conntrack_hash();
 	nf_unregister_sockopt(&so_getorigdst);
 }
@@ -1109,8 +1119,17 @@
 	if (ret != 0)
 		return ret;
 
-	ip_conntrack_hash = vmalloc(sizeof(struct list_head)
-				    * ip_conntrack_htable_size);
+	/* AK: the hash table is twice as big as needed because it uses
+	   list_head. it would be much nicer to caches to use a single
+	   pointer list head here. */
+	ip_conntrack_vmalloc = 0;
+	ip_conntrack_hash = (void *)__get_free_pages(GFP_KERNEL,
+		get_order(sizeof(struct list_head) * ip_conntrack_htable_size));
+	if (!ip_conntrack_hash) {
+		ip_conntrack_vmalloc = 1;
+		printk("ip_conntrack: falling back to vmalloc. performance may be degraded.\n");
+		ip_conntrack_hash = vmalloc(sizeof(struct list_head) * ip_conntrack_htable_size);
+	}
 	if (!ip_conntrack_hash) {
 		nf_unregister_sockopt(&so_getorigdst);
 		return -ENOMEM;
@@ -1121,7 +1140,7 @@
 					SLAB_HWCACHE_ALIGN, NULL, NULL);
 	if (!ip_conntrack_cachep) {
 		printk(KERN_ERR "Unable to create ip_conntrack slab cache\n");
-		vfree(ip_conntrack_hash);
+		free_conntrack_hash();
 		nf_unregister_sockopt(&so_getorigdst);
 		return -ENOMEM;
 	}
@@ -1145,7 +1164,7 @@
 		= register_sysctl_table(ip_conntrack_root_table, 0);
 	if (ip_conntrack_sysctl_header == NULL) {
 		kmem_cache_destroy(ip_conntrack_cachep);
-		vfree(ip_conntrack_hash);
+		free_conntrack_hash();
 		nf_unregister_sockopt(&so_getorigdst);
 		return -ENOMEM;
 	}
* Re: TODO list before feature freeze
From: Martin Josefsson @ 2002-07-29 15:40 UTC
To: Andi Kleen
Cc: jamal, Rusty Russell, Netfilter-devel, netdev, netfilter-core, Patrick Schaaf

On Mon, 2002-07-29 at 13:56, Andi Kleen wrote:

> Here is a patch for 2.4 that just makes it use get_free_pages to test
> the TLB theory. Another obvious improvement would be to not use
> list_heads for the hash table buckets - a single pointer would likely
> suffice, and it would cut the hash table in half, saving cache, TLB and
> memory.

I think the list_heads are currently used for only one thing: the early
eviction in case of overload. It scans backwards in the chains to find
unreplied connections to evict - or so the comment in early_drop() says:

/* Traverse backwards: gives us oldest, which is roughly LRU */

but then it uses the normal LIST_FIND macro, which I think traverses the
list in the normal forward direction. I haven't looked into the rest of
the code, but I can't seem to remember anything else that needs
list_heads.

I think Patrick Schaaf is looking into conntrack as we speak. Maybe he
has some ideas?

I know I've had plans to rewrite the locking in conntrack, which is
quite frankly horrible: one giant rwlock used for almost everything
(including the hashtable). One idea that has come to mind is using RCU
(I need to learn more about it), or maybe using one rwlock per N buckets
or something. Looking at some stats from one of my routers, I see that
an average connection is over 130 packets long, so the ratio between
reads and writes is quite good.

And the eviction which occurs at overload needs to be redone; we can't
go around dropping one unreplied connection at a time, we need
gang-eviction of unreplied connections. We've had some nasty DDoS's here
in which our routers have been spending all cputime in conntrack trying
to evict connections to make room for the SYN floods coming in at
>130kpps.

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat
you with experience.
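[For illustration, a minimal sketch of the "one rwlock per N buckets"
idea floated above. This is not a tested patch; the sector count,
names, and helper are made up, and as Patrick notes in the next mail,
writers would have to take both sector locks (one per tuple direction)
in a fixed order to avoid deadlock.]

#define CT_LOCK_SECTORS 64	/* illustrative; would need tuning */

static rwlock_t ct_bucket_locks[CT_LOCK_SECTORS];

static inline rwlock_t *ct_bucket_lock(unsigned int hash)
{
	return &ct_bucket_locks[hash % CT_LOCK_SECTORS];
}

/* The per-packet lookup then read-locks only one sector instead of
 * the single global ip_conntrack_lock: */
static struct ip_conntrack_tuple_hash *
ct_lookup(const struct ip_conntrack_tuple *tuple, unsigned int hash)
{
	struct ip_conntrack_tuple_hash *h;

	read_lock(ct_bucket_lock(hash));
	h = ct_walk_bucket(hash, tuple);	/* assumed chain-walk helper */
	read_unlock(ct_bucket_lock(hash));
	return h;
}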
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-29 16:15 UTC
To: Martin Josefsson
Cc: Andi Kleen, jamal, Rusty Russell, Netfilter-devel, netdev, netfilter-core, Patrick Schaaf

(warning: crystal ball engaged to parse from the quoted mail snippets.
Maybe missing context. I'm just reading netfilter-devel.)

> On Mon, 2002-07-29 at 13:56, Andi Kleen wrote:
>
> > Here is a patch for 2.4 that just makes it use get_free_pages to test
> > the TLB theory.

I presume this is about the vmalloc()ed hash bucket table? If yes, it's
certainly an interesting experiment to try allocating it from an area
without TLB issues. We can expect a TLB miss on every packet with the
current setup; allocating the bucket table from large-TLB memory would
be a clear win of one memory roundtrip. The netfilter hook statistics
patch I mentioned in the other mail should be able to show the
difference. If my guess is right, you could see a 5-10% improvement on
the ip_conntrack hook functions.

> > Another obvious improvement would be to not use list_heads for the
> > hash table buckets - a single pointer would likely suffice, and it
> > would cut the hash table in half, saving cache, TLB and memory.
>
> I think the list_heads are currently used for only one thing: the early
> eviction in case of overload.

Don't forget the nonscanning list_del(), called whenever a conntrack is
unhashed at its death. However, with a suitable bucket number, i.e. low
chain lengths, the scan on conntrack removal would be OK.

The early_drop() scanning, if it wants to work backward, may as well
work forward, keeping a "last unreplied found" pointer and returning
that when falling off the end of the single list. Thus, I also think
that the list could be simple.

From the top of my head, here are other fields that we could get rid of:

- the ctrack backpointer in each tuple.
- the protocol field in each tuple.
- the 20-byte infos[] array in ip_conntrack.
- we could out-of-line ip_nat_info.

With the current layout, when lists must be walked on a 32-byte-cacheline
box, we are sure to always read two cache lines for each skipped-over
tuple.

> I know I've had plans to rewrite the locking in conntrack, which is
> quite frankly horrible: one giant rwlock used for almost everything
> (including the hashtable).

I'd like to see lockmeter statistics before this change. When you split
the one lock into a sectored lock: each conntrack is hashed twice, so
you need to be careful with lock order when adding or removing.
(Well, there is another possibility, but I won't go into that now.)

> One idea that has come to mind is using RCU

I don't see RCU solving hash link list update problems. Care to explain
how that would work?

> And the eviction which occurs at overload needs to be redone; we can't
> go around dropping one unreplied connection at a time, we need
> gang-eviction of unreplied connections.

I propose to put them all on a separate LRU list, and reap the oldest.

best regards
Patrick
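[A minimal sketch of the single-pointer bucket and the forward "last
unreplied found" scan described above - illustrative only, with an
assumed predicate. New entries go in at the head of a chain, so the
last match found before falling off the end is roughly the oldest, i.e.
the LRU victim the backwards-traversal comment was after.]

struct ct_slist {
	struct ct_slist *next;	/* NULL-terminated chain per bucket */
};

/* One pointer per bucket instead of a two-pointer list_head:
 * half the table size, as Andi notes above. */
static struct ct_slist **ct_hash_buckets;

static struct ct_slist *early_drop_victim(struct ct_slist *head)
{
	struct ct_slist *e, *last = NULL;

	for (e = head; e != NULL; e = e->next)
		if (!ct_entry_replied(e))	/* assumed predicate */
			last = e;		/* oldest unreplied survives the scan */
	return last;
}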
* Re: TODO list before feature freeze
From: Martin Josefsson @ 2002-07-29 17:12 UTC
To: Patrick Schaaf
Cc: Andi Kleen, jamal, Rusty Russell, Netfilter-devel, netdev, netfilter-core

On Mon, 2002-07-29 at 18:15, Patrick Schaaf wrote:

> > I think the list_heads are currently used for only one thing: the early
> > eviction in case of overload.
>
> Don't forget the nonscanning list_del(), called whenever a conntrack is
> unhashed at its death. However, with a suitable bucket number, i.e. low
> chain lengths, the scan on conntrack removal would be OK.

Oh, I forgot about that.

> The early_drop() scanning, if it wants to work backward, may as well
> work forward, keeping a "last unreplied found" pointer and returning
> that when falling off the end of the single list.
>
> Thus, I also think that the list could be simple.

The comment says that we are scanning backwards, but the code says we
are scanning forward without keeping a "last unreplied found" pointer;
we just return the first unreplied found. So going to a singly linked
list and adding that would probably be better than the current
behaviour.

> > I know I've had plans to rewrite the locking in conntrack, which is
> > quite frankly horrible: one giant rwlock used for almost everything
> > (including the hashtable).
>
> I'd like to see lockmeter statistics before this change. When you split
> the one lock into a sectored lock: each conntrack is hashed twice, so
> you need to be careful with lock order when adding or removing.
> (Well, there is another possibility, but I won't go into that now.)

The main reason is that I'd like conntrack to hold up better under
attacks. I'll see if I can produce some lockmeter statistics later this
week.

> > One idea that has come to mind is using RCU
>
> I don't see RCU solving hash link list update problems. Care to explain
> how that would work?

Have you seen the rtcache RCU patch? It almost halved the time spent
doing the lookups, because there is no lock bouncing between CPUs. But
RCU is best suited for things that can tolerate stale data on reads,
something which we can not. I've spoken to Hanna Linder, who is one of
the RCU people, and she said that the dcache RCU patch uses some
techniques to solve this, as the dcache can't tolerate stale data
either. I haven't investigated this yet, but it got my attention.

But the fact is that not many routers are SMP machines. Maybe it could
help some very busy SMP servers?

> > And the eviction which occurs at overload needs to be redone; we can't
> > go around dropping one unreplied connection at a time, we need
> > gang-eviction of unreplied connections.
>
> I propose to put them all on a separate LRU list, and reap the oldest.

I proposed this as well, and then James Morris shot me down :)
What about the case where someone tries to DoS a specific chain?

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat
you with experience.
* Re: TODO list before feature freeze
From: Nivedita Singhvi @ 2002-07-29 17:35 UTC
To: Martin Josefsson
Cc: Patrick Schaaf, Andi Kleen, jamal, Rusty Russell, Netfilter-devel, netdev, netfilter-core

Quoting Martin Josefsson <gandalf@wlug.westbo.se>:

> > I don't see RCU solving hash link list update problems. Care to
> > explain how that would work?
>
> Have you seen the rtcache RCU patch? It almost halved the time spent
> doing the lookups, because there is no lock bouncing between CPUs. But
> RCU is best suited for things that can tolerate stale data on reads,
> something which we can not. I've spoken to Hanna Linder, who is one of
> the RCU people, and she said that the dcache RCU patch uses some
> techniques to solve this, as the dcache can't tolerate stale data
> either.

The other environment parameter RCU does best in is a high read:write
ratio, needed in order to swallow the overhead. I haven't looked into
the netfilter code and don't have much experience in that area, so I
can't suggest what your normal traffic would consist of. However, I've
seen some posts and data from Dipankar and Hanna which suggest the
tradeoff is at a lower ratio than we were used to (I was in Sequent's
ptx/TCP/IP team, where we used it heavily, but our machines were NUMA
boxes with a staggering 10:1 penalty for going off-quad, back in the
olden days).

> I haven't investigated this yet, but it got my attention.
>
> But the fact is that not many routers are SMP machines. Maybe it could
> help some very busy SMP servers?

Hope so :)

thanks,
Nivedita
* Re: TODO list before feature freeze
From: Martin Josefsson @ 2002-07-29 22:43 UTC
To: Andi Kleen; +Cc: jamal, Rusty Russell, Netfilter-devel, netdev, netfilter-core

On Mon, 2002-07-29 at 13:56, Andi Kleen wrote:

> Here is a patch for 2.4 that just makes it use get_free_pages to test
> the TLB theory. Another obvious improvement would be to not use
> list_heads for the hash table buckets - a single pointer would likely
> suffice, and it would cut the hash table in half, saving cache, TLB and
> memory.

ip_nat_core is also allocating its hashtable via vmalloc, and it's twice
as large as the one in ip_conntrack (or rather, it's two hashtables
allocated at once; maybe they should be split up into two allocations?).

diff -x *.orig -x *.rej -urN linux-2.4.19-rc3.old/net/ipv4/netfilter/ip_nat_core.c linux-2.4.19-rc3/net/ipv4/netfilter/ip_nat_core.c
--- linux-2.4.19-rc3.old/net/ipv4/netfilter/ip_nat_core.c	Thu Jul 25 18:26:42 2002
+++ linux-2.4.19-rc3/net/ipv4/netfilter/ip_nat_core.c	Tue Jul 30 00:14:12 2002
@@ -43,6 +43,8 @@
 /* Calculated at init based on memory size */
 static unsigned int ip_nat_htable_size;
 
+static int ip_nat_vmalloc;
+
 static struct list_head *bysource;
 static struct list_head *byipsproto;
 LIST_HEAD(protos);
@@ -958,8 +960,16 @@
 	/* Leave them the same for the moment. */
 	ip_nat_htable_size = ip_conntrack_htable_size;
 
-	/* One vmalloc for both hash tables */
-	bysource = vmalloc(sizeof(struct list_head) * ip_nat_htable_size*2);
+	/* One allocation for both hash tables */
+	ip_nat_vmalloc = 0;
+	bysource = (void *)__get_free_pages(GFP_KERNEL,
+			get_order(sizeof(struct list_head) *
+				  ip_nat_htable_size * 2));
+	if (!bysource) {
+		ip_nat_vmalloc = 1;
+		printk("ip_nat: falling back to vmalloc. performance may be degraded.\n");
+		bysource = vmalloc(sizeof(struct list_head) * ip_nat_htable_size * 2);
+	}
 	if (!bysource) {
 		return -ENOMEM;
 	}
@@ -999,5 +1009,10 @@
 {
 	ip_ct_selective_cleanup(&clean_nat, NULL);
 	ip_conntrack_destroyed = NULL;
-	vfree(bysource);
+
+	if (ip_nat_vmalloc)
+		vfree(bysource);
+	else
+		free_pages((unsigned long)bysource,
+			   get_order(sizeof(struct list_head) * ip_nat_htable_size * 2));
 }

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat
you with experience.
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-29 16:26 UTC
To: jamal; +Cc: Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

Jamal,

> They also have a lot of problems with their per-packet computations.
> Robert and I spent a short time looking at "this thing that is making
> us look bad" (performance-wise) and talked to Harald.

Do you have a writeup somewhere of what kind of performance problems you
were seeing, and under which conditions (hash bucket count, number of
tracked connections, packet load)?

> Something that looked like it needs improvement at first glance was the
> aging and hashing schemes.

Regarding the hashing schemes, please see the discussions on
netfilter-devel over the last weeks:

http://lists.netfilter.org/pipermail/netfilter-devel/2002-July/thread.html

and a small presentation of various bucket sizes / hash functions for
some real-world scenarios:

http://bei.bof.de/ex6/

This presentation, a bit terse on comments, links to a tarball which
allows you to recreate the same presentation for any dump of
/proc/net/ip_conntrack, varying bucket counts and hash functions.

best regards
Patrick
* Re: TODO list before feature freeze
From: Andi Kleen @ 2002-07-29 16:31 UTC
To: Patrick Schaaf
Cc: jamal, Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

> Regarding the hashing schemes, please see the discussions on
> netfilter-devel over the last weeks:
>
> http://lists.netfilter.org/pipermail/netfilter-devel/2002-July/thread.html
>
> and a small presentation of various bucket sizes / hash functions for
> some real-world scenarios:
>
> http://bei.bof.de/ex6/

Have you done any profiling of real workloads to see where the actual
overhead comes from?

-Andi
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-29 16:42 UTC
To: Andi Kleen
Cc: Patrick Schaaf, jamal, Rusty Russell, netfilter-devel, netdev, netfilter-core

> Have you done any profiling of real workloads to see where the actual
> overhead comes from?

Not yet. I've spent the last weeks learning enough about the code to
make sense of profiles :)

This week (probably Wednesday) I'll put both my netfilter hook
statistics patch and enabled kernel profiling onto a production box (the
transproxy thing from the bucket occupation analysis). Right now I have
a totally undersized bucket count on that machine (7168 buckets for 10
times as many tuples), so I'll first measure the "accidental long list
walk" situation, and then retry with a suitable bucket size.

best regards
Patrick
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-29 16:45 UTC
To: Patrick Schaaf
Cc: Andi Kleen, jamal, Rusty Russell, netfilter-devel, netdev, netfilter-core

> This week (probably Wednesday) I'll put both my netfilter hook
> statistics patch and enabled kernel profiling onto a production box (the
> transproxy thing from the bucket occupation analysis). Right now I have
> a totally undersized bucket count on that machine (7168 buckets for 10
> times as many tuples), so I'll first measure the "accidental long list
> walk" situation, and then retry with a suitable bucket size.

Before somebody gets the wrong idea: the machine I mentioned serves as a
squid proxy for over 3000 narrowband dialup users (all web traffic), and
it has no performance problems at all with that. For all I know, any
optimization we may make regarding netfilter won't make the squids on
that box work perceivably better. I have permanent average and median
service time monitoring to prove or disprove this assertion :-)

best regards
Patrick
* Re: TODO list before feature freeze
From: jamal @ 2002-07-30 11:58 UTC
To: Patrick Schaaf
Cc: Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

On Mon, 29 Jul 2002, Patrick Schaaf wrote:

> Jamal,
>
> > They also have a lot of problems with their per-packet computations.
> > Robert and I spent a short time looking at "this thing that is making
> > us look bad" (performance-wise) and talked to Harald.
>
> Do you have a writeup somewhere of what kind of performance problems you
> were seeing, and under which conditions (hash bucket count, number of
> tracked connections, packet load)?

Just a standard setup in forwarding. In fact you only need one or two
flows active to verify this -- which leads me to believe hashing may
play a smaller role in the whole problem. What I have seen has also been
reported by many people (someone seems to have gone one step further and
documented numbers, but I can't find his email right now). Take Linux as
a router: it routes at x% of wire rate. Load conntracking and watch it
go down by another 25% at least.

> > Something that looked like it needs improvement at first glance was the
> > aging and hashing schemes.
>
> Regarding the hashing schemes, please see the discussions on
> netfilter-devel over the last weeks:

I think hashing is one of the problems. What performance improvement are
you seeing? (I couldn't tell from looking at your data.)

cheers,
jamal
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-30 12:27 UTC
To: jamal
Cc: Patrick Schaaf, Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

> What I have seen has also been reported by many people (someone seems
> to have gone one step further and documented numbers, but I can't find
> his email right now). Take Linux as a router: it routes at x% of wire
> rate. Load conntracking and watch it go down by another 25% at least.

Unfortunately, this is insufficient information to pin down what was
happening. As Andi Kleen mentioned, a simple kernel profile from such a
test would be a good start. The most likely things leading to such a
result, in no specific order:

- skb linearization
- always-defragment
- ip_conntrack_lock contention
- per-packet timer management

I'm not personally interested in line-rate routing, but I look forward
to further results from such setups. I concentrate on real server
workloads, because that's where my job is.

> I think hashing is one of the problems. What performance improvement are
> you seeing? (I couldn't tell from looking at your data.)

We found that the autosizing tends to make the bucket count a multiple
of two, and we found that the currently used hash function does not like
that, resulting in longer average bucket list lengths than necessary.
The crc32 hashes, and suitably modified abcd hashes, don't suffer from
this deficiency, and they are almost identical to random (a pseudohash I
used to depict the optimum).

However, the "badness" of the current hash, given my datasets, results
in less than one additional list element on average. So we could save
one memory roundtrip. Given that with my netfilter hook statistics patch
I see >3000 cycles (on a 1GHz processor) spent in ip_conntrack_in -
about 10 memory round-trips - I don't think that you could measure the
hash function improvement, except for artificial test cases.

We can improve here, but not much. Changing the hash function is mostly
interesting to make hash bucket length attacks more unlikely. The abcd
hash family, with boot-time chosen multipliers, could be of use here.

best regards
Patrick
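[For illustration, a minimal sketch of the kind of boot-time-randomized
multiplicative ("abcd"-style) tuple hash discussed above. The names,
constants, and tuple fields are assumptions, not the actual netfilter
code. The point is that an attacker who does not know the multipliers
can no longer precompute tuples that all land in one bucket.]

static u32 hash_mult[4];	/* a, b, c, d: chosen once at boot/init */

static void abcd_hash_init(void)
{
	get_random_bytes(hash_mult, sizeof(hash_mult));
	/* odd multipliers mix better under truncation */
	hash_mult[0] |= 1;
	hash_mult[1] |= 1;
	hash_mult[2] |= 1;
}

static unsigned int abcd_hash(u32 saddr, u32 daddr, u16 sport, u16 dport,
			      u8 proto, unsigned int htable_size)
{
	u32 h = saddr * hash_mult[0]
	      + daddr * hash_mult[1]
	      + (((u32)sport << 16) | dport) * hash_mult[2]
	      + proto + hash_mult[3];

	return h % htable_size;
}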
* Re: TODO list before feature freeze
From: jamal @ 2002-07-30 12:29 UTC
To: Patrick Schaaf
Cc: Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

On Tue, 30 Jul 2002, Patrick Schaaf wrote:

> Unfortunately, this is insufficient information to pin down what was
> happening. As Andi Kleen mentioned, a simple kernel profile from such a
> test would be a good start.

Don't have time -- if I did, you'd probably see a patch from me. I gave
you the test case; you go do it. In fact you don't even need to be
forwarding to reproduce this. Marc Boucher has a small UDP traffic
generator; you can use that on the lo device and reproduce this.

> The most likely things leading to such a result, in no specific order:
>
> - skb linearization
> - always-defragment
> - ip_conntrack_lock contention
> - per-packet timer management
>
> I'm not personally interested in line-rate routing, but I look forward
> to further results from such setups. I concentrate on real server
> workloads, because that's where my job is.

It has nothing to do with forwarding (Marc's tool is a client-server
setup); you only need one or two flows. I am almost certain that if you
solve that simple case, you end up solving the larger problem.

> We found that the autosizing tends to make the bucket count a multiple
> of two, and we found that the currently used hash function does not like
> that, resulting in longer average bucket list lengths than necessary.
> The crc32 hashes, and suitably modified abcd hashes, don't suffer from
> this deficiency, and they are almost identical to random (a pseudohash I
> used to depict the optimum).
>
> However, the "badness" of the current hash, given my datasets, results
> in less than one additional list element on average. So we could save
> one memory roundtrip. Given that with my netfilter hook statistics patch
> I see >3000 cycles (on a 1GHz processor) spent in ip_conntrack_in -
> about 10 memory round-trips - I don't think that you could measure the
> hash function improvement, except for artificial test cases.
>
> We can improve here, but not much. Changing the hash function is mostly
> interesting to make hash bucket length attacks more unlikely. The abcd
> hash family, with boot-time chosen multipliers, could be of use here.

I think this is a good start, but insufficient; in general I think you
are looking at the wrong thing. Hashing is an issue, no doubt, but I
don't think it is the main issue. Did you start chasing hashing because
you saw some large numbers in profiles? If I were to use instinct, I
would say the last two items you list are probably the things you may
want to chase.

cheers,
jamal
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-30 13:06 UTC
To: jamal
Cc: Patrick Schaaf, Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

> Hashing is an issue, no doubt, but I don't think it is the main issue.

I fully agree. I mention this all the time.

> Did you start chasing hashing because you saw some large numbers
> in profiles?

No. I looked at the hash function and bucket sizes because I now have
machines in the field running conntrack, and I wanted to understand how
I should choose an appropriate size. So I took real-world conntrack
snapshots on those machines and analysed them. I learned all there is to
learn about the question, and present the results for others to use as
they see fit. End of story.

Again, I have no performance trouble with conntracking on any of my
production systems.

best regards
Patrick
--
everything is under control. (Hardware)
* Re: TODO list before feature freeze
From: jamal @ 2002-07-30 13:42 UTC
To: Patrick Schaaf
Cc: Andi Kleen, Rusty Russell, netfilter-devel, netdev, netfilter-core

On Tue, 30 Jul 2002, Patrick Schaaf wrote:

> Again, I have no performance trouble with conntracking on any of my
> production systems.

Then you have embarked on a meaningless science project, I am afraid.
This is where Rusty's comment that he needs numbers makes a lot of
sense.

cheers,
jamal
* Re: TODO list before feature freeze
From: Martin Josefsson @ 2002-07-30 13:08 UTC
To: jamal
Cc: Patrick Schaaf, Andi Kleen, Rusty Russell, Netfilter-devel, netdev, netfilter-core

On Tue, 2002-07-30 at 14:29, jamal wrote:

> If I were to use instinct, I would say the last two items you list are
> probably the things you may want to chase.

Here are two small patches.

The first is a small patch to avoid updating the per-connection timer
for every packet. With this patch you get one update per second per
connection. Things are complicated by the fact that connections can
change timeouts. This patch isn't verified for correctness, YMMV.
(The pptp helper needs updating to work in combination with this patch.)

The second patch changes the hashtable lookup slightly so we don't hash
the tuple on each iteration; once is enough.

I don't have any numbers for these patches, and I can't find the URL to
the tests one of the netfilter-devel people has done.

diff -x *.orig -urN linux.orig/net/ipv4/netfilter/ip_conntrack_core.c linux/net/ipv4/netfilter/ip_conntrack_core.c
--- linux.orig/net/ipv4/netfilter/ip_conntrack_core.c	Tue Jul 30 14:38:41 2002
+++ linux/net/ipv4/netfilter/ip_conntrack_core.c	Tue Jul 30 14:40:06 2002
@@ -855,8 +855,10 @@
 	if (!is_confirmed(ct))
 		ct->timeout.expires = extra_jiffies;
 	else {
-		/* Need del_timer for race avoidance (may already be dying). */
-		if (del_timer(&ct->timeout)) {
+		/* Don't update timer for each packet, only if it's been >HZ
+		 * ticks since last update or change is negative.
+		 * Need del_timer for race avoidance (may already be dying). */
+		if ((unsigned long)(jiffies + extra_jiffies - ct->timeout.expires) >= HZ && del_timer(&ct->timeout)) {
 			ct->timeout.expires = jiffies + extra_jiffies;
 			add_timer(&ct->timeout);
 		}

--- linux-2.4.19-pre10/net/ipv4/netfilter/ip_conntrack_core.c.orig	Sat Jun  8 00:48:59 2002
+++ linux-2.4.19-pre10/net/ipv4/netfilter/ip_conntrack_core.c	Sat Jun  8 00:49:56 2002
@@ -292,9 +292,10 @@
 		    const struct ip_conntrack *ignored_conntrack)
 {
 	struct ip_conntrack_tuple_hash *h;
+	size_t hash = hash_conntrack(tuple);
 
 	MUST_BE_READ_LOCKED(&ip_conntrack_lock);
-	h = LIST_FIND(&ip_conntrack_hash[hash_conntrack(tuple)],
+	h = LIST_FIND(&ip_conntrack_hash[hash],
 		      conntrack_tuple_cmp,
 		      struct ip_conntrack_tuple_hash *,
 		      tuple, ignored_conntrack);

--
/Martin

Never argue with an idiot. They drag you down to their level, then beat
you with experience.
* Re: TODO list before feature freeze
From: Filip Sneppe (Cronos) @ 2002-07-30 15:54 UTC
To: Martin Josefsson
Cc: jamal, Patrick Schaaf, Andi Kleen, Rusty Russell, Netfilter-devel, netdev, netfilter-core

On Tue, 2002-07-30 at 15:08, Martin Josefsson wrote:

> Here are two small patches.
>
> The first is a small patch to avoid updating the per-connection timer
> for every packet. With this patch you get one update per second per
> connection. Things are complicated by the fact that connections can
> change timeouts. This patch isn't verified for correctness, YMMV.
> (The pptp helper needs updating to work in combination with this patch.)
>
> The second patch changes the hashtable lookup slightly so we don't hash
> the tuple on each iteration; once is enough.
>
> I don't have any numbers for these patches, and I can't find the URL to
> the tests one of the netfilter-devel people has done.

Hi Martin,

These may be the patches I did some very basic testing (and readprofile
profiling) on:

http://www.filip.sneppe.yucom.be/linux/netfilter/performance/benchmarks.htm

I don't have a lot to add to the discussion, except that I can make time
to test patches/ideas, provided someone tells me *how* to test, what to
look for, etc. For instance, a lot of the current numbers on that page
with the varying MTU sizes are, in retrospect, basically stupid tests
that don't reveal a lot of new stuff :-/

Regards,
Filip
* Re: TODO list before feature freeze
From: Michael Richardson @ 2002-07-29 15:25 UTC
To: netfilter-devel, netdev, netfilter-core

>>>>> "Andi" == Andi Kleen <ak@suse.de> writes:

 Andi> (Case in point: we have at least one report that routing
 Andi> performance breaks down with ip_conntrack when memory size is
 Andi> increased over 1GB on P3s. The hash table size depends on the
 Andi> memory size. The problem does not occur on P4s; P4s have larger
 Andi> TLBs than P3s.)

That's a non-obvious result.

I'll bet that most of the memory-size-based hash tables in the kernel
suffer from similar problems. A good topic for a paper, I'd say.

] Internet Security. Have encryption, will travel          |1 Fish/2 Fish [
] Michael Richardson, Sandelman Software Works, Ottawa, ON |Red F./Blow F [
] mcr@sandelman.ottawa.on.ca http://www.sandelman.ottawa.on.ca/ |strong crypto [
] At the far end of some dark fiber - wait that's dirt!    |for everyone  [
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-29 15:52 UTC
To: Michael Richardson; +Cc: netfilter-devel, netdev, netfilter-core

> >>>>> "Andi" == Andi Kleen <ak@suse.de> writes:
>  Andi> (Case in point: we have at least one report that routing
>  Andi> performance breaks down with ip_conntrack when memory size is
>  Andi> increased over 1GB on P3s. The hash table size depends on the
>  Andi> memory size. The problem does not occur on P4s; P4s have larger
>  Andi> TLBs than P3s.)
>
> That's a non-obvious result.
>
> I'll bet that most of the memory-size-based hash tables in the kernel
> suffer from similar problems. A good topic for a paper, I'd say.

That's for sure - but I don't see the relevance of TLBs. The only place
where I expect any in-CPU caches to matter is for synthetic test cases
where there's a very small number of conntracks (fitting into CPU
caches) and a huge load of packets (to look good in a benchmark).

Under real-life operation, we either have very light loads - then
conntrack lookup does not matter at all - or we have high load, several
tens of thousands of packets per second. Then, things may get slow in
conntrack when you don't have enough hash buckets - two times the number
of concurrent connections is appropriate. Or, if that's not the problem,
you will already spread lookups so far across the hash table, in a
random fashion, that you'll incur at least two TLB faults plus several
cache line loads for each packet. When that point is reached, further
increases in packet load should not make things worse.

Andi, what report are you referring to? Any specifics I can read?

In case somebody isn't aware, we have been over the hash function and
hash bucket thing during the last month. See lots of mails in the
netfilter-devel archive. I'm prepared to take on any presumed
inefficiency in the current conntracking code. I know some things that
may be relevant that I did not write about during the last weeks, but I
have no real-life indication that they matter - I'd love to have the
opportunity to see such a situation. So if anybody has the time to work
on such a perceived performance problem, please come to the
netfilter-devel mailing list, and let's talk specifics.

As it were, I published a small netfilter performance counter patch over
the weekend, which you can find in the archive at

http://lists.netfilter.org/pipermail/netfilter-devel/2002-July/008792.html

I hope to see some really worrying output from you :)

best regards
Patrick
* Re: TODO list before feature freeze
From: Andi Kleen @ 2002-07-29 20:51 UTC
To: Michael Richardson; +Cc: netfilter-devel, netdev, netfilter-core

On Mon, Jul 29, 2002 at 11:25:28AM -0400, Michael Richardson wrote:

> That's a non-obvious result.
>
> I'll bet that most of the memory-size-based hash tables in the kernel
> suffer from similar problems. A good topic for a paper, I'd say.

In fact there have been papers about this, like

http://www.citi.umich.edu/projects/linux-scalability/reports/hash.html

but the results were unfortunately not adopted. This has been discussed
for a long time. Linux hash tables often suffer from poor hash functions
(some are good, but others are not so great), excessive sizing to cover
for the poor functions, and the use of double-pointer heads when they
are not needed (which makes the hash table twice as big). Excessive
sizing wastes memory: several MB just for hash tables on a standard
system, including some gems like an incredibly big mount hash table that
nearly nobody needs to manage their 5-10 mounts.

I wrote an slist.h some time ago that works like the Linux list.h, but
uses singly linked lists instead of rings, with a single-pointer head,
to at least avoid the last problem. In the following discussion the
preference was for a more generic hash table ADT instead of another list
abstraction.

So if you wanted to put some work into this, I would:

- Develop a simple, tasteful, Linux-like generic hash table abstraction
  (most of the existing ones in other software packages fail at least
  one of these criteria).
- Convert all the big hash tables to this generic code.
- Let it use single-pointer heads.
- Make it implement the sizing based on memory size in common code, with
  a single knob to tune it per system. In fact I think it should
  definitely take L2 cache size into account, not only main memory.
- Add generic statistics as a CONFIG option, so that you can see hit
  rates for all your hash tables and how much space they need.
- Make it either use the existing hash function per table, or a generic
  good hash function like http://burtleburtle.net/bob/hash/evahash.html
  Try out all these knobs and write a paper about it.
- Try to get it merged, with the best results as default options.

Unfortunately the netfilter hash is a bad example for this, because its
DoS handling requirements (LRU etc.) are more complex than what most
other Linux hash tables need, and I am not sure it would make sense to
put them into generic code. On the other hand, if the generic code is
flexible enough, it would be possible to implement them on top of it.

-Andi
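[To make the single-pointer-head idea concrete, a minimal sketch of
what such an slist could look like - illustrative only, not Andi's
actual slist.h. Deletion pays an O(chain length) scan to find the
predecessor, which is the trade-off Patrick calls acceptable above as
long as buckets stay short.]

struct slist_node {
	struct slist_node *next;
};

struct slist_head {
	struct slist_node *first;	/* one pointer per bucket, not two */
};

static inline void slist_add(struct slist_node *n, struct slist_head *h)
{
	n->next = h->first;
	h->first = n;
}

/* No back pointer, so deletion walks the chain to unlink the node. */
static inline void slist_del(struct slist_node *n, struct slist_head *h)
{
	struct slist_node **pp = &h->first;

	while (*pp != n)
		pp = &(*pp)->next;
	*pp = n->next;
}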
* Re: TODO list before feature freeze
From: Patrick Schaaf @ 2002-07-30  7:26 UTC
To: Andi Kleen; +Cc: netfilter-devel, netdev, netfilter-core

> Unfortunately the netfilter hash is a bad example for this, because its
> DoS handling requirements (LRU etc.) are more complex than what most
> other Linux hash tables need, and I am not sure it would make sense to
> put them into generic code.

There is actually one issue with the current netfilter hash code: the
code intentionally does not zero out the next pointer when a conntrack
is removed from the hashes; only new, never-yet-hashed conntracks have
their next field be 0, and the confirm logic relies on that. This could
easily be changed to use an appropriate flag bit in struct ip_conntrack.

As a consequence, the singly linked list I'm prototyping must be a ring
list, with the hash bucket pointer within the list - the same scheme as
with the doubly linked list. It's oopsing on me as I type :)

A non-ring implementation would be smaller, so I think we really want
that flag bit for the confirmations. Rusty?

All other cases could be handled by a general hash implementation with a
per-list-entry user-supplied comparison callback and a per-table hash
function. I'm sure that any real DoS handling will work by varying
constants used in the hash function. That's the result of the recent
"abcd" hashing work.

The thing that worries me, even with the current setup, is the idea of a
general boot-time sizing of all such general hash tables. The things are
hard to override once loaded, so sizes must fit what's needed in the
real world, and that's over a _mix_ of various tables that all play
together under this or that workload. Maybe runtime rehashing is the way
to go here, to make this fully adaptive.

best regards
Patrick
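[A minimal sketch of the flag-bit change Patrick asks for. The
IPS_HASHED name and bit number are made up for illustration; the sketch
only assumes that struct ip_conntrack carries a status bitfield, as the
2.4 code does for its other IPS_* flags.]

#define IPS_HASHED_BIT 7		/* illustrative, assumed-unused bit */
#define IPS_HASHED (1 << IPS_HASHED_BIT)

/* Instead of "next pointer is still 0, so this conntrack was never
 * hashed", the confirm logic would test an explicit status bit... */
static inline int ct_is_hashed(const struct ip_conntrack *ct)
{
	return test_bit(IPS_HASHED_BIT, &ct->status);
}

/* ...which frees the bucket chains to be plain NULL-terminated singly
 * linked lists rather than rings, saving one pointer per bucket. */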
* Re: TODO list before feature freeze
From: Rusty Russell @ 2002-07-29 22:14 UTC
To: jamal; +Cc: netfilter-devel, netdev

In message <Pine.GSO.4.30.0207290648020.12604-100000@shell.cyberus.ca>
you write:

> > Connection tracking:
>
> Fix the performance problems with this thing. You may have seen reports
> of the performance degradation it introduces. I was hoping to take a
> look at some point, but time hasn't been visiting this side.

There are several simple things to do here. One is to improve the
hashing (fine for internet traffic, but it frequently sucks under LAN
conditions), which is easy. The other is to modify the
one-timer-per-connection approach to a "sweep once a second, or when
full" approach.

Both these are simple patches, but I want to see benchmarks showing that
they improve things.

> > iptables:
> > o Change over to a netlink interface.
> > o Back to add/delete/replace interface + commit.
> > o Rewrite libiptc to use netlink (to port iptables).
>
> I hope this resolves the current scheme where the whole
> add/delete/replace interface + commit happens in user space?
> If you use netlink it would make sense to do incremental updates to the
> kernel.

Yes, that's exactly the plan. It'd be more like the old-style
insert/delete (probably not replace), except with a "commit" interface,
implemented by copying the rules when they start modifying.

Hope that helps,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
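[A minimal sketch of the "sweep once a second, or when full" expiry
scheme Rusty describes, replacing one kernel timer per connection. The
names and the eviction helper are assumptions; the key point is that the
per-packet refresh shrinks to a plain store of a jiffies stamp instead
of a del_timer()/add_timer() pair.]

static struct timer_list ct_sweep_timer;

static void ct_sweep(unsigned long data)
{
	/* Walk the tracked connections and evict any whose plain
	 * jiffies expiry stamp has passed. */
	ct_evict_expired(jiffies);		/* assumed helper */

	mod_timer(&ct_sweep_timer, jiffies + HZ);	/* once a second */
}

/* The "or when full" half: when the table hits ip_conntrack_max, the
 * allocation path runs the same sweep immediately rather than waiting
 * for the next timer tick. */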
* Re: TODO list before feature freeze
From: jamal @ 2002-07-30 12:04 UTC
To: Rusty Russell; +Cc: netfilter-devel, netdev

On Tue, 30 Jul 2002, Rusty Russell wrote:

> There are several simple things to do here. One is to improve the
> hashing (fine for internet traffic, but it frequently sucks under LAN
> conditions), which is easy. The other is to modify the
> one-timer-per-connection approach to a "sweep once a second, or when
> full" approach.

That's the right direction. From code inspection, fixing the latter
problem would give you a lot more punch.

> Both these are simple patches, but I want to see benchmarks showing that
> they improve things.

Indeed.

> Yes, that's exactly the plan. It'd be more like the old-style
> insert/delete (probably not replace), except with a "commit" interface,
> implemented by copying the rules when they start modifying.

Why not take a look at the way tc does things and emulate that?

cheers,
jamal