* PATCH: extra conntrack stats
@ 2003-04-19 21:42 Rolf Fokkens
2003-04-25 8:23 ` Patrick Schaaf
` (2 more replies)
0 siblings, 3 replies; 42+ messages in thread
From: Rolf Fokkens @ 2003-04-19 21:42 UTC (permalink / raw)
To: netfilter-devel
Hi!
This patch adds a little statistical information to the connection tracking:
the number of packets and bytes that have gone throught each connection.
We've been using the connection tracking info so far to count concurrently
connected users on an Oracle environment. Now our customer would like us to
measure the size of print jobs and the nunmber of print jobs - this patch may
be an option for us. Of course one can also measure other kinds of traffic.
It might even be possible now to limit the amount of data that has gone
throught a connection by implementing other targets.
Well, it's a small patch that may not do any harm and please some people.
Cheers,
Rolf
diff -ruN linux-2.4.20.orig/include/linux/netfilter_ipv4/ip_conntrack_tuple.h linux-2.4.20/include/linux/netfilter_ipv4/ip_conntrack_tuple.h
--- linux-2.4.20.orig/include/linux/netfilter_ipv4/ip_conntrack_tuple.h Mon Feb 25 20:38:13 2002
+++ linux-2.4.20/include/linux/netfilter_ipv4/ip_conntrack_tuple.h Fri Apr 18 20:26:48 2003
@@ -60,6 +60,8 @@
/* The protocol. */
u_int16_t protonum;
} dst;
+
+ unsigned long npackets, nbytes;
};
enum ip_conntrack_dir
diff -ruN linux-2.4.20.orig/net/ipv4/netfilter/ip_conntrack_core.c linux-2.4.20/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.4.20.orig/net/ipv4/netfilter/ip_conntrack_core.c Fri Nov 29 00:53:15 2002
+++ linux-2.4.20/net/ipv4/netfilter/ip_conntrack_core.c Sat Apr 19 15:13:34 2003
@@ -140,6 +140,8 @@
else if (iph->ihl * 4 + 8 > len)
return 0;
+ tuple->npackets = 0;
+ tuple->nbytes = 0;
tuple->src.ip = iph->saddr;
tuple->dst.ip = iph->daddr;
tuple->dst.protonum = iph->protocol;
@@ -158,6 +160,8 @@
inverse->src.ip = orig->dst.ip;
inverse->dst.ip = orig->src.ip;
inverse->dst.protonum = orig->dst.protonum;
+ inverse->npackets = 0;
+ inverse->nbytes = 0;
return protocol->invert_tuple(inverse, orig);
}
@@ -760,7 +764,10 @@
if (IS_ERR(h))
return (void *)h;
}
-
+ WRITE_LOCK(&ip_conntrack_lock);
+ h->tuple.npackets ++;
+ h->tuple.nbytes += skb->len;
+ WRITE_UNLOCK(&ip_conntrack_lock);
/* It exists; we have (non-exclusive) reference. */
if (DIRECTION(h) == IP_CT_DIR_REPLY) {
*ctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY;
diff -ruN linux-2.4.20.orig/net/ipv4/netfilter/ip_conntrack_standalone.c linux-2.4.20/net/ipv4/netfilter/ip_conntrack_standalone.c
--- linux-2.4.20.orig/net/ipv4/netfilter/ip_conntrack_standalone.c Fri Nov 29 00:53:15 2002
+++ linux-2.4.20/net/ipv4/netfilter/ip_conntrack_standalone.c Sat Apr 19 11:51:26 2003
@@ -48,8 +48,11 @@
{
int len;
- len = sprintf(buffer, "src=%u.%u.%u.%u dst=%u.%u.%u.%u ",
- NIPQUAD(tuple->src.ip), NIPQUAD(tuple->dst.ip));
+ len = sprintf(buffer, "src=%u.%u.%u.%u dst=%u.%u.%u.%u "
+ "packets=%lu bytes=%lu ",
+ NIPQUAD(tuple->src.ip), NIPQUAD(tuple->dst.ip),
+ tuple->npackets, tuple->nbytes
+ );
len += proto->print_tuple(buffer + len, tuple);
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: PATCH: extra conntrack stats 2003-04-19 21:42 PATCH: extra conntrack stats Rolf Fokkens @ 2003-04-25 8:23 ` Patrick Schaaf 2003-04-27 12:48 ` Harald Welte 2003-04-28 9:13 ` vecna 2 siblings, 0 replies; 42+ messages in thread From: Patrick Schaaf @ 2003-04-25 8:23 UTC (permalink / raw) To: Rolf Fokkens; +Cc: netfilter-devel Hi Rolf, first of all: thanks for doing that work! > This patch adds a little statistical information to the connection tracking: > the number of packets and bytes that have gone throught each connection. > > We've been using the connection tracking info so far to count concurrently > connected users on an Oracle environment. Now our customer would like us to > measure the size of print jobs and the nunmber of print jobs - this patch may > be an option for us. Of course one can also measure other kinds of traffic. > > It might even be possible now to limit the amount of data that has gone > throught a connection by implementing other targets. It could be useful if you add a suitable minimal match, for demonstration. > Well, it's a small patch that may not do any harm and please some people. I don't think the normal processing takes a WRITE_LOCK(&ip_conntrack_lock) for each and every packet. Thus, your patch will have a performance impact under high packet load. Another pitfall is the 'unsigned long' nbytes: I'm sure I'm not the only one who has TCP connections transporting more than 4GB, so that counter will overflow in practise. It would be better to make it 64 bit from the start. 'unsigned long long' and '%Lu' work in the kernel, as far as I know. That aside: nice work, nice minimal addition. I urge the core team to evaluate this (and earlier) approaches. The general idea to have per-conntrack usage information, is very powerful. Teaser: "log me the first five packets of each HTTP connection". best regards Patrick ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-19 21:42 PATCH: extra conntrack stats Rolf Fokkens 2003-04-25 8:23 ` Patrick Schaaf @ 2003-04-27 12:48 ` Harald Welte 2003-04-27 15:23 ` Rolf Fokkens 2003-04-28 9:13 ` vecna 2 siblings, 1 reply; 42+ messages in thread From: Harald Welte @ 2003-04-27 12:48 UTC (permalink / raw) To: Rolf Fokkens; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1011 bytes --] On Sat, Apr 19, 2003 at 11:42:01PM +0200, Rolf Fokkens wrote: > Hi! > > This patch adds a little statistical information to the connection tracking: > the number of packets and bytes that have gone throught each connection. > > Well, it's a small patch that may not do any harm and please some people. It will. You are adding a counter that is written to for every packet. This causes tremendous cache line ping-pong on SMP systems. This is inacceptable, the only way to implement this is via per-cpu counters that are added at the time you read them out (like the iptables per-cpu local counters) > Cheers, > Rolf -- - Harald Welte <laforge@netfilter.org> http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-27 12:48 ` Harald Welte @ 2003-04-27 15:23 ` Rolf Fokkens 2003-04-27 20:42 ` Harald Welte 2003-04-29 22:15 ` Jozsef Kadlecsik 0 siblings, 2 replies; 42+ messages in thread From: Rolf Fokkens @ 2003-04-27 15:23 UTC (permalink / raw) To: Harald Welte; +Cc: netfilter-devel On Sunday 27 April 2003 02:48 pm, Harald Welte wrote: > This is inacceptable, the only way to implement this is via per-cpu > counters that are added at the time you read them out (like the iptables > per-cpu local counters) Oh boy! It's even worse! Even without this patch we have a concurrency problem: the timout! It's updated frequently I guess, and it's also protected by the same lock: ip_conntrack_lock! Well, after all this panic and calming down something constructive: Should a lock per contrack entry be considered? This increases the concurrency for different connections anyhow. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-27 15:23 ` Rolf Fokkens @ 2003-04-27 20:42 ` Harald Welte 2003-04-28 6:13 ` Patrick Schaaf 2003-04-29 22:15 ` Jozsef Kadlecsik 1 sibling, 1 reply; 42+ messages in thread From: Harald Welte @ 2003-04-27 20:42 UTC (permalink / raw) To: Rolf Fokkens; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1329 bytes --] On Sun, Apr 27, 2003 at 05:23:30PM +0200, Rolf Fokkens wrote: > On Sunday 27 April 2003 02:48 pm, Harald Welte wrote: > > This is inacceptable, the only way to implement this is via per-cpu > > counters that are added at the time you read them out (like the iptables > > per-cpu local counters) > > Oh boy! It's even worse! Even without this patch we have a concurrency > problem: the timout! It's updated frequently I guess, and it's also protected > by the same lock: ip_conntrack_lock! Oh yes, sorry. This issue never was fixed, although I seem to remember some patches implementing a solution where the timeout was not updated for every packet. > Should a lock per contrack entry be considered? This increases the > concurrency for different connections anyhow. Rusty has written some patches about this quite some time ago, unfortunately they seem to be discontinued. IIRC Martin Josefsson wanted to look into them again. -- - Harald Welte <laforge@netfilter.org> http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-27 20:42 ` Harald Welte @ 2003-04-28 6:13 ` Patrick Schaaf 0 siblings, 0 replies; 42+ messages in thread From: Patrick Schaaf @ 2003-04-28 6:13 UTC (permalink / raw) To: Harald Welte, Rolf Fokkens, netfilter-devel On Sun, Apr 27, 2003 at 10:42:18PM +0200, Harald Welte wrote: > On Sun, Apr 27, 2003 at 05:23:30PM +0200, Rolf Fokkens wrote: > > On Sunday 27 April 2003 02:48 pm, Harald Welte wrote: > > > This is inacceptable, the only way to implement this is via per-cpu > > > counters that are added at the time you read them out (like the iptables > > > per-cpu local counters) > > > > Oh boy! It's even worse! Even without this patch we have a concurrency > > problem: the timout! It's updated frequently I guess, and it's also > > protected by the same lock: ip_conntrack_lock! Let's look at things non-polemically, please. The first add_timer() you allude to, happens in __ip_conntrack_confirm(). Indeed, we take a WRITE_LOCK there. But, we do this not to protect the add_timer(), but to protect the conntrack list insertion happening there. Of course, we need a WRITE_LOCK for list insertion [*]. But, note that this only happens for unconfirmed conntracks, i.e. about once per individual connection, for the first packet of the connection. Otherwise, __ip_conntrack_confirm() returns _before_ doing the WRITE_LOCK. The more general issue with timers, is the regular update on further packets. This happens in ip_ct_refresh(). Indeed, it happens under conntrack WRITE_LOCK. It's probably this place that you remembered. IF a new WRITE_LOCK reason appears, e.g. for the counter update discussed here, I would strongly advise to do it in ip_ct_refresh(), i.e. use the already taken WRITE_LOCK, don't take it twice. If you take it twice in a row, chances are you just doubled the ping pong frequncy under load, i.e. halved the packets-per-second worst case. > > Should a lock per contrack entry be considered? This increases the > > concurrency for different connections anyhow. That would help in that place. However, we'd still do a del_timer/add_timer, and the kernel timer implementation will again do some locking. Harald remembered: > Oh yes, sorry. This issue never was fixed, although I seem to remember > some patches implementing a solution where the timeout was not updated > for every packet. Correct. I implemented (and lightly tested) such a solution last year, here's the pointer to the archives: http://lists.netfilter.org/pipermail/netfilter-devel/2002-August/008859.html Note that it does NOT get rid of the WRITE_LOCK, because this solution needs to store the new timeout target in conntrack, after calculating it - which would be racy without the lock. So, Rolf's suggestion of a per-conntrack lock would indeed help here to further reduce pingpong. As far as I remember, ip_ct_refresh() was the only WRITE_LOCK we take per-packet, at least at the conntrack layer. To recapitulate, for the packet/byte counter patch, one should either use per-cpu counters, or put the counter update into ip_ct_refresh(), somehow. best regards Patrick ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-27 15:23 ` Rolf Fokkens 2003-04-27 20:42 ` Harald Welte @ 2003-04-29 22:15 ` Jozsef Kadlecsik 2003-04-29 22:38 ` Martin Josefsson 1 sibling, 1 reply; 42+ messages in thread From: Jozsef Kadlecsik @ 2003-04-29 22:15 UTC (permalink / raw) To: Rolf Fokkens; +Cc: Harald Welte, netfilter-devel On Sun, 27 Apr 2003, Rolf Fokkens wrote: > Well, after all this panic and calming down something constructive: > > Should a lock per contrack entry be considered? This increases the > concurrency for different connections anyhow. Yep, I'm working on such a patch. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-29 22:15 ` Jozsef Kadlecsik @ 2003-04-29 22:38 ` Martin Josefsson 2003-04-30 10:49 ` Jozsef Kadlecsik 0 siblings, 1 reply; 42+ messages in thread From: Martin Josefsson @ 2003-04-29 22:38 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: Rolf Fokkens, Harald Welte, Netfilter-devel On Wed, 2003-04-30 at 00:15, Jozsef Kadlecsik wrote: > On Sun, 27 Apr 2003, Rolf Fokkens wrote: > > > Well, after all this panic and calming down something constructive: > > > > Should a lock per contrack entry be considered? This increases the > > concurrency for different connections anyhow. > > Yep, I'm working on such a patch. Care to describe how you are going to build it? -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-29 22:38 ` Martin Josefsson @ 2003-04-30 10:49 ` Jozsef Kadlecsik 2003-04-30 11:19 ` Martin Josefsson 2003-05-01 0:06 ` Rusty Russell 0 siblings, 2 replies; 42+ messages in thread From: Jozsef Kadlecsik @ 2003-04-30 10:49 UTC (permalink / raw) To: Martin Josefsson Cc: Rolf Fokkens, Harald Welte, Rusty Russell, Netfilter-devel On 30 Apr 2003, Martin Josefsson wrote: > On Wed, 2003-04-30 at 00:15, Jozsef Kadlecsik wrote: > > On Sun, 27 Apr 2003, Rolf Fokkens wrote: > > > > > Well, after all this panic and calming down something constructive: > > > > > > Should a lock per contrack entry be considered? This increases the > > > concurrency for different connections anyhow. > > > > Yep, I'm working on such a patch. > > Care to describe how you are going to build it? As usual, it's Rusty's locking patch what triggered me to think over the locking issues. In that patch Rusty actually eliminates ip_conntrack lock: - functionality structures (helpers etc.) are protected by the netproto lock instead - expectations (lists and data) are protected by the ip_conntrack_expect lock - and per-chain locking introduced in the hash table, protecting the conntrack lists and entries I think that eliminating ip_conntrack lock might uncover a (potential) bottleneck: the tcp_lock, the lock in conntrack for the predominant protocol of the Internet. By adding a data lock to the conntrack entries can solve that (potential) problem. The data lock would make unnecessary the per conntrack helper locks too (unfortunately the buffers introduced in the zerocopy patch undo that). Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-30 10:49 ` Jozsef Kadlecsik @ 2003-04-30 11:19 ` Martin Josefsson 2003-04-30 23:05 ` Martin Josefsson 2003-05-01 0:06 ` Rusty Russell 1 sibling, 1 reply; 42+ messages in thread From: Martin Josefsson @ 2003-04-30 11:19 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Rolf Fokkens, Harald Welte, Rusty Russell, Netfilter-devel On Wed, 2003-04-30 at 12:49, Jozsef Kadlecsik wrote: > > > > Should a lock per contrack entry be considered? This increases the > > > > concurrency for different connections anyhow. > > > > > > Yep, I'm working on such a patch. > > > > Care to describe how you are going to build it? > > As usual, it's Rusty's locking patch what triggered me to think over the > locking issues. In that patch Rusty actually eliminates ip_conntrack > lock: > > - functionality structures (helpers etc.) are protected by the netproto > lock instead > - expectations (lists and data) are protected by the ip_conntrack_expect > lock > - and per-chain locking introduced in the hash table, protecting the > conntrack lists and entries I was a little scared that you were going to try implementing per conntrack locks :) > I think that eliminating ip_conntrack lock might uncover a (potential) > bottleneck: the tcp_lock, the lock in conntrack for the predominant > protocol of the Internet. Yes that will be a problem. > By adding a data lock to the conntrack entries can solve that > (potential) problem. The data lock would make unnecessary the per > conntrack helper locks too (unfortunately the buffers introduced in the > zerocopy patch undo that). A per conntrack lock is probably the best for solving that bottleneck. The conntrack helpers could linearize the packets (NAT code does that for helpers anyway?) or copy them using skb_copy()? The linearize will impact performance a little even when not using NAT. We should try to go RCU later. I'll write up a small list of things I think we should fix in conntrack. -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-30 11:19 ` Martin Josefsson @ 2003-04-30 23:05 ` Martin Josefsson 2003-05-01 4:05 ` Rusty Russell 2003-05-01 11:26 ` Harald Welte 0 siblings, 2 replies; 42+ messages in thread From: Martin Josefsson @ 2003-04-30 23:05 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Rolf Fokkens, Harald Welte, Rusty Russell, Netfilter-devel On Wed, 2003-04-30 at 13:19, Martin Josefsson wrote: > > I think that eliminating ip_conntrack lock might uncover a (potential) > > bottleneck: the tcp_lock, the lock in conntrack for the predominant > > protocol of the Internet. > > Yes that will be a problem. > > > By adding a data lock to the conntrack entries can solve that > > (potential) problem. The data lock would make unnecessary the per > > conntrack helper locks too (unfortunately the buffers introduced in the > > zerocopy patch undo that). > > A per conntrack lock is probably the best for solving that bottleneck. If we have per bucket spinlocks we don't have to have a tcp_lock if I rememberthe code correctly. If we go RCU we might need the per conntrack lock. > I'll write up a small list of things I think we should fix in conntrack. - Switch to using hlists for the hashtable - Rearrange struct ip_conntrack to be more cachefriendly if possible - Add prefetching in list-searching - Turn protocol_list into an array - Switch to a better hashfunction - Remove pointless timer-updating - Rework locking to be finer grained, start with per bucket spinlocks (goal: RCU?) - Remove tcp_lock if using per bucket spinlocks, otherwise move it into the entries - Remove pointless rehashing of tuples - Rework overload support (early_drop) - Avoid as many memory-writes as possible, no need to dirty cachelines if we don't have to - Eliminate listhelp.h and lockhelp.h by request from hch - Try to shrink struct ip_conntrack -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-30 23:05 ` Martin Josefsson @ 2003-05-01 4:05 ` Rusty Russell 2003-05-01 6:05 ` Patrick Schaaf ` (2 more replies) 2003-05-01 11:26 ` Harald Welte 1 sibling, 3 replies; 42+ messages in thread From: Rusty Russell @ 2003-05-01 4:05 UTC (permalink / raw) To: Martin Josefsson Cc: Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel, Patrick Schaaf In message <1051743903.8213.188.camel@tux.rsn.bth.se> you write: > On Wed, 2003-04-30 at 13:19, Martin Josefsson wrote: > - Switch to using hlists for the hashtable OK. Please implement hlist_for_each_entry(), though, a-la list_for_each_entry. > - Rearrange struct ip_conntrack to be more cachefriendly if possible Leave 'til last, I think, since that structure will be changing anyway. Most important is that next ptr and tuple are at the start (ie. only one cacheline per hash chain entry). > - Add prefetching in list-searching Can be done, but it's a micro-optimization and you'd want to check carefully that recent gcc's don't do this anyway. > - Turn protocol_list into an array Or hardcode the three builtins and use the list for others. > - Switch to a better hashfunction Yes! See comments below on secret hashing... > - Remove pointless timer-updating We could revisit this altogether: go for one timer which sweeps the hash chains and an "expiry" date on each one. You end up with some icky deletion issues, but... > - Rework locking to be finer grained, start with per bucket spinlocks > (goal: RCU?) Well, might as well go straight to RCU for the infrastructure locking. It's a little tricky. RCU can be used for the read side, but since writes are not uncommon (I always guesstimated 1 in 10), we still probably want per-chain locks. Hmm, actually, since that bloats the hash, let's start with one lock and see how it goes. The protection of the conntrack objects themselves should become a lock per conntrack I think. We currently use the timer lock as a form of synchronization: if we have a conntrack.lock we should use that. > - Remove tcp_lock if using per bucket spinlocks, otherwise move it into > the entries Agreed: use conntrack.lock. > - Remove pointless rehashing of tuples Hmmm, does this happen? > - Rework overload support (early_drop) OK, you've probably seen my "hash with secret key" scheme. Unfortunately, it relies on being able to grab the network brlock to stop all activity while it redoes the hash. But Dave is removing the brlock, so this becomes more tricky (ie. readers have to be aware that there could be two hash tables, ick). It might still be worth it though (this same trick would allow us to resize the hash through /proc). Needs more thought. > - Avoid as many memory-writes as possible, no need to dirty cachelines if > we don't have to Well, yes, but it's usually secondary after correctness. > - Eliminate listhelp.h and lockhelp.h by request from hch Yeah, list.h is more sophisticated now, and we have lock debugging > - Try to shrink struct ip_conntrack Ignoring NAT for the moment, and using a 32-bit arch: struct nf_conntrack ct_general; 8 bytes: hard to shrink. struct ip_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX]; 2 x 28 bytes: we can get rid of one. unsigned long status; 8 bytes. Needs to be ulong for bitops. struct timer_list timeout; 48 bytes: could be one ulong (time for expiry). struct list_head sibling_list; 8 bytes. Hard to remove without getting tricky... unsigned int expecting; 4 bytes. Maybe could merge this with upper bits of status, maybe... struct ip_conntrack_expect *master; 4 bytes. Needed. struct ip_conntrack_helper *helper; 4 bytes. Needed. struct nf_ct_info infos[IP_CT_NUMBER]; 7 * 4 bytes. We could eliminate this by adding an skb field to hold the state. union ip_conntrack_proto proto; 8 bytes, will get bigger with tcp tracking. union ip_conntrack_help help; 16 bytes, could use flags in status for ftp's seq_aft_nl_set and cut to 8 bytes. So, we're about 192 bytes. We could get to 80 bytes. Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 4:05 ` Rusty Russell @ 2003-05-01 6:05 ` Patrick Schaaf 2003-05-01 6:46 ` Rusty Russell 2003-05-01 9:58 ` Martin Josefsson 2003-05-01 11:32 ` Harald Welte 2 siblings, 1 reply; 42+ messages in thread From: Patrick Schaaf @ 2003-05-01 6:05 UTC (permalink / raw) To: Rusty Russell Cc: Martin Josefsson, Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel, Patrick Schaaf > struct nf_ct_info infos[IP_CT_NUMBER]; > 7 * 4 bytes. We could eliminate this by adding an skb field to hold > the state. Given a guaranteed cacheline alignment of our 'struct ip_conntrack', we could make the current pointer on the skbuffs a tagged pointer, i.e. stuff our 3 info bits into the low 3 bits, without consuming more room in the skbuffs. The skb_nf_forget() cleanup patch I made last year, was a preparation for that. With that patch applied, the ugliness of such a tagged pointer is not visible in the base network stack (skb_nf_forget() makes the base stack callers oblivious of what exactly is stored on an skbuff wrt conntracking). See http://lists.netfilter.org/pipermail/netfilter-devel/2002-July/008703.html best regards Patrick ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 6:05 ` Patrick Schaaf @ 2003-05-01 6:46 ` Rusty Russell 2003-05-01 7:04 ` Patrick Schaaf 0 siblings, 1 reply; 42+ messages in thread From: Rusty Russell @ 2003-05-01 6:46 UTC (permalink / raw) To: Patrick Schaaf Cc: Martin Josefsson, Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel In message <20030501060522.GB15143@oknodo.bof.de> you write: > > struct nf_ct_info infos[IP_CT_NUMBER]; > > 7 * 4 bytes. We could eliminate this by adding an skb field to hold > > the state. > > Given a guaranteed cacheline alignment of our 'struct ip_conntrack', > we could make the current pointer on the skbuffs a tagged pointer, > i.e. stuff our 3 info bits into the low 3 bits, without consuming > more room in the skbuffs. That was my original implementation. It was fairly ugly, but worse, I can imagine an arch where it would break. None would at the moment, though, since kmalloc cacheline aligns (although we'd have to be a little careful to explicitly align a conntrack if we were to use a dummy one for a "notrack" target. > See http://lists.netfilter.org/pipermail/netfilter-devel/2002-July/008703.html That looks like a good patch anyway... Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 6:46 ` Rusty Russell @ 2003-05-01 7:04 ` Patrick Schaaf 2003-05-01 7:38 ` Rusty Russell 0 siblings, 1 reply; 42+ messages in thread From: Patrick Schaaf @ 2003-05-01 7:04 UTC (permalink / raw) To: Rusty Russell Cc: Patrick Schaaf, Martin Josefsson, Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel On Thu, May 01, 2003 at 04:46:44PM +1000, Rusty Russell wrote: > In message <20030501060522.GB15143@oknodo.bof.de> you write: > > > struct nf_ct_info infos[IP_CT_NUMBER]; > > > 7 * 4 bytes. We could eliminate this by adding an skb field to hold > > > the state. > > > > Given a guaranteed cacheline alignment of our 'struct ip_conntrack', > > we could make the current pointer on the skbuffs a tagged pointer, > > i.e. stuff our 3 info bits into the low 3 bits, without consuming > > more room in the skbuffs. > > That was my original implementation. It was fairly ugly, but worse, I > can imagine an arch where it would break. None would at the moment, > though, since kmalloc cacheline aligns This boils down to the question whether SLAB_HWCACHE_ALIGN is a guarantee, or a hint. I would have expected it to be a guarantee. Argh. Are you worried about architectures with 8 byte or less cache lines? Well, use another field in the skbuff, then. I would recycle nfcache, but that's just because nobody ever explained to me what it would be used for... :) > (although we'd have to be a > little careful to explicitly align a conntrack if we were to use a > dummy one for a "notrack" target. Unless you are after static initializability, what keeps us from allocating such dummies in the same way as we allocate the rest? best regards Patrick ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 7:04 ` Patrick Schaaf @ 2003-05-01 7:38 ` Rusty Russell 0 siblings, 0 replies; 42+ messages in thread From: Rusty Russell @ 2003-05-01 7:38 UTC (permalink / raw) To: Patrick Schaaf Cc: Martin Josefsson, Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel In message <20030501070408.GD15143@oknodo.bof.de> you write: > Argh. Are you worried about architectures with 8 byte or less cache lines? Yes, an embedded chip might value space above all else. But noone seems to, so it's probably OK. > Well, use another field in the skbuff, then. I would recycle nfcache, > but that's just because nobody ever explained to me what it would be > used for... :) It was so you can optimize fastroute: you can cache decisions made by netfilter hooks. But it was never actually used. It could be dropped in 2.7. > > (although we'd have to be a > > little careful to explicitly align a conntrack if we were to use a > > dummy one for a "notrack" target. > > Unless you are after static initializability, what keeps us from > allocating such dummies in the same way as we allocate the rest? As I said, you'd have to be careful. The logical way would be to simply declare a static variable, which will still work as long as you align it. Cheers! Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 4:05 ` Rusty Russell 2003-05-01 6:05 ` Patrick Schaaf @ 2003-05-01 9:58 ` Martin Josefsson 2003-05-01 11:32 ` Harald Welte 2 siblings, 0 replies; 42+ messages in thread From: Martin Josefsson @ 2003-05-01 9:58 UTC (permalink / raw) To: Rusty Russell Cc: Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel, Patrick Schaaf On Thu, 2003-05-01 at 06:05, Rusty Russell wrote: > In message <1051743903.8213.188.camel@tux.rsn.bth.se> you write: > > On Wed, 2003-04-30 at 13:19, Martin Josefsson wrote: > > - Switch to using hlists for the hashtable > > OK. Please implement hlist_for_each_entry(), though, a-la > list_for_each_entry. That's the plan. > > - Rearrange struct ip_conntrack to be more cachefriendly if possible > > Leave 'til last, I think, since that structure will be changing > anyway. Most important is that next ptr and tuple are at the start > (ie. only one cacheline per hash chain entry). I'll try to do some measurements of this so we know how much we are talking about. > > - Turn protocol_list into an array > > Or hardcode the three builtins and use the list for others. It would probably be faster with just the array, especially for the not builtin case. The upside of the hardcoding is that it's contained in the instructioncache. > > - Switch to a better hashfunction > > Yes! See comments below on secret hashing... Jozsef continued the work on this and has come to some conclusions I think. > > - Remove pointless timer-updating > > We could revisit this altogether: go for one timer which sweeps the > hash chains and an "expiry" date on each one. You end up with some > icky deletion issues, but... Patrick made a patch sometime ago that still had the per entry timer but only updated it with a new timeout once it expired. The timer-handling code in kernel seems to have rather low overhead even with lots of timers, it's just that it disables interrupts twice for each packet, and the fact that the updates are pointless (often updated to the exact same jiffy as it had before). > > - Rework locking to be finer grained, start with per bucket spinlocks > > (goal: RCU?) > > Well, might as well go straight to RCU for the infrastructure > locking. > > It's a little tricky. RCU can be used for the read side, but since > writes are not uncommon (I always guesstimated 1 in 10), we still > probably want per-chain locks. Hmm, actually, since that bloats the > hash, let's start with one lock and see how it goes. Noo, please don't, with one lock for inserts/deletes it could still be awful during a fairly large DDoS on an SMP machine. If per-chain locks are too much bloat (many many locks which large hashtables) I'd say sectored locks like Patrick suggested, but not one global. > The protection of the conntrack objects themselves should become a > lock per conntrack I think. We currently use the timer lock as a form > of synchronization: if we have a conntrack.lock we should use that. > > > - Remove tcp_lock if using per bucket spinlocks, otherwise move it into > > the entries > > Agreed: use conntrack.lock. We only read conntrack->proto.tcp twice and update it twice, both things can be grouped together. I currently know too little about RCU, don't know if it would be possible to skip the locking here. > > - Remove pointless rehashing of tuples > > Hmmm, does this happen? Oh yes, currently we actually rehash the tuple for each entry searched in the lists. There's a patch in optimization/ in p-o-m that fixes that. But I think we have more rehashing going on and iirc Patrick made a patch for that as well. > > - Rework overload support (early_drop) > > OK, you've probably seen my "hash with secret key" scheme. > Unfortunately, it relies on being able to grab the network brlock to > stop all activity while it redoes the hash. But Dave is removing the > brlock, so this becomes more tricky (ie. readers have to be aware that > there could be two hash tables, ick). > > It might still be worth it though (this same trick would allow us to > resize the hash through /proc). Needs more thought. The only time rehashing is a positive thing is when you have one or a few attacked chain or you manually want to resize it. Otherwise it will probably just slow things down when the rehash happens. I like to just give the hashtable a proper size from the beginning. So we need a way to tell if someone is attacking a chain or if it's just overload due to poor sizing. > > - Avoid as many memory-writes as possible, no need to dirty cachelines if > > we don't have to > > Well, yes, but it's usually secondary after correctness. Of course. > > - Try to shrink struct ip_conntrack > > Ignoring NAT for the moment, and using a 32-bit arch: > > struct nf_conntrack ct_general; > 8 bytes: hard to shrink. > > struct ip_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX]; > 2 x 28 bytes: we can get rid of one. If we try to change the hashing yes. > unsigned long status; > 8 bytes. Needs to be ulong for bitops. 4 bytes > So, we're about 192 bytes. We could get to 80 bytes. That'd be nice. -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 4:05 ` Rusty Russell 2003-05-01 6:05 ` Patrick Schaaf 2003-05-01 9:58 ` Martin Josefsson @ 2003-05-01 11:32 ` Harald Welte 2 siblings, 0 replies; 42+ messages in thread From: Harald Welte @ 2003-05-01 11:32 UTC (permalink / raw) To: Rusty Russell Cc: Martin Josefsson, Jozsef Kadlecsik, Rolf Fokkens, Netfilter-devel, Patrick Schaaf [-- Attachment #1: Type: text/plain, Size: 712 bytes --] On Thu, May 01, 2003 at 02:05:52PM +1000, Rusty Russell wrote: > Ignoring NAT for the moment, and using a 32-bit arch: > So, we're about 192 bytes. We could get to 80 bytes. JFYI: including PPTP nat (which expands struct ip_conntrack_tuple), sizeof(struct ip_conntrack) is about 340 bytes on a 32bit arch. > Cheers, > Rusty. -- - Harald Welte <laforge@netfilter.org> http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-30 23:05 ` Martin Josefsson 2003-05-01 4:05 ` Rusty Russell @ 2003-05-01 11:26 ` Harald Welte 2003-05-02 12:18 ` Jozsef Kadlecsik 1 sibling, 1 reply; 42+ messages in thread From: Harald Welte @ 2003-05-01 11:26 UTC (permalink / raw) To: Martin Josefsson Cc: Jozsef Kadlecsik, Rolf Fokkens, Rusty Russell, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1580 bytes --] On Thu, May 01, 2003 at 01:05:03AM +0200, Martin Josefsson wrote: > > I'll write up a small list of things I think we should fix in conntrack. > > - Switch to using hlists for the hashtable > - Rearrange struct ip_conntrack to be more cachefriendly if possible > - Add prefetching in list-searching > - Turn protocol_list into an array > - Switch to a better hashfunction > - Remove pointless timer-updating > - Rework locking to be finer grained, start with per bucket spinlocks > (goal: RCU?) > - Remove tcp_lock if using per bucket spinlocks, otherwise move it into > the entries > - Remove pointless rehashing of tuples > - Rework overload support (early_drop) > - Avoid as many memory-writes as possible, no need to dirty cachelines if > we don't have to > - Eliminate listhelp.h and lockhelp.h by request from hch > - Try to shrink struct ip_conntrack this list is too long for a change that is going to happen during 2.4.x soon. I would really like to see a simple patch changing the hash function before. This patch can be pushed to davem after a relatively short time period of testing, since it shouldn't touch too much of the conntrack core. > /Martin -- - Harald Welte <laforge@netfilter.org> http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 11:26 ` Harald Welte @ 2003-05-02 12:18 ` Jozsef Kadlecsik 2003-05-02 12:30 ` Martin Josefsson 0 siblings, 1 reply; 42+ messages in thread From: Jozsef Kadlecsik @ 2003-05-02 12:18 UTC (permalink / raw) To: Harald Welte Cc: Martin Josefsson, Rolf Fokkens, Rusty Russell, Patrick Schaaf, Netfilter-devel On Thu, 1 May 2003, Harald Welte wrote: > I would really like to see a simple patch changing the hash function > before. This patch can be pushed to davem after a relatively short time > period of testing, since it shouldn't touch too much of the conntrack > core. Still sitting and chewing the problem which hash function to choose, decided yet undecided... I'm working on an updated raw patch (+ splitted up to internal logging + raw patches) and an updated and further tuned tcp window tracking patch (+ splitted up to proc + tcp window tracking patches). It'll take probably one more week and then I'll submit a patch on a better hash function for conntrack. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-02 12:18 ` Jozsef Kadlecsik @ 2003-05-02 12:30 ` Martin Josefsson 2003-05-02 21:51 ` Jozsef Kadlecsik 0 siblings, 1 reply; 42+ messages in thread From: Martin Josefsson @ 2003-05-02 12:30 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Harald Welte, Rolf Fokkens, Rusty Russell, Patrick Schaaf, Netfilter-devel On Fri, 2003-05-02 at 14:18, Jozsef Kadlecsik wrote: > I'm working on an updated raw patch (+ splitted up to internal logging + > raw patches) and an updated and further tuned tcp window tracking patch (+ > splitted up to proc + tcp window tracking patches). It'll take probably > one more week and then I'll submit a patch on a better hash function for > conntrack. While on the subject of the raw patch, please don't forget to add the fix for the NFC_ clash I added to cvs a little while ago. The userspace/raw.patch depends on pending/27_include-nfc-order.patch Can I get a feature-request? :) -j NOTRACK --only-search that would be nice, search the hashtables but don't allocate new entries if not found and if not found mark as UNTRACKED. It will do the same as dropping --state NEW later but avoids the alloc and init. -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-02 12:30 ` Martin Josefsson @ 2003-05-02 21:51 ` Jozsef Kadlecsik 2003-05-02 21:58 ` Martin Josefsson 0 siblings, 1 reply; 42+ messages in thread From: Jozsef Kadlecsik @ 2003-05-02 21:51 UTC (permalink / raw) To: Martin Josefsson Cc: Harald Welte, Rolf Fokkens, Rusty Russell, Patrick Schaaf, Netfilter-devel On 2 May 2003, Martin Josefsson wrote: > On Fri, 2003-05-02 at 14:18, Jozsef Kadlecsik wrote: > > > I'm working on an updated raw patch (+ splitted up to internal logging + > > raw patches) and an updated and further tuned tcp window tracking patch (+ > > splitted up to proc + tcp window tracking patches). It'll take probably > > one more week and then I'll submit a patch on a better hash function for > > conntrack. > > While on the subject of the raw patch, please don't forget to add the > fix for the NFC_ clash I added to cvs a little while ago. The > userspace/raw.patch depends on pending/27_include-nfc-order.patch I noticed it and the new patch takes into account the clash. > Can I get a feature-request? :) > -j NOTRACK --only-search > > that would be nice, search the hashtables but don't allocate new entries > if not found and if not found mark as UNTRACKED. It will do the same as > dropping --state NEW later but avoids the alloc and init. Maybe it's too late, but I don't understand you: - the raw table has got no notion on conntrack - conntrack skips the packet "marked" by the NOTRACK target of the raw table. No new hash table entries are generated, one dummy entry is used for every untracked packet. - packets "marked" with NOTRACK has got the state UNTRACKED Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-02 21:51 ` Jozsef Kadlecsik @ 2003-05-02 21:58 ` Martin Josefsson 2003-05-05 9:24 ` Jozsef Kadlecsik 2003-05-05 12:38 ` Jozsef Kadlecsik 0 siblings, 2 replies; 42+ messages in thread From: Martin Josefsson @ 2003-05-02 21:58 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Harald Welte, Rolf Fokkens, Rusty Russell, Patrick Schaaf, Netfilter-devel On Fri, 2003-05-02 at 23:51, Jozsef Kadlecsik wrote: > > Can I get a feature-request? :) > > -j NOTRACK --only-search > > > > that would be nice, search the hashtables but don't allocate new entries > > if not found and if not found mark as UNTRACKED. It will do the same as > > dropping --state NEW later but avoids the alloc and init. > > Maybe it's too late, but I don't understand you: > > - the raw table has got no notion on conntrack > - conntrack skips the packet "marked" by the NOTRACK target of the raw > table. No new hash table entries are generated, one dummy entry is used > for every untracked packet. > - packets "marked" with NOTRACK has got the state UNTRACKED --only-search wouldn't skip conntrack completely. It still does the search but if no entry is found we don't allocate a new, instead we bail out and set skb->nfct = &ip_conntrack_untracked.infos[IP_CT_NEW]; so it from now on looks just like a "regular" -j NOTRACK packet. The purpose of this feature is to still allow outbound connections during a DDoS but deny all new inbound with as little overhead as possible. -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-02 21:58 ` Martin Josefsson @ 2003-05-05 9:24 ` Jozsef Kadlecsik 2003-05-05 12:38 ` Jozsef Kadlecsik 1 sibling, 0 replies; 42+ messages in thread From: Jozsef Kadlecsik @ 2003-05-05 9:24 UTC (permalink / raw) To: Martin Josefsson Cc: Harald Welte, Rolf Fokkens, Rusty Russell, Patrick Schaaf, Netfilter-devel On 2 May 2003, Martin Josefsson wrote: > On Fri, 2003-05-02 at 23:51, Jozsef Kadlecsik wrote: > > > > Can I get a feature-request? :) > > > -j NOTRACK --only-search > > --only-search wouldn't skip conntrack completely. It still does the > search but if no entry is found we don't allocate a new, instead we bail > out and set > > skb->nfct = &ip_conntrack_untracked.infos[IP_CT_NEW]; > > so it from now on looks just like a "regular" -j NOTRACK packet. > > The purpose of this feature is to still allow outbound connections > during a DDoS but deny all new inbound with as little overhead as > possible. I see. I'll try to add it without introducing a new NFC flag. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-02 21:58 ` Martin Josefsson 2003-05-05 9:24 ` Jozsef Kadlecsik @ 2003-05-05 12:38 ` Jozsef Kadlecsik 2003-05-05 13:07 ` Martin Josefsson 1 sibling, 1 reply; 42+ messages in thread From: Jozsef Kadlecsik @ 2003-05-05 12:38 UTC (permalink / raw) To: Martin Josefsson Cc: Harald Welte, Rolf Fokkens, Rusty Russell, Patrick Schaaf, Netfilter-devel On 2 May 2003, Martin Josefsson wrote: > On Fri, 2003-05-02 at 23:51, Jozsef Kadlecsik wrote: > > > > Can I get a feature-request? :) > > > -j NOTRACK --only-search > > --only-search wouldn't skip conntrack completely. It still does the > search but if no entry is found we don't allocate a new, instead we bail > out and set > > skb->nfct = &ip_conntrack_untracked.infos[IP_CT_NEW]; > > so it from now on looks just like a "regular" -j NOTRACK packet. > > The purpose of this feature is to still allow outbound connections > during a DDoS but deny all new inbound with as little overhead as > possible. Oh, there is no need for a flag: that was a bug in the NOTRACK target. It should have not marked packets belonging to existing connections. Thank you for spotting this. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-05 12:38 ` Jozsef Kadlecsik @ 2003-05-05 13:07 ` Martin Josefsson 0 siblings, 0 replies; 42+ messages in thread From: Martin Josefsson @ 2003-05-05 13:07 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Harald Welte, Rolf Fokkens, Rusty Russell, Patrick Schaaf, Netfilter-devel On Mon, 2003-05-05 at 14:38, Jozsef Kadlecsik wrote: > > --only-search wouldn't skip conntrack completely. It still does the > > search but if no entry is found we don't allocate a new, instead we bail > > out and set > > > > skb->nfct = &ip_conntrack_untracked.infos[IP_CT_NEW]; > > > > so it from now on looks just like a "regular" -j NOTRACK packet. > > > > The purpose of this feature is to still allow outbound connections > > during a DDoS but deny all new inbound with as little overhead as > > possible. > > Oh, there is no need for a flag: that was a bug in the NOTRACK target. > It should have not marked packets belonging to existing connections. > > Thank you for spotting this. Hehe ok, I thought that was how you wanted it to work. If the DDoS is _really_ massive you will use -j DROP and not -j NOTRACK anyway so this change in behaviour would be really nice. (I see >140kpps DDoS's sometimes) I don't think there's that many people using NOTRACK so the change probably won't break any existing configurations. -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-30 10:49 ` Jozsef Kadlecsik 2003-04-30 11:19 ` Martin Josefsson @ 2003-05-01 0:06 ` Rusty Russell 2003-05-01 5:48 ` Patrick Schaaf ` (2 more replies) 1 sibling, 3 replies; 42+ messages in thread From: Rusty Russell @ 2003-05-01 0:06 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: Rolf Fokkens, Harald Welte, Martin Josefsson, Netfilter-devel, Patrick Schaaf In message <Pine.LNX.4.33.0304301227530.3328-100000@blackhole.kfki.hu> you write: > As usual, it's Rusty's locking patch what triggered me to think over the > locking issues. In that patch Rusty actually eliminates ip_conntrack > lock: Ah, but I think I might have a better trick. Patrick cc'd since he's the hashing man. Can we compromise the hash so that every packet hashes to the same chain as its reply? If we can do this without making our hash really suck, we only need one entry per conntrack (and we then compare both ways). This means a smaller table, and hence better cache utilization, as well as simpler locking. Thoughts? > I think that eliminating ip_conntrack lock might uncover a (potential) > bottleneck: the tcp_lock, the lock in conntrack for the predominant > protocol of the Internet. > > By adding a data lock to the conntrack entries can solve that > (potential) problem. The data lock would make unnecessary the per > conntrack helper locks too (unfortunately the buffers introduced in the > zerocopy patch undo that). Yeah, per-entry locks make sense I think. I can change the helpers to do an skb_copy (suggested by Alexey) which will be lockless, or just implement the damn non-linear search. Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 0:06 ` Rusty Russell @ 2003-05-01 5:48 ` Patrick Schaaf 2003-05-01 10:01 ` Martin Josefsson 2003-05-01 9:06 ` Martin Josefsson 2003-05-05 8:43 ` PATCH: extra conntrack stats Jozsef Kadlecsik 2 siblings, 1 reply; 42+ messages in thread From: Patrick Schaaf @ 2003-05-01 5:48 UTC (permalink / raw) To: Rusty Russell Cc: Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Martin Josefsson, Netfilter-devel, Patrick Schaaf > > As usual, it's Rusty's locking patch what triggered me to think over the > > locking issues. In that patch Rusty actually eliminates ip_conntrack > > lock: > > Ah, but I think I might have a better trick. Patrick cc'd since he's > the hashing man. > > Can we compromise the hash so that every packet hashes to the same > chain as its reply? > > If we can do this without making our hash really suck, we only need one > entry per conntrack (and we then compare both ways). This means a > smaller table, and hence better cache utilization, as well as simpler > locking. Thanks for the Cc, I have indeed thought about such an approach. Basically, it is possible for non-NAT conntracks. Those have what I'd call "mirror" key material. Sort the keys, resulting in A) a "flipped" bit, and B) first the smaller key, then the larger, you have broken down both sides of the connection to the same key (and the flipped bit, giving the direction); look up the "sorted" key in the hash table, then choose the real tuple to compare to based on the flipped bit. Unfortunately, once NAT is set up for a conntrack, the keys for both connections lose the mirror property. The approach then becomes a small loss, compared to what we have now. That means we have a tradeoff based on how we are deployed. I stopped thinking about it at that point... As an aside, and independant of the key sorting/flipping, we could eliminate completely the info struct array we now have in each conntrack. Interested? Then I'll write how I envision that. Finally, regarding the locking, I would like to mention the memory conservative "sectored locks" approach. Break the hash table into sectors, about N*NR_CPU of them. Use one rwlock per sector. Given an equally distributing hash function, the probability that two of our NR_CPU cpus will lock the same sector at the same time, becomes small with growing N, fast. This way, a lot less locks than with one-lock-per-chain or one-lock-per-conntrack can be used, without compromising concurrency too much. best regards Patrick ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 5:48 ` Patrick Schaaf @ 2003-05-01 10:01 ` Martin Josefsson 0 siblings, 0 replies; 42+ messages in thread From: Martin Josefsson @ 2003-05-01 10:01 UTC (permalink / raw) To: Patrick Schaaf Cc: Rusty Russell, Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel On Thu, 2003-05-01 at 07:48, Patrick Schaaf wrote: > As an aside, and independant of the key sorting/flipping, > we could eliminate completely the info struct array we > now have in each conntrack. Interested? Then I'll write > how I envision that. Please do. > Finally, regarding the locking, I would like to mention > the memory conservative "sectored locks" approach. > Break the hash table into sectors, about N*NR_CPU > of them. Use one rwlock per sector. Given an equally > distributing hash function, the probability that two > of our NR_CPU cpus will lock the same sector at the > same time, becomes small with growing N, fast. > > This way, a lot less locks than with one-lock-per-chain > or one-lock-per-conntrack can be used, without compromising > concurrency too much. I like this, if you have 512k buckets you don't want 512k locks if it's a 2way SMP machine. -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 0:06 ` Rusty Russell 2003-05-01 5:48 ` Patrick Schaaf @ 2003-05-01 9:06 ` Martin Josefsson 2003-05-02 5:31 ` Rusty Russell 2003-05-05 8:43 ` PATCH: extra conntrack stats Jozsef Kadlecsik 2 siblings, 1 reply; 42+ messages in thread From: Martin Josefsson @ 2003-05-01 9:06 UTC (permalink / raw) To: Rusty Russell Cc: Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel, Patrick Schaaf On Thu, 2003-05-01 at 02:06, Rusty Russell wrote: > In message <Pine.LNX.4.33.0304301227530.3328-100000@blackhole.kfki.hu> you write: > > As usual, it's Rusty's locking patch what triggered me to think over the > > locking issues. In that patch Rusty actually eliminates ip_conntrack > > lock: > > Ah, but I think I might have a better trick. Patrick cc'd since he's > the hashing man. > > Can we compromise the hash so that every packet hashes to the same > chain as its reply? > > If we can do this without making our hash really suck, we only need one > entry per conntrack (and we then compare both ways). This means a > smaller table, and hence better cache utilization, as well as simpler > locking. > > Thoughts? As Patrick already pointed out, it gets ugly wrt. NAT, the reply tuple can be quite diffrent. > > I think that eliminating ip_conntrack lock might uncover a (potential) > > bottleneck: the tcp_lock, the lock in conntrack for the predominant > > protocol of the Internet. > > > > By adding a data lock to the conntrack entries can solve that > > (potential) problem. The data lock would make unnecessary the per > > conntrack helper locks too (unfortunately the buffers introduced in the > > zerocopy patch undo that). > > Yeah, per-entry locks make sense I think. I can change the helpers to > do an skb_copy (suggested by Alexey) which will be lockless, or just > implement the damn non-linear search. The question is... how do you free such an entry with the guarantee that there are no waiters? It's probably possible but I can't see a clean nice way to do it. -- /Martin ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 9:06 ` Martin Josefsson @ 2003-05-02 5:31 ` Rusty Russell 2003-05-02 7:06 ` Patrick Schaaf 0 siblings, 1 reply; 42+ messages in thread From: Rusty Russell @ 2003-05-02 5:31 UTC (permalink / raw) To: Martin Josefsson Cc: Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel, Patrick Schaaf In message <1051780007.8215.200.camel@tux.rsn.bth.se> you write: > On Thu, 2003-05-01 at 02:06, Rusty Russell wrote: > > If we can do this without making our hash really suck, we only need one > > entry per conntrack (and we then compare both ways). This means a > > smaller table, and hence better cache utilization, as well as simpler > > locking. > > > > Thoughts? > > As Patrick already pointed out, it gets ugly wrt. NAT, the reply tuple > can be quite diffrent. Yes, you're both right, I'm stupid. Scratch this "great idea". > > Yeah, per-entry locks make sense I think. I can change the helpers to > > do an skb_copy (suggested by Alexey) which will be lockless, or just > > implement the damn non-linear search. > > The question is... how do you free such an entry with the guarantee that > there are no waiters? It's probably possible but I can't see a clean > nice way to do it. Reference counts. Currently, you delete from the hash, then dec reference count (hash reference holds one reference count). With RCU, you'd do delete from hash, and call_rcu(drop_refcount) to do the atomic_dec_and_test after you *know* noone else can reach it. It's easy to confuse infrastructure protection (ie. protecting the hash table chains and existence of the elements), with structure protection (protecting the contents of each structure). Currently we use one big lock for both of them, but they're different. Thanks, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-02 5:31 ` Rusty Russell @ 2003-05-02 7:06 ` Patrick Schaaf 2003-05-02 8:57 ` Rusty Russell 0 siblings, 1 reply; 42+ messages in thread From: Patrick Schaaf @ 2003-05-02 7:06 UTC (permalink / raw) To: Rusty Russell Cc: Martin Josefsson, Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel, Patrick Schaaf On Fri, May 02, 2003 at 03:31:50PM +1000, Rusty Russell wrote: > In message <1051780007.8215.200.camel@tux.rsn.bth.se> you write: > > On Thu, 2003-05-01 at 02:06, Rusty Russell wrote: > > > If we can do this without making our hash really suck, we only need one > > > entry per conntrack (and we then compare both ways). This means a > > > smaller table, and hence better cache utilization, as well as simpler > > > locking. > > > > > > Thoughts? > > > > As Patrick already pointed out, it gets ugly wrt. NAT, the reply tuple > > can be quite diffrent. > > Yes, you're both right, I'm stupid. Scratch this "great idea". Sorry, but you are not stupid. I do care for non-NAT workloads, and the approach would help greatly, there. In the normal ip_conntrack, one tuple is left, which is normally used for both directions of a non-NAT connection. In addition, there's a pointer, ordinarily 0, which can point to a NAT-reverse-tuple. When NAT starts to manipulate headers, it modifies ip_conntrack anyway; in that place, it can also allocate a suitable NAT-reverse-tuple, attach that to the conntrack, and put it into the hashes. Hmm, we could have a list of such reverse-tuples per conntrack, and call them expectations... In fact, this setup gives a nice extra feature: for NATted connections, we now logically have four flows in our hashes (physically two, due to the key mirror property). Two of those four flows are INVALID, and this can easily be determined when initially looking for the conntrack of an incoming skb. Not that I know the area well, but I feel that this could make some NAT setup decisions a lot simpler. best regards Patrick ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-02 7:06 ` Patrick Schaaf @ 2003-05-02 8:57 ` Rusty Russell 2003-05-02 9:54 ` SNAT and IP ID Patrick Schaaf 0 siblings, 1 reply; 42+ messages in thread From: Rusty Russell @ 2003-05-02 8:57 UTC (permalink / raw) To: Patrick Schaaf Cc: Martin Josefsson, Jozsef Kadlecsik, Rolf Fokkens, Harald Welte, Netfilter-devel In message <20030502070651.GG15143@oknodo.bof.de> you write: > On Fri, May 02, 2003 at 03:31:50PM +1000, Rusty Russell wrote: > > Yes, you're both right, I'm stupid. Scratch this "great idea". > > Sorry, but you are not stupid. Sorry, no, I am stupid. > I do care for non-NAT workloads, and the > approach would help greatly, there. > > In the normal ip_conntrack, one tuple is left, which is normally used > for both directions of a non-NAT connection. In addition, there's a > pointer, ordinarily 0, which can point to a NAT-reverse-tuple. > > When NAT starts to manipulate headers, it modifies ip_conntrack anyway; > in that place, it can also allocate a suitable NAT-reverse-tuple, attach > that to the conntrack, and put it into the hashes. The initial reason I wanted to get rid of the duplicates is the locking pain it produces. If we go for RCU, then it's not a problem, though. We'll need to keep around the back-pointer inside the hash entry: if we really only had one, we could use container_of(). We could cover this up in some helper macro I guess... > In fact, this setup gives a nice extra feature: for NATted connections, > we now logically have four flows in our hashes (physically two, due > to the key mirror property). Two of those four flows are INVALID, and > this can easily be determined when initially looking for the conntrack > of an incoming skb. Not that I know the area well, but I feel that this > could make some NAT setup decisions a lot simpler. Not sure this is correct. But anyway, I think some nice helper functions #ifdef CONFIG_IP_NF_NAT_NEEDED could make this workable: not so sure about optimizing for the "we could NAT, but most connections aren't NAT" case... Cheers, Rusty. -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. ^ permalink raw reply [flat|nested] 42+ messages in thread
* SNAT and IP ID 2003-05-02 8:57 ` Rusty Russell @ 2003-05-02 9:54 ` Patrick Schaaf 2003-05-02 15:43 ` Harald Welte 0 siblings, 1 reply; 42+ messages in thread From: Patrick Schaaf @ 2003-05-02 9:54 UTC (permalink / raw) To: netfilter-devel Hi all, I have a question that seems too simple, somehow. Anyway, out with it: Given a basic SNAT application, with two hosts on the left, an iptables box in the middle, and a server to the right. By using SNAT, both hosts's source IP addresses are changed to a single shared source IP address, on the way from the iptables box, to the server. Every packet sent by our two hosts, has a 16-bit IP ID field, for use in fragmentation and reassembly. Together with source and destination IP addresses, the ID field provides the key under which the server will gather incoming fragments belonging to the same IP packet. Now, consider both hosts sending out an IP packet with identical ID field. Will the IDs be made unique by the SNAT process? If not, what happens when, after the SNAT, the packets get fragmented on their way to the server? The server cannot sanely reassemble. Has this been discussed here, before? Googling [1] results in quite a set of hits that mention the issue, it seems customary to ignore it? best regards Patrick [1] http://www.google.de/search?q=NAT+fragmentation+problems+IP+ID ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: SNAT and IP ID 2003-05-02 9:54 ` SNAT and IP ID Patrick Schaaf @ 2003-05-02 15:43 ` Harald Welte 0 siblings, 0 replies; 42+ messages in thread From: Harald Welte @ 2003-05-02 15:43 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1450 bytes --] On Fri, May 02, 2003 at 11:54:21AM +0200, Patrick Schaaf wrote: > Hi all, > > I have a question that seems too simple, somehow. Anyway, out with it: one which has already been mentioned, not sure if on this list or on netdev@oss.sgi.com a couple of months ago.;) > Now, consider both hosts sending out an IP packet with identical ID > field. Will the IDs be made unique by the SNAT process? If not, what > happens when, after the SNAT, the packets get fragmented on their > way to the server? The server cannot sanely reassemble. yes, they will clash. DaveM said something like IP ID's can already clash on a single host at gbit speeds... so it's not worth considering this issue. I said, that it can easily be fixed by creating a mangle table IPID target which would reset the IPID on the NAT gateway. This target (if used on nat boxes) would also prevent the recently-published way to heuristically guess that NAT is used (because of the different incrementing sequences IPID's that reveal multiple hosts). > best regards > Patrick > -- - Harald Welte <laforge@netfilter.org> http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-05-01 0:06 ` Rusty Russell 2003-05-01 5:48 ` Patrick Schaaf 2003-05-01 9:06 ` Martin Josefsson @ 2003-05-05 8:43 ` Jozsef Kadlecsik 2 siblings, 0 replies; 42+ messages in thread From: Jozsef Kadlecsik @ 2003-05-05 8:43 UTC (permalink / raw) To: Rusty Russell Cc: Rolf Fokkens, Harald Welte, Martin Josefsson, Netfilter-devel, Patrick Schaaf On Thu, 1 May 2003, Rusty Russell wrote: > > By adding a data lock to the conntrack entries can solve that > > (potential) problem. The data lock would make unnecessary the per > > conntrack helper locks too (unfortunately the buffers introduced in the > > zerocopy patch undo that). > > Yeah, per-entry locks make sense I think. I can change the helpers to > do an skb_copy (suggested by Alexey) which will be lockless, or just > implement the damn non-linear search. I afraid the non-linear search is not general enough. We have to support protocols which carry structures in the payload and in that case the non-linear search would be a nightmare. Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-19 21:42 PATCH: extra conntrack stats Rolf Fokkens 2003-04-25 8:23 ` Patrick Schaaf 2003-04-27 12:48 ` Harald Welte @ 2003-04-28 9:13 ` vecna 2003-04-28 13:47 ` Patrick Schaaf 2 siblings, 1 reply; 42+ messages in thread From: vecna @ 2003-04-28 9:13 UTC (permalink / raw) To: netfilter-devel [-- Attachment #1.1: Type: text/plain, Size: 739 bytes --] On Sat, Apr 19, 2003 at 11:42:01PM +0200, Rolf Fokkens wrote: > Hi! > > This patch adds a little statistical information to the connection tracking: > the number of packets and bytes that have gone throught each connection. in attach I've put a simple patch for make proc entry that report number of connection tracked, if anyone is interesting ... I've used them because cat /proc/net/ip_conntrack | wc -l is too low with a large number of connection tracked. in addiction, for large tracking of connection, I think that the problem is on the manage of linked list, I'm thinking to implement a system for apply merge sort over sortder array of connection hashs, did you think that this should be a good idea ? bye [-- Attachment #1.2: counter.diff --] [-- Type: text/plain, Size: 3223 bytes --] --- kernel-2.4.20/net/ipv4/netfilter/ip_conntrack_core.c 2002-11-29 02:53:15.000000000 +0300 +++ linux/net/ipv4/netfilter/ip_conntrack_core.c 2003-04-28 13:23:28.000000000 +0400 @@ -61,7 +61,8 @@ LIST_HEAD(protocol_list); static LIST_HEAD(helpers); unsigned int ip_conntrack_htable_size = 0; -static int ip_conntrack_max = 0; +static unsigned ip_conntrack_max = 0; +static unsigned ip_conntrack_cnt = 0; static atomic_t ip_conntrack_count = ATOMIC_INIT(0); struct list_head *ip_conntrack_hash; static kmem_cache_t *ip_conntrack_cachep; @@ -640,8 +641,9 @@ hash = hash_conntrack(tuple); - if (ip_conntrack_max && - atomic_read(&ip_conntrack_count) >= ip_conntrack_max) { + ip_conntrack_cnt =atomic_read(&ip_conntrack_count); + + if (ip_conntrack_max && ip_conntrack_cnt >= ip_conntrack_max) { /* Try dropping from random chain, or else from the chain about to put into (in case they're trying to bomb one hash chain). */ @@ -1348,8 +1350,12 @@ #define NET_IP_CONNTRACK_MAX 2089 #define NET_IP_CONNTRACK_MAX_NAME "ip_conntrack_max" +#define NET_IP_CONNTRACK_CNT 2090 +#define NET_IP_CONNTRACK_CNT_NAME "ip_conntrack_cnt" + #ifdef CONFIG_SYSCTL -static struct ctl_table_header *ip_conntrack_sysctl_header; +static struct ctl_table_header *ip_conntrack_sysctl_hdr_max; +static struct ctl_table_header *ip_conntrack_sysctl_hdr_cnt; static ctl_table ip_conntrack_table[] = { { NET_IP_CONNTRACK_MAX, NET_IP_CONNTRACK_MAX_NAME, &ip_conntrack_max, @@ -1366,6 +1372,23 @@ {CTL_NET, "net", NULL, 0, 0555, ip_conntrack_dir_table, 0, 0, 0, 0, 0}, { 0 } }; + +static ctl_table ip_conntrack_t_cnt[] = { + { NET_IP_CONNTRACK_CNT, NET_IP_CONNTRACK_CNT_NAME, &ip_conntrack_cnt, + sizeof(ip_conntrack_cnt), 0644, NULL, proc_dointvec }, + { 0 } +}; + +static ctl_table ip_conntrack_dir_t_cnt[] = { + { NET_IPV4, "ipv4", NULL, 0, 0555, ip_conntrack_t_cnt, 0, 0, 0, 0, 0}, + { 0 } +}; + +static ctl_table ip_conntrack_root_t_cnt[] = { + {CTL_NET, "net", NULL, 0, 0555, ip_conntrack_dir_t_cnt, 0, 0, 0, 0, 0}, + { 0 } +}; + #endif /*CONFIG_SYSCTL*/ static int kill_all(const struct ip_conntrack *i, void *data) @@ -1378,7 +1401,10 @@ void ip_conntrack_cleanup(void) { #ifdef CONFIG_SYSCTL - unregister_sysctl_table(ip_conntrack_sysctl_header); + /* max of connection tracked */ + unregister_sysctl_table(ip_conntrack_sysctl_hdr_max); + /* counter of connection tracked */ + unregister_sysctl_table(ip_conntrack_sysctl_hdr_cnt); #endif ip_ct_attach = NULL; /* This makes sure all current packets have passed through @@ -1462,9 +1488,14 @@ the CONFIG_SYSCTL unless you don't want to detect errors. Grrr... --RR */ #ifdef CONFIG_SYSCTL - ip_conntrack_sysctl_header + ip_conntrack_sysctl_hdr_max = register_sysctl_table(ip_conntrack_root_table, 0); - if (ip_conntrack_sysctl_header == NULL) { + if (ip_conntrack_sysctl_hdr_max == NULL) { + goto err_free_ct_cachep; + } + ip_conntrack_sysctl_hdr_cnt + = register_sysctl_table(ip_conntrack_root_t_cnt, 0); + if (ip_conntrack_sysctl_hdr_cnt == NULL) { goto err_free_ct_cachep; } #endif /*CONFIG_SYSCTL*/ [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: PATCH: extra conntrack stats 2003-04-28 9:13 ` vecna @ 2003-04-28 13:47 ` Patrick Schaaf 2003-04-28 15:07 ` possible target SBALANCE ? vecna 0 siblings, 1 reply; 42+ messages in thread From: Patrick Schaaf @ 2003-04-28 13:47 UTC (permalink / raw) To: vecna; +Cc: netfilter-devel > in attach I've put a simple patch for make proc entry that report number of > connection tracked, if anyone is interesting ... I've used them because > cat /proc/net/ip_conntrack | wc -l is too low with a large number of > connection tracked. I use 'grep conntrack /proc/slabinfo' for this task (first number column is the number of conntracks currently allocated). Regarding your patch, it introduces another new place for cache line pingpong, happening with the same frequency as creation of new conntracks. That's rather rare, so the new pingpong won't have as much impact as the per-packet cases, but it's still there. > in addiction, for large tracking of connection, I think that the problem > is on the manage of linked list, I'm thinking to implement a system for > apply merge sort over sortder array of connection hashs, did you think > that this should be a good idea ? That is not a good idea, it's not the way hashing is supposed to work. There is no need for ordering anywhere, so no need to sort anything, as a mergesort would do. If you have overly long hash chains, use more hash buckets, and, if neccessary, a better distributing hash function. best regards Patrick ^ permalink raw reply [flat|nested] 42+ messages in thread
* possible target SBALANCE ? 2003-04-28 13:47 ` Patrick Schaaf @ 2003-04-28 15:07 ` vecna 2003-04-29 14:48 ` Harald Welte 0 siblings, 1 reply; 42+ messages in thread From: vecna @ 2003-04-28 15:07 UTC (permalink / raw) To: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1953 bytes --] (sorry for my bad english :) On Mon, Apr 28, 2003 at 03:47:13PM +0200, Patrick Schaaf wrote: > I use 'grep conntrack /proc/slabinfo' for this task (first number column > is the number of conntracks currently allocated). uu :) I ever listen about :) tnx :) > Regarding your patch, it introduces another new place for cache line > pingpong, happening with the same frequency as creation of new conntracks. excuse me, what do you mean with pingpong ? > That is not a good idea, it's not the way hashing is supposed to work. > There is no need for ordering anywhere, so no need to sort anything, > as a mergesort would do. > If you have overly long hash chains, use more hash buckets, and, if > neccessary, a better distributing hash function. umh, I've always view as hard work the calling of __ip_conntrack_find, that call LIST_FIND that one check at the wrong possibility all present connection tracked ... but, about the subject of the list, on the documentation of a switch for compactPCI arch (http://www.znyx.com/products/software/openarchitect/default.htm) them explain our system for make a traffic division more or less similar to a load balancer, them suppose that statistically all internet connection should be (more or less) equally distributed if is applied a matematic computation over session params (ip source, ip dest, source porc, dest port). if we take a look at the last byte of ip addres and port, we could use this number and associate them (with a simple LAST_BYTENUMBER % POSSIBLE_OUTPUT) to one output interface, for make load balancing applied to session without doing session tracking. (because on between direction of sessions, ip source ip dest source port dest port, is alwayes the some, also with inverted position). I've some time to use for make a possible -j SBALANCE target for do a forward on a list of output interface ... did you think that should be nice ? bye, ~c [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: possible target SBALANCE ? 2003-04-28 15:07 ` possible target SBALANCE ? vecna @ 2003-04-29 14:48 ` Harald Welte 2003-04-30 11:59 ` vecna 0 siblings, 1 reply; 42+ messages in thread From: Harald Welte @ 2003-04-29 14:48 UTC (permalink / raw) To: vecna; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 2710 bytes --] On Mon, Apr 28, 2003 at 05:07:27PM +0200, vecna@s0ftpj.org wrote: > > Regarding your patch, it introduces another new place for cache line > > pingpong, happening with the same frequency as creation of new conntracks. > excuse me, what do you mean with pingpong ? he is referring to cache line ping pong on SMP systems. It means that only one of the per-cpu caches can have writeable data. as soon as you write to a certain memory location (like incrementing a counter), all cached values of this memory location on the other CPU's get invalidated. This basically results in a cache miss at every time you are referencing this memory location, making the CPU idle to wait for the data from memory to arrive. > umh, I've always view as hard work the calling of __ip_conntrack_find, > that call LIST_FIND that one check at the wrong possibility all present > connection tracked ... __ip_conntrack_find is an inline function and LIST_FIND a macro. so there are not really any function calls involved. > them explain our system for make a traffic division more or less similar > to a load balancer, them suppose that statistically all internet connection > should be (more or less) equally distributed if is applied a matematic > computation over session params (ip source, ip dest, source porc, dest port). > > if we take a look at the last byte of ip addres and port, we could use > this number and associate them (with a simple LAST_BYTENUMBER % > POSSIBLE_OUTPUT) to one output interface, for make load balancing > applied to session without doing session tracking. (because on between > direction of sessions, ip source ip dest source port dest port, is > alwayes the some, also with inverted position). this is not new. However: - you cannot have outgoing connections from your individual nodes, since you cannot make sure that the reply packets will end up in the correct hash bucket - you cannot deal with complex protocols like FTP,IRC,H.323,SIP,PPTP,... - your load balancing is static and depends on the equal distribution of (source) IP addresses > I've some time to use for make a possible -j SBALANCE target for do > a forward on a list of output interface ... did you think that > should be nice ? I personally don't think this would be very helpful. > ~c -- - Harald Welte <laforge@netfilter.org> http://www.netfilter.org/ ============================================================================ "Fragmentation is like classful addressing -- an interesting early architectural error that shows how much experimentation was going on while IP was being designed." -- Paul Vixie [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: possible target SBALANCE ? 2003-04-29 14:48 ` Harald Welte @ 2003-04-30 11:59 ` vecna 2003-04-30 13:02 ` Roberto Nibali 0 siblings, 1 reply; 42+ messages in thread From: vecna @ 2003-04-30 11:59 UTC (permalink / raw) To: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1369 bytes --] > __ip_conntrack_find is an inline function and LIST_FIND a macro. so > there are not really any function calls involved. ok, i don't speak about latency caused by continuos jump, but the complexity of the algorithm is O(N) with N =number of connection, with merge sort this should became O(Log(N)), and simple when is added to the list you needed to follow incremental order for the key to search (the hash) > this is not new. However: > - you cannot have outgoing connections from your individual nodes, since > you cannot make sure that the reply packets will end up in the correct > hash bucket > - you cannot deal with complex protocols like FTP,IRC,H.323,SIP,PPTP,... > - your load balancing is static and depends on the equal distribution of > (source) IP addresses I've felt the necessity to balance a large size of traffic, and all the other solutions (proposed in LVS) involves connection tracking (and LVS have the "bug" o "limitation" to follow session computing hash with source ip and destination port only, not with between couple for trace also the replies, and this could not be fine for split traffic for IDS, because most needed for trace connection to read also the replis) but if this is not ad interesting feature for netfilter (is a firewall :) could be better if I ask on LVS mailing list ? bye, vecna [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: possible target SBALANCE ? 2003-04-30 11:59 ` vecna @ 2003-04-30 13:02 ` Roberto Nibali 0 siblings, 0 replies; 42+ messages in thread From: Roberto Nibali @ 2003-04-30 13:02 UTC (permalink / raw) To: vecna; +Cc: Netfilter Developers Ciao Vecna, :) > I've felt the necessity to balance a large size of traffic, and all > the other solutions (proposed in LVS) involves connection tracking (and But very simple connection tracking. > LVS have the "bug" o "limitation" to follow session computing hash > with source ip and destination port only, not with between couple for Well, to be correct, it's a triplet of <proto, vaddr, vport> for basic hashing, but you can as well implement a new scheduler and register your own hashing, as it is done for example with the lblcr scheduler. > trace also the replies, and this could not be fine for split traffic for > IDS, because most needed for trace connection to read also the replis) It's changeable, the hash could be extended if there is a real need for it. Just bring it on on the LVS list ;). > but if this is not ad interesting feature for netfilter (is a firewall :) > could be better if I ask on LVS mailing list ? Yes, we can discuss it there, if you prefer. Cheers, Roberto Nibali, ratz -- echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2003-05-05 13:07 UTC | newest] Thread overview: 42+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-04-19 21:42 PATCH: extra conntrack stats Rolf Fokkens 2003-04-25 8:23 ` Patrick Schaaf 2003-04-27 12:48 ` Harald Welte 2003-04-27 15:23 ` Rolf Fokkens 2003-04-27 20:42 ` Harald Welte 2003-04-28 6:13 ` Patrick Schaaf 2003-04-29 22:15 ` Jozsef Kadlecsik 2003-04-29 22:38 ` Martin Josefsson 2003-04-30 10:49 ` Jozsef Kadlecsik 2003-04-30 11:19 ` Martin Josefsson 2003-04-30 23:05 ` Martin Josefsson 2003-05-01 4:05 ` Rusty Russell 2003-05-01 6:05 ` Patrick Schaaf 2003-05-01 6:46 ` Rusty Russell 2003-05-01 7:04 ` Patrick Schaaf 2003-05-01 7:38 ` Rusty Russell 2003-05-01 9:58 ` Martin Josefsson 2003-05-01 11:32 ` Harald Welte 2003-05-01 11:26 ` Harald Welte 2003-05-02 12:18 ` Jozsef Kadlecsik 2003-05-02 12:30 ` Martin Josefsson 2003-05-02 21:51 ` Jozsef Kadlecsik 2003-05-02 21:58 ` Martin Josefsson 2003-05-05 9:24 ` Jozsef Kadlecsik 2003-05-05 12:38 ` Jozsef Kadlecsik 2003-05-05 13:07 ` Martin Josefsson 2003-05-01 0:06 ` Rusty Russell 2003-05-01 5:48 ` Patrick Schaaf 2003-05-01 10:01 ` Martin Josefsson 2003-05-01 9:06 ` Martin Josefsson 2003-05-02 5:31 ` Rusty Russell 2003-05-02 7:06 ` Patrick Schaaf 2003-05-02 8:57 ` Rusty Russell 2003-05-02 9:54 ` SNAT and IP ID Patrick Schaaf 2003-05-02 15:43 ` Harald Welte 2003-05-05 8:43 ` PATCH: extra conntrack stats Jozsef Kadlecsik 2003-04-28 9:13 ` vecna 2003-04-28 13:47 ` Patrick Schaaf 2003-04-28 15:07 ` possible target SBALANCE ? vecna 2003-04-29 14:48 ` Harald Welte 2003-04-30 11:59 ` vecna 2003-04-30 13:02 ` Roberto Nibali
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.