* conntrack performance/DoS formula @ 2002-06-27 20:46 Don Cohen 2002-06-28 6:23 ` Patrick Schaaf 2002-07-02 14:38 ` conntrack performance/DoS formula Harald Welte 0 siblings, 2 replies; 55+ messages in thread From: Don Cohen @ 2002-06-27 20:46 UTC (permalink / raw) To: netfilter-devel I had mentioned that I was hoping to remember/find the appropriate formula for the probability of a hash bucket reaching its limit. I got out my prob-stat book from I-won't-mention-how-many years ago. Suppose there's a limit of L connections to be put in a hash table with B buckets, each bucket allowed to hold a max of M entries. I assume that the hash function maps a connection to each bucket with equal probability, P = 1/B. Suppose the table is full, meaning it has L connections in it. The probability that a given bucket has n connections in it is: c(L,n) (P ^ n) ((1-P) ^ (L-n)) [called binomial distribution] where c(L,n) is the number of ways of choosing n things out of L, which is L! / (L-n)! n!, i.e., 10x9x8/3x2x1 ways to choose 3 out of 10. The binomial formula comes from the fact that we've chosen n connections out of L to put in this bucket, each of those with probability P, and of course the other L-n went into other buckets with probability 1-P. BTW, this is closely approximated by a Poisson distribution. Anyhow, on to some answers. My initial guess was that with L/2 buckets (so the average bucket length is 2), max bucket size M=10 would result in only a microscopic chance of overflow. The chance is bigger than I expected, about 1E-5. Raising M to 15 reduces it to about 1E-10 and raising M to 20 makes it about 1E-15, which seems quite respectable. There's a pretty straight forward tradeoff between space and time. Instead of increasing the time (# iterations = M) you can increase the space (# buckets). Changing from L/2 buckets to L we get M=10 probability 1E-9 of filling a bucket M=15 probability 1E-14 M=20 probability 1E-20 And while we're at it, with 2L buckets (average occupancy .5) M=10 prob ~ 1E-11 M=15 prob ~ 1E-18 M=20 prob ~ 1E-25 My guess is that the amount of time required to do 20 iterations comparing connection data with elements of a list in a bucket is very small compared to the amount of time otherwise spent on a packet. On the other hand, a hash bucket is also pretty cheap. The data for a connection takes much more space than 2 hash buckets, or even 10. And with 10L buckets the chance of overflow at M=10 is only 1E-19, for M=20 it's somewhere around 1E-40. So, there you have it. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-27 20:46 conntrack performance/DoS formula Don Cohen @ 2002-06-28 6:23 ` Patrick Schaaf 2002-06-28 17:53 ` Don Cohen 2002-07-02 14:38 ` conntrack performance/DoS formula Harald Welte 1 sibling, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-06-28 6:23 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel Hello Don, > I had mentioned that I was hoping to remember/find the appropriate > formula for the probability of a hash bucket reaching its limit. [SNIP calculation - thanks for sharing your research!] > I assume that the hash function maps a connection to each bucket > with equal probability, P = 1/B. Can you find an analysis that permits some non-uniformity, given as a parameter? I am pretty sure that the hash function we use for conntracking is nonuniform in lots of real world cases. Thus, if you want to propose using that probability derived length limit, then you must prove uniformity of our hash function. I have the vague feeling (being a programmer, not a mathematician :) that even small nonuniformities will have large impact on your probability distribution. May I suggest a different, experimental approach: extract the hash function from the code, creating a small userlevel tool which reads in the saved contents of some /proc/net/ip_conntrack, and with a hash bucket and limit specified on the commandline, counts the distribution for that specific set of connections, maybe comparing it with the theoretical prediction, resulting in a "how does that real world conntrack table fit to theory" metric? Once done, such a tool could permit command line selection of different hash functions, for comparison. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 6:23 ` Patrick Schaaf @ 2002-06-28 17:53 ` Don Cohen 2002-06-28 18:36 ` Patrick Schaaf 0 siblings, 1 reply; 55+ messages in thread From: Don Cohen @ 2002-06-28 17:53 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Patrick Schaaf writes: > > I assume that the hash function maps a connection to each bucket > > with equal probability, P = 1/B. > > Can you find an analysis that permits some non-uniformity, given as > a parameter? I am pretty sure that the hash function we use for > conntracking is nonuniform in lots of real world cases. Thus, if > you want to propose using that probability derived length limit, > then you must prove uniformity of our hash function. The effect of nonuniformity in the hash function is very easy to analyze. It just changes the probability of different buckets. For any bucket b the same formula works, just replacing P with P[b]. So if we take the most commonly used bucket, call its probability P', then a pessimistic approximation is that the chance of a new connection encountering a full bucket is given by the earlier formula with the number of buckets reduced to 1/P'. > I have the vague feeling (being a programmer, not a mathematician :) > that even small nonuniformities will have large impact on your > probability distribution. I guess it depends on what you consider "small" non-uniformities. I think the opposite is true. The danger in a hash function is that there will be a high correlation among the hashes of the items that arrive. In this case I don't see much danger. > May I suggest a different, experimental approach: extract the hash function > from the code, creating a small userlevel tool which reads in the saved > contents of some /proc/net/ip_conntrack, and with a hash bucket and limit > specified on the commandline, counts the distribution for that specific > set of connections, maybe comparing it with the theoretical prediction, > resulting in a "how does that real world conntrack table fit to theory" > metric? Once done, such a tool could permit command line selection of > different hash functions, for comparison. Sounds easy enough. You have some "real" (or at least "realistic") data? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 17:53 ` Don Cohen @ 2002-06-28 18:36 ` Patrick Schaaf 2002-06-28 19:03 ` Don Cohen 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-06-28 18:36 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, netfilter-devel > > May I suggest a different, experimental approach: extract the hash function > > from the code, creating a small userlevel tool which reads in the saved > > contents of some /proc/net/ip_conntrack, and with a hash bucket and limit > > specified on the commandline, counts the distribution for that specific > > set of connections, maybe comparing it with the theoretical prediction, > > resulting in a "how does that real world conntrack table fit to theory" > > metric? Once done, such a tool could permit command line selection of > > different hash functions, for comparison. > > Sounds easy enough. You have some "real" (or at least "realistic") data? I have real data from an IRC server (one of the german IRCnet hubs), and from several boxen providing transparent proxy service to dialup customers, 3000 customers per box peak, running DNS and squid towards the Internet. With the peak load, the proxy boxen have up to 100000 conntrack entries (according to slabinfo; /proc/net/ip_conntrack is too lame to show them all). Others will certainly be able to contribute data from production routing firewalls. I have one of those, too, but they only protect an ~50 workstation LAN, so these are not that significant. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 18:36 ` Patrick Schaaf @ 2002-06-28 19:03 ` Don Cohen 2002-06-28 19:35 ` Patrick Schaaf 2002-07-02 14:40 ` Harald Welte 0 siblings, 2 replies; 55+ messages in thread From: Don Cohen @ 2002-06-28 19:03 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Patrick Schaaf writes: > I have real data from an IRC server (one of the german IRCnet hubs), and > from several boxen providing transparent proxy service to dialup customers, > 3000 customers per box peak, running DNS and squid towards the Internet. > With the peak load, the proxy boxen have up to 100000 conntrack entries > (according to slabinfo; /proc/net/ip_conntrack is too lame to show them all). > > Others will certainly be able to contribute data from production > routing firewalls. I have one of those, too, but they only protect > an ~50 workstation LAN, so these are not that significant. If you can put some data on a web page that I could download I'd be happy to write and run the program that analyzes it and compares it to the results expected for a uniform hash function. Let me know what hash functions particularly interest you other than the one currently in conntrack and the slight modification I suggested. Just in case you consider this to be a drawback, I might as well mention up front that this program will be written in lisp. (I consider it to be a benefit.) I suppose /proc/net/ip_conntrack is the preferred data format, but if you have data in another format that others can easily generate that would also be fine. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 19:03 ` Don Cohen @ 2002-06-28 19:35 ` Patrick Schaaf 2002-06-28 19:39 ` Patrick Schaaf 2002-06-28 21:10 ` Don Cohen 2002-07-02 14:40 ` Harald Welte 1 sibling, 2 replies; 55+ messages in thread From: Patrick Schaaf @ 2002-06-28 19:35 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1542 bytes --] Hi Don, > > I have real data [...] from several boxen providing transparent proxy > > service to dialup customers, 3000 customers per box peak, running DNS > > and squid towards the Internet. > > With the peak load, the proxy boxen have up to 100000 conntrack entries > > If you can put some data on a web page that I could download I'd be > happy to write and run the program that analyzes it and compares it > to the results expected for a uniform hash function. I have put a current sample, with ~33000 entries from one of those proxy boxen, at http://bei.bof.de/ip_conntrack.1.gz To protect the innoccent (i.e. my customers and their surfing habits), I have modified the top 16 bit of the IP addresses (keeping a random but unique-once-determined mapping). The low order 16 bit are unmodified. I hope this is sufficient to combine realism wrt hashing with privacy. The Perl script I used for transforming the ip_conntrack input, is appended for review. > Just in case you consider this to be a drawback, I might as well > mention up front that this program will be written in lisp. > (I consider it to be a benefit.) That's your choice. I would write it in Perl or C. > I suppose /proc/net/ip_conntrack is the preferred data format The data format should be nearly /proc/net/ip_conntrack, the only modification is that I compressed multiple whitespace down to a single character, and the abovementioned mangling of the IP addresses. I'm looking forward to your results. Tell me if you want more data. best regards Patrick [-- Attachment #2: ctmangle --] [-- Type: text/plain, Size: 680 bytes --] #! /usr/bin/perl -w use strict; # tcp 6 114 TIME_WAIT src=XX.XXX.211.64 dst=YYY.Y.144.66 sport=34685 dport=80 src=YYY.Y.144.66 dst=XX.XXX.211.64 sport=80 dport=34685 [ASSURED] use=1 my %ipmap = (); sub domap { my $ip = $_[0]; if (!defined($ipmap{$ip})) { my $r = rand (2**16)-1; my $m = ''; $m .= ($r & 255); $r >>= 8; $m .= '.' . ($r & 255); $ipmap{$ip} = $m; } return $ipmap{$ip}; } while (<>) { chop; my @F = split; my $out = ''; my $sep = ''; for my $token (@F) { if ($token =~ /^([^=]+)=(\d+\.\d+)(\.\d+\.\d+)$/) { my $ip = domap($2); $out .= "$sep${1}=$ip${3}"; } else { $out .= "$sep$token"; } $sep = ' '; } print "$out\n"; } ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 19:35 ` Patrick Schaaf @ 2002-06-28 19:39 ` Patrick Schaaf 2002-06-28 21:10 ` Don Cohen 1 sibling, 0 replies; 55+ messages in thread From: Patrick Schaaf @ 2002-06-28 19:39 UTC (permalink / raw) To: netfilter-devel (I wrote) > I have put a current sample, with ~33000 entries from one of those > proxy boxen, at http://bei.bof.de/ip_conntrack.1.gz I forgot to mention that on this box, I run with the default module load time calculated hash bucket count of 7168 (much too low, I know), and ip_conntrack_max at 200000. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 19:35 ` Patrick Schaaf 2002-06-28 19:39 ` Patrick Schaaf @ 2002-06-28 21:10 ` Don Cohen 2002-06-28 21:28 ` Patrick Schaaf 1 sibling, 1 reply; 55+ messages in thread From: Don Cohen @ 2002-06-28 21:10 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Patrick Schaaf writes: > I have put a current sample, with ~33000 entries from one of those > proxy boxen, at http://bei.bof.de/ip_conntrack.1.gz I started to test and found that I got longer buckets than I expected. The problem seems to be that you have a lot of duplicate entries! This should not happen in a connection table, right? (setf ht (make-hash-table :test 'equal)) (loop for d in data do (incf (gethash d ht 0))) (maphash (lambda (x y) (when (> y 1) (print (list x y)))) ht) ... 620 lines !!! ((6 161731392 2776778530 3228 1770) 2) = 9.163.211.64 165.130.71.34 ((6 2776779313 974068 1241 80) 2) ((6 2776779190 2386095140 2596 80) 2) these are protocol, src/dst ip, src/dst port I predict that when I remove the duplicates I'll get the expected distribution. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 21:10 ` Don Cohen @ 2002-06-28 21:28 ` Patrick Schaaf 2002-06-28 21:49 ` Don Cohen 2002-06-28 22:30 ` Don Cohen 0 siblings, 2 replies; 55+ messages in thread From: Patrick Schaaf @ 2002-06-28 21:28 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, netfilter-devel On Fri, Jun 28, 2002 at 02:10:50PM -0700, Don Cohen wrote: > Patrick Schaaf writes: > > I have put a current sample, with ~33000 entries from one of those > > proxy boxen, at http://bei.bof.de/ip_conntrack.1.gz > > I started to test and found that I got longer buckets than I expected. > The problem seems to be that you have a lot of duplicate entries! > This should not happen in a connection table, right? I see about 550 duplicates in 33000 entries. This is most likely due to 'cat /proc/net/ip_conntrack' being terminally broken. But that's not news to me. Sorry I did not check before. > I predict that when I remove the duplicates I'll get the expected > distribution. sort ip_conntrack.1|uniq regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 21:28 ` Patrick Schaaf @ 2002-06-28 21:49 ` Don Cohen 2002-06-28 22:30 ` Don Cohen 1 sibling, 0 replies; 55+ messages in thread From: Don Cohen @ 2002-06-28 21:49 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Below is some code along with some results after I removed the duplicates. You'll see I just simulated tables of different sizes with the first N lines of your data, always using buckets with expected occupancy=1. The buckets are still a little longer than expected. Perhaps this is related to ... 'cat /proc/net/ip_conntrack' being terminally broken. Or perhaps something else strange in your data. Or maybe something inherent in "real" data that has not yet occurred to me. ================ (in-package :user) (defun c(n m &aux (ans 1)) "number of ways of choosing m out of n" (loop for i from 1 to m do (setf ans (/ (* ans (- (+ 1 n) i)) i))) ans) #+clisp (setf (ext:long-float-digits) 64) (defun f (n m p) "prob. that exactly m out of n entries end up in a bucket which gets an entry with prob. p" (* (c n m) (expt p m)(expt (- 1l0 p) (- n m)))) (defun parse-data-line (string) "input is a line from /proc/net/ip_conntrack, output is (proto src dst sport dport)" (let (proto src dst sport dport) (with-input-from-string (s string) (read s) (setf proto (read s))) (unless (integerp proto) (error "strange proto?")) (setf src (find-ip string "src=") dst (find-ip string "dst=") sport (find-port string "sport=") dport (find-port string "dport=")) (list proto src dst sport dport))) (defun find-port (string label) (let (pos) (unless (setf pos (search label string)) (error "can't find ~A" label)) (parse-integer string :start (+ pos (length label)) :junk-allowed t))) (defun find-ip (string label) (let (pos int (ans 0)) (unless (setf pos (search label string)) (error "can't find ~A" label)) (incf pos (length label)) (loop for i below 4 do (multiple-value-setq (int pos) (parse-integer string :start pos :junk-allowed t)) (setf ans (+ (* ans 256) int) pos (1+ pos))) ans)) (defun hash1 (tablesize proto src dst sport dport) (mod (+ proto src dst sport dport) tablesize)) (defun hash2 (tablesize proto src dst sport dport) (mod (+ proto src dst sport dport sport) tablesize)) ;; is this the code now in conntrack? (defun test-hash (hashfn data tablesize entries) (let ((table (make-array tablesize :initial-element 0)) max table2) (loop for i below entries as d in data do (incf (aref table (apply hashfn tablesize d)))) (setf max (loop for i below tablesize maximize (aref table i))) (setf table2 (make-array (1+ max) :initial-element 0)) (loop for i below tablesize do (incf (aref table2 (aref table i)))) (loop for i from 0 to max collect (list i (aref table2 i))))) (length (setf data (with-open-file (f "/tmp/ip_conntrack.1") (loop while (setf l (read-line f nil nil)) collect (parse-data-line l))))) ;; => 33465 ;; now remove the duplicates !!! (setf ht (make-hash-table :test 'equal)) (loop for d in data do (incf (gethash d ht 0))) ;; (maphash (lambda (x y) (when (> y 1) (print (list x y)))) ht) ;; ... 620 lines !!! (setf realdata nil) (maphash (lambda (x y) (push x) realdata) ht) #| results (test-hash 'hash1 realdata 1024 1024) ((0 458) (1 291) (2 150) (3 86) (4 25) (5 10) (6 3) (7 1)) ;; meaning 458 buckets with no entries, 291 buckets with 1 entry ... (f 1023 7 (/ 1l0 1024)) 7.154...L-5 - so we should get 7 in a bucket less than once in 10K meaning this hash is not doing a good job with this data (test-hash 'hash1 realdata 1023 1023) ((0 372) (1 381) (2 188) (3 64) (4 16) (5 2)) (f 1023 5 (/ 1l0 1023)) 0.00304 [ i.e., 3 buckets out of 1000 should have 5 entries so this is more reasonable ] (test-hash 'hash1 realdata 16384 16384) ((0 6755) (1 5533) (2 2509) (3 955) (4 354) (5 169) (6 68) (7 34) (8 4) (9 1) (10 2)) 10 in a bucket should happen only once in 1E7 (test-hash 'hash1 realdata 16383 16383) ((0 6692) (1 5496) (2 2647) (3 956) (4 367) (5 142) (6 50) (7 22) (8 8) (9 2) (10 0) (11 1)) and 11 only once in 1E8 Now adding sport twice (test-hash 'hash2 realdata 1024 1024) ((0 391) (1 361) (2 184) (3 63) (4 19) (5 6)) (f 1023 5 (/ 1l0 1024)) 0.003 (as above) (test-hash 'hash2 realdata 1023 1023) ((0 365) (1 392) (2 190) (3 57) (4 17) (5 1) (6 0) (7 1)) above we saw 7E-5 (test-hash 'hash2 realdata 16384 16384) ((0 6518) (1 5667) (2 2678) (3 1005) (4 338) (5 104) (6 52) (7 15) (8 6) (9 1)) (test-hash 'hash2 realdata 16383 16383) ((0 6608) (1 5583) (2 2655) (3 984) (4 347) (5 128) (6 53) (7 11) (8 12) (9 1) (10 1)) |# ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 21:28 ` Patrick Schaaf 2002-06-28 21:49 ` Don Cohen @ 2002-06-28 22:30 ` Don Cohen 2002-06-29 9:03 ` Patrick Schaaf ` (2 more replies) 1 sibling, 3 replies; 55+ messages in thread From: Don Cohen @ 2002-06-28 22:30 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel The buckets are still a little longer than expected. Perhaps this is related to ... 'cat /proc/net/ip_conntrack' being terminally broken. Or perhaps something else strange in your data. Or maybe something inherent in "real" data that has not yet occurred to me. Here's the most extreme case: (test-hash 'hash1 realdata 16383 16383) ((0 6692) (1 5496) (2 2647) (3 956) (4 367) (5 142) (6 50) (7 22) (8 8) (9 2) (10 0) (11 1)) and 11 only once in 1E8 I changed the code to find the data that went into this bucket with 11 connections. Here it is: (loop for x in '( (6 161731392 2776778790 3228 1045) (6 161731392 2776778584 3128 1351) (6 161731392 2776778230 3228 1605) (6 2776778862 161726449 1064 8080) (6 161731392 2776778518 3228 1317) (6 161731392 2776778560 3128 1375) (6 161731392 2776778406 3228 1429) (6 161731392 2776778148 3128 1787) (6 161731392 2776779212 3128 33489) (6 2776778556 161726449 1370 8080) (6 161731392 2776778534 3228 1301)) collect (reduce '+ x)) (2938514461 2938514461 2938514461 2938514461 2938514461 2938514461 2938514461 2938514461 2938547227 2938514461 2938514461) And what we see is that the sum of the data (used in hash1) is almost always the same, so the table size doesn't matter. Now, let's see how the components of the first differ from those of the others: (loop for x in <data above> collect (loop for y in ' (6 161731392 2776778790 3228 1045) as z in x collect (- y z))) ((0 0 0 0 0) (0 0 206 100 -306) (0 0 560 0 -560) (0 -2615047470 2615052341 2164 -7035) (0 0 272 0 -272) (0 0 230 100 -330) (0 0 384 0 -384) (0 0 642 100 -742) (0 0 -422 100 -32444) (0 -2615047164 2615052341 1858 -7035) (0 0 256 0 -256)) So there are 4 cases where the source IP and port are the same and the dest IP and dest port happen to cancel out, and four more where the source port differs by 100 and the dest IP and port differ by enough to cancel that 100. That's 8 out of the 11. Is this "normal" ? Do you have some idea about all these ports 3128 and 3228 ? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 22:30 ` Don Cohen @ 2002-06-29 9:03 ` Patrick Schaaf 2002-06-29 16:48 ` Don Cohen 2002-06-29 9:29 ` Patrick Schaaf 2002-06-29 12:07 ` Patrick Schaaf 2 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-06-29 9:03 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel > Is this "normal" ? Do you have some idea about all these > ports 3128 and 3228 ? The machine where that ip_conntrack came from, is running two squid processes, one on each processor, and clients are distributed evenly over the two processes. 3128 and 3228 are the two listening ports of those squid processes. Such a conntrack shape will be the normal case for iptables running on a server, more extreme than when it's running on a routing firewall. As you asked for ideas about "better" hashes, could you possibly try using CRC32 over the concatenation of the key values? best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-29 9:03 ` Patrick Schaaf @ 2002-06-29 16:48 ` Don Cohen 2002-06-29 17:22 ` Patrick Schaaf 2002-06-29 17:33 ` Patrick Schaaf 0 siblings, 2 replies; 55+ messages in thread From: Don Cohen @ 2002-06-29 16:48 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Patrick Schaaf writes: > > Is this "normal" ? Do you have some idea about all these > > ports 3128 and 3228 ? > > The machine where that ip_conntrack came from, is running two squid > processes, one on each processor, and clients are distributed evenly > over the two processes. 3128 and 3228 are the two listening ports > of those squid processes. > > Such a conntrack shape will be the normal case for iptables running > on a server, more extreme than when it's running on a routing firewall. The fact that many connections have the same source IP and two common source ports does not seem at all abnormal. The interesting correlation in this case is that the client with IP address differing by n is using a port that differs by -n. This is just the sort of correlation that would be bad for this hash function, and the sort that I would not expect. (6 161731392 2776778790 3228 1045) (6 161731392 2776778584 3128 1351) ;; (0 0 206 100 -306) (6 161731392 2776778230 3228 1605) ;; (0 0 560 0 -560) (6 2776778862 161726449 1064 8080) ;; different machines (6 161731392 2776778518 3228 1317) ;; (0 0 272 0 -272) (6 161731392 2776778560 3128 1375) ;; (0 0 230 100 -330) (6 161731392 2776778406 3228 1429) ;; (0 0 384 0 -384) (6 161731392 2776778148 3128 1787) ;; (0 0 642 100 -742) (6 161731392 2776779212 3128 33489) ;; different sum (6 2776778556 161726449 1370 8080) ;; different machines (6 161731392 2776778534 3228 1301) ;; (0 0 256 0 -256) > As you asked for ideas about "better" hashes, could you possibly try > using CRC32 over the concatenation of the key values? Looks like you may have done this already, but if not then perhaps you could supply the code or tell me how to compute it. Instead, I just look at your words above "concatenation of the key values". Here's my first order approximation: Instead of adding IPs to ports I concatenate IP and port into one 48 bit number. Then add the two 48 bit numbers (and the protocol) and take the result mod table size. (defun hash3 (tablesize proto src dst sport dport) (mod (+ proto (ash (+ src dst) 16) sport dport) tablesize)) The result: (test-hash 'hash3 realdata 16383 16383) ((0 6284) (1 5796) (2 2867) (3 1015) (4 319) (5 84) (6 14) (7 4)) Much closer to theory BTW it turns out this leads to a good demonstration of my point about making the modulus relatively prime to the size of the data. If I change 16383 to 16384 (a power of 2) then almost all of the IP address data above is ignored and we get this monster: (test-hash 'hash3 realdata 16384 16384) ((0 8757) (1 4875) (2 1192) (3 490) (4 293) (5 179) (6 131) (7 95) (8 92) (9 60) (10 49) (11 42) (12 37) (13 28) (14 20) (15 13) (16 11) (17 6) (18 6) (19 2) (20 1) (21 1) (22 0) (23 1) (24 1) (25 0) (26 0) (27 0) (28 1) (29 0) (30 0) (31 0) (32 0) (33 0) (34 0) (35 0) (36 0) (37 0) (38 0) (39 0) (40 0) (41 0) (42 0) (43 0) (44 0) (45 0) (46 0) (47 0) (48 0) (49 0) (50 0) (51 0) (52 0) (53 0) (54 0) (55 0) (56 0) (57 0) (58 0) (59 0) (60 0) (61 0) (62 0) (63 0) (64 0) (65 0) (66 0) (67 0) (68 0) (69 0) (70 0) (71 0) (72 0) (73 0) (74 0) (75 0) (76 0) (77 0) (78 0) (79 0) (80 0) (81 0) (82 0) (83 0) (84 0) (85 0) (86 1)) [reactions to later messages] one more thing to observe for your tests: each line from ip_conntrack is hashed twice, with mirrored source / destination addresses. I was only reading the first src/dst/sport/dport from each line. I didn't check for such mirrors in general, but they don't appear in the sample bucket above containing 11 entries. This version adds a '-b X' option. If you specify that, all TCP conntracks with timeouts less than X get ignored. For example, use "-b 431940" to only count the conntracks active within the last 60 seconds. I guess I just don't understand conntrack well enough. How does -b 431940 map to active in the last 60 sec? Also, why does this matter? If you use my suggestion to treat a "full bucket" like a full table then the issue is whether buckets are full, and I don't see how that's affected by how recently used the entries are. Incidentally, this makes the current ip_conntrack hash function look a lot less bad than before. I appears that the "long bucket" packets mostly belong to some kind of port scan. They will time out eventually, and they will NOT contribute significantly to CPU usage, because they will be rarely hit. They do contribute to cpu cost because they lengthen the search for other connections. Everybody should see the results for their own system. Really. I mean it. My feeling is that we should be able to find a hash function that is "good enough" in almost all cases, meaning something like this: if we want probability of full bucket < 1E-10 and the formula tells us that this indicates we need to make max bucket length >= 12 then to be on the safe side we increase it, say 50% to 18, and then we log full buckets. When these appear in the log then there are two possible explanations. One is that we're under attack from someone purposely trying to fill buckets. The other is that our hash function is bad for the real data. In either case we should use your program to look at the connections in the overflowing buckets, and this will tell us what's going on (and how to fix it if it can be fixed). ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-29 16:48 ` Don Cohen @ 2002-06-29 17:22 ` Patrick Schaaf 2002-07-05 13:47 ` Harald Welte 2002-06-29 17:33 ` Patrick Schaaf 1 sibling, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-06-29 17:22 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel > This version adds a '-b X' option. If you specify that, all TCP conntracks > with timeouts less than X get ignored. For example, use "-b 431940" to only > count the conntracks active within the last 60 seconds. > > I guess I just don't understand conntrack well enough. > How does -b 431940 map to active in the last 60 sec? Whenever a new packet arrives, the timeout of its conntrack is set to the maximum value. For ESTABLISHED TCP connections, that's 5 days, i.e. 432000 seconds. > Also, why does this matter? When worrying about chain lengths, we are worrying about lookup performance. By restricting the accounting to conntracks active during the last 60 seconds, I tried to capture the part of the buckets "where the action was". My finding is that the really long chain lengths, which also prevail when I use the crc32 hash, are all for very old inactive (and UNREPLIED) connections - which look like portscans from one or two days ago. And indeed, those show the +n -n pattern on the ports that you observed, and maybe, just maybe this is a sign of some malicious conntrack attacking scan tool. If my further analysis below is correct, however, that attack is almost futile. > If you use my suggestion to treat a "full > bucket" like a full table then the issue is whether buckets are full, > and I don't see how that's affected by how recently used the entries > are. I think my analysis shows a possible problem with your approach: when such a port scan results in an overly long single bucket list, that list will prevail for up to 5 days. If N is the number of buckets, and newly arriving connections are evenly spread over the buckets, then every Nth connection will fall into that overly long bucket - being blocked if we apply your cutting logic unmodified. However, in my observation of those problematic buckets, the connections are UNREPLIED. Then we can use the same trick that is used when ip_conntrack_max is reached: recycle one of those UNREPLIED connections immediately for the new conntrack. (If you are not aware of that ip_conntrack_max handling, have a look at the early_drop() logic in ip_conntrack_core.c) > Incidentally, this makes the current ip_conntrack hash function look a lot > less bad than before. I appears that the "long bucket" packets mostly belong > to some kind of port scan. They will time out eventually, and they will NOT > contribute significantly to CPU usage, because they will be rarely hit. > > They do contribute to cpu cost because they lengthen the search for > other connections. NO. They were "alone in their bucket". In other words, no active connection fell into the same bucket, so they won't come into play at all during lookup. And, as each entry has its individual timer, there is no scanning of the overall table, so they would be really untouched in real life, until they timeout after 5 days. At this point, I would like to stop for some time, and await results from other's real world ip_conntrack setups. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-29 17:22 ` Patrick Schaaf @ 2002-07-05 13:47 ` Harald Welte 0 siblings, 0 replies; 55+ messages in thread From: Harald Welte @ 2002-07-05 13:47 UTC (permalink / raw) To: Patrick Schaaf; +Cc: Don Cohen, netfilter-devel On Sat, Jun 29, 2002 at 07:22:18PM +0200, Patrick Schaaf wrote: > I think my analysis shows a possible problem with your approach: when such > a port scan results in an overly long single bucket list, that list will > prevail for up to 5 days. If N is the number of buckets, and newly arriving > connections are evenly spread over the buckets, then every Nth connection > will fall into that overly long bucket - being blocked if we apply your > cutting logic unmodified. > > However, in my observation of those problematic buckets, the connections > are UNREPLIED. Then we can use the same trick that is used when > ip_conntrack_max is reached: recycle one of those UNREPLIED > connections immediately for the new conntrack. this sounds like a sane way to go. > NO. They were "alone in their bucket". In other words, no active connection > fell into the same bucket, so they won't come into play at all during lookup. > And, as each entry has its individual timer, there is no scanning of the > overall table, so they would be really untouched in real life, until they > timeout after 5 days. but how high is the probability that they are [and remain] 'alone in their bucket' ? This could just be a coincidence... > At this point, I would like to stop for some time, and await results > from other's real world ip_conntrack setups. I'm going to do this on a couple of my firewalls on Jul 08 [that's the single day I'm going to be in .de between my different trips...]. Right now I only have dialup modem internet... So please wait another couple of days. Thanks. > best regards > Patrick -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-29 16:48 ` Don Cohen 2002-06-29 17:22 ` Patrick Schaaf @ 2002-06-29 17:33 ` Patrick Schaaf 1 sibling, 0 replies; 55+ messages in thread From: Patrick Schaaf @ 2002-06-29 17:33 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel > BTW it turns out this leads to a good demonstration of my point about > making the modulus relatively prime to the size of the data. Missed that point on first reading. I totally agree with you on this point. We already pay for the modulus in the current code, so we may as well take advantage of it. In an unrelated piece of software (phone call correlation), I use a precomputed table of primes that are all just a bit smaller than a power of two, and for any bucket count desired by the user, I really use the next largest almost-power-of-two prime from that table. best regards Patrick static unsigned sizes[] = { 3, /* 2^2-1 */ 7, /* 2^3-1 */ 13, 31, /* 2^5-1 */ 61, 127, /* 2^7-1 */ 251, 509, 1021, 2039, 4093, 8191, /* 2^13-1 */ 16381, 32749, 65521, 131071, /* 2^17-1 */ 262139, 524287, /* 2^19-1 */ 1048571, 2097157, 4194301, 8388581, }; ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 22:30 ` Don Cohen 2002-06-29 9:03 ` Patrick Schaaf @ 2002-06-29 9:29 ` Patrick Schaaf 2002-06-29 12:07 ` Patrick Schaaf 2 siblings, 0 replies; 55+ messages in thread From: Patrick Schaaf @ 2002-06-29 9:29 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel Don, one more thing to observe for your tests: each line from ip_conntrack is hashed twice, with mirrored source / destination addresses. Ideally, for a sufficiently oversized bucket count, that would be immaterial. For the too-small-bucket case, hash list lengths should double. With a too-bad hash function, things could get even worse. (My apologies if you already did this doubling in your test code; my meatware lisp parser is not very good...) best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 22:30 ` Don Cohen 2002-06-29 9:03 ` Patrick Schaaf 2002-06-29 9:29 ` Patrick Schaaf @ 2002-06-29 12:07 ` Patrick Schaaf 2002-06-29 12:34 ` Patrick Schaaf 2 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-06-29 12:07 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 2146 bytes --] Hi all, find appended, for use and review, my C version of a /proc/net/ip_conntrack based hash function and bucket occupation test program. Here is a short usage summary: - extract crc32.c and cttest.c from the attachments - compile with "gcc -o cttest cttest.c" - by default, /proc/net/ip_conntrack is read. Use the "-f filename" option to read the given file, instead. - use '-u' to enable unidirectional operation; the reverse tuple is ignored. - by default, the hash bucket count is 7168. Use "-s NUMBER" to modify this. Look at the kernel output from ip_conntrack load time to find out the bucket count that is active on your system. - use '-h NAME' to select the hash function to use. Right now you can use: o original this should be the hash function used by ip_conntrack o crc32 a crc32 over proto/sip/dip/sport/dport o random no relation to the input at all; this still reads the input, but only to get at the number of entries. Instead of calculating an input based hash, random() is called. - use '-i bucket-number' to show the input lines corresponding to the given hash bucket number. default is to not do this. - use '-t count' to output the bucket-number corresponding to the first bucket with 'count' entries in it. For example, if you find that the maximum bucket list length is 131, use '-t 131', and look for "occupation target 131 is at XXX". Then, use '-i XXX' on the next run to see all input lines falling into bucket number XXX. NOTE: as we have seen, it is likely that reading /proc/net/ip_conntrack does produce duplicate lines. This test program does not try to cope with that situation; if it applies to you, first save the output to some file, and use "sort somefile | uniq >testfile" to generate the input file for this test program. Oh, maybe I should mention what the program outputs... The main output comes after the "Count Length CT" heading. Each lines gives the COUNT of all buckets having list LENGTH, sorted by decreasing length. CT gives the cummulative total over the counts. Everybody should see the results for their own system. Really. I mean it. best regards Patrick [-- Attachment #2: crc32.c --] [-- Type: text/plain, Size: 4400 bytes --] /* crc32.c -- compute the CRC-32 of a data stream * Copyright (C) 1995-2002 Mark Adler * stolen from zlib CVS */ /* @(#) $Id: crc32.c,v 1.2.2.1 2002/03/12 09:34:29 werner Exp $ */ /* ======================================================================== * Table of CRC-32's of all single-byte values (made by make_crc_table) */ static const u32 crc_table[256] = { 0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL, 0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L, 0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL, 0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L, 0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L, 0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L, 0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L, 0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL, 0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L, 0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL }; /* ========================================================================= */ #define DO1(buf) crc = crc_table[((int)crc ^ (*buf++)) & 0xff] ^ (crc >> 8); #define DO2(buf) DO1(buf); DO1(buf); #define DO4(buf) DO2(buf); DO2(buf); #define DO8(buf) DO4(buf); DO4(buf); /* ========================================================================= */ u32 crc32(crc, buf, len) u32 crc; const char *buf; int len; { crc = crc ^ 0xffffffffL; while (len >= 8) { DO8(buf); len -= 8; } if (len) do { DO1(buf); } while (--len); return crc ^ 0xffffffffL; } [-- Attachment #3: cttest.c --] [-- Type: text/plain, Size: 5614 bytes --] /* cttest.c - parse /proc/net/ip_conntrack, compute bucket occupation * * This is how lines look: * * udp 17 0 src=213.6.73.215 dst=62.104.191.241 sport=1858 dport=53 src=62.104.191.241 dst=213.6.73.215 sport=53 dport=1858 use=1 * tcp 6 114 TIME_WAIT src=62.104.211.64 dst=212.7.144.66 sport=34685 dport=80 src=212.7.144.66 dst=62.104.211.64 sport=80 dport=34685 [ASSURED] use=1 */ #include <stdio.h> #include <string.h> #include <malloc.h> #include <stdlib.h> #include <getopt.h> #include <errno.h> #include <netinet/in.h> #include <arpa/inet.h> typedef unsigned long u32; typedef unsigned short u16; typedef unsigned char u8; struct ct_key { u32 sip; u32 dip; u16 sport; u16 dport; u8 proto; }; static u32 hash_conntrack(struct ct_key *key) { return ntohl(key->sip + key->dip + key->sport + key->dport + key->proto) + ntohs(key->sport); } #include "crc32.c" static u32 hash_crc32(struct ct_key *key) { u32 crc = crc32(0, (const char *) &key->proto, sizeof(key->proto)); u32 v32; u16 v16; v32 = ntohl(key->sip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v32 = ntohl(key->dip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v16 = ntohs(key->sport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); v16 = ntohs(key->dport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); return crc; } static u32 hash_random(struct ct_key *key) { return (u32) random(); } struct hash_def { char *name; u32 (*hash_f)(struct ct_key *); }; #define NR_HASHES (sizeof(hashes)/sizeof(hashes[0])) static struct hash_def hashes[] = { { "original", hash_conntrack }, { "random", hash_random }, { "crc32", hash_crc32 }, }; static u32 htable_size = 7168; static u32 *occupation; static struct hash_def *test_hash = hashes; static void do_init(void) { u32 idx; occupation = malloc(sizeof(*occupation) * htable_size); for (idx=0; idx<htable_size; idx++) occupation[idx] = 0; } static int display_target = -1; static int do_count(struct ct_key *key) { u32 idx = test_hash->hash_f(key) % htable_size; occupation[idx]++; return ((int)idx) == display_target; } static u32 parse_ip(char *t) { char *p = strchr(t, '='); if (!p++) return 0; return inet_addr(p); } static u16 parse_port(char *t) { char *p = strchr(t, '='); if (!p++) return 0; return (u16) strtoul(p, 0, 10); } static int unidir; static int process(char *p) { struct ct_key key[1]; char *tok; int res; strtok(p, " \t\n"); /* prot text */ tok = strtok(0, " \t\n"); key->proto = (u8) strtoul(tok, 0, 10); /* prot num */ strtok(0, " \t\n"); /* timeout */ switch (key->proto) { case 6: strtok(0, " \t\n"); /* tcp state */ case 17: break; default: return 0; } /* forward */ tok = strtok(0, " \t\n"); key->sip = parse_ip(tok); /* src ip */ tok = strtok(0, " \t\n"); key->dip = parse_ip(tok); /* dst ip */ tok = strtok(0, " \t\n"); key->sport = parse_port(tok); /* src port */ tok = strtok(0, " \t\n"); key->dport = parse_port(tok); /* dst port */ res = do_count(key); if (unidir) return res; /* reverse */ tok = strtok(0, " \t\n"); key->sip = parse_ip(tok); /* src ip */ tok = strtok(0, " \t\n"); key->dip = parse_ip(tok); /* dst ip */ tok = strtok(0, " \t\n"); key->sport = parse_port(tok); /* src port */ tok = strtok(0, " \t\n"); key->dport = parse_port(tok); /* dst port */ return res | do_count(key); } static int occupation_target = -1; static void show_occupation(void) { u32 idx, maxlen, total, partial, *agg; for (maxlen=0, total=0, idx=0; idx<htable_size; idx++) { total += occupation[idx]; if (occupation[idx] > maxlen) maxlen = occupation[idx]; } printf("hash function: %s\n", test_hash->name); printf("maximum bucket list length: %lu\n", maxlen); printf("total number of entries: %lu\n", total); agg = malloc(sizeof(*agg) * (maxlen+1)); for (idx=0; idx<=maxlen; idx++) agg[idx] = 0; for (idx=0; idx<htable_size; idx++) { agg[occupation[idx]]++; if (((int)occupation[idx]) == occupation_target) { printf("occupation target %d at %lu\n", occupation_target, idx); occupation_target = -1; } } printf("Count\tLength\tCT\n"); for (partial=0, idx=maxlen; idx>0; idx--) { double d; if (agg[idx] == 0) continue; partial += idx * agg[idx]; d = (100.0 * partial) / (1.0 * total); printf("%lu\t%lu\t%.02f%%\n", agg[idx], idx, d); } printf("%lu\t0\t100.00%%\n", agg[idx]); } static struct hash_def *find_hash(char *name) { u32 idx; for (idx=0; idx<NR_HASHES; idx++) if (0 == strcmp(name, hashes[idx].name)) return hashes+idx; fprintf(stderr, "no hash function named '%s'\n", name); return 0; } static void usage(void) { } int main(int argc, char **argv) { int c; char buf[512], copy[512]; char *filename = "/proc/net/ip_conntrack"; FILE *fp; while (EOF != (c = getopt(argc, argv, "f:us:h:i:t:"))) switch(c) { case 'f': filename = optarg; break; case 'u': unidir = 1; break; case 's': htable_size = strtoul(optarg, 0, 10); break; case 'h': test_hash = find_hash(optarg); break; case 'i': display_target = strtol(optarg, 0, 10); break; case 't': occupation_target = strtol(optarg, 0, 10); break; default: usage(); } if (!test_hash) return 1; fp = (0 == strcmp(filename, "-")) ? stdin : fopen(filename, "r"); if (!fp) { fprintf(stderr, "%s: %s\n", filename, strerror(errno)); exit(1); } printf("filename: %s\n", filename); do_init(); while (fgets(buf, sizeof(buf), fp)) { if (display_target != -1) strcpy(copy, buf); if (process(buf)) printf("[%d]\t%s", display_target, copy); } if (fp != stdin) fclose(fp); show_occupation(); return 0; } ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-29 12:07 ` Patrick Schaaf @ 2002-06-29 12:34 ` Patrick Schaaf 2002-06-30 8:31 ` Patrick Schaaf 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-06-29 12:34 UTC (permalink / raw) To: Patrick Schaaf; +Cc: Don Cohen, netfilter-devel [-- Attachment #1: Type: text/plain, Size: 811 bytes --] Hi again, one more elaboration of the cttest program, before I go offline in preparation for the fourth German football world championship... This version adds a '-b X' option. If you specify that, all TCP conntracks with timeouts less than X get ignored. For example, use "-b 431940" to only count the conntracks active within the last 60 seconds. Incidentally, this makes the current ip_conntrack hash function look a lot less bad than before. I appears that the "long bucket" packets mostly belong to some kind of port scan. They will time out eventually, and they will NOT contribute significantly to CPU usage, because they will be rarely hit. All is well, it seems. At least looking at my 33000 entry sample from yesterday. I'm looking forward to results from other workloads! best regards Patrick [-- Attachment #2: cttest.c --] [-- Type: text/plain, Size: 6095 bytes --] /* cttest.c - parse /proc/net/ip_conntrack, compute bucket occupation * * This is how lines look: * * udp 17 0 src=213.6.73.215 dst=62.104.191.241 sport=1858 dport=53 src=62.104.191.241 dst=213.6.73.215 sport=53 dport=1858 use=1 * tcp 6 114 TIME_WAIT src=62.104.211.64 dst=212.7.144.66 sport=34685 dport=80 src=212.7.144.66 dst=62.104.211.64 sport=80 dport=34685 [ASSURED] use=1 */ #include <stdio.h> #include <string.h> #include <malloc.h> #include <stdlib.h> #include <getopt.h> #include <errno.h> #include <netinet/in.h> #include <arpa/inet.h> typedef unsigned long u32; typedef unsigned short u16; typedef unsigned char u8; struct ct_key { u32 sip; u32 dip; u16 sport; u16 dport; u8 proto; }; static u32 hash_conntrack(struct ct_key *key) { return ntohl(key->sip + key->dip + key->sport + key->dport + key->proto) + ntohs(key->sport); } #include "crc32.c" static u32 hash_crc32(struct ct_key *key) { u32 crc = crc32(0, (const char *) &key->proto, sizeof(key->proto)); u32 v32; u16 v16; v32 = ntohl(key->sip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v32 = ntohl(key->dip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v16 = ntohs(key->sport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); v16 = ntohs(key->dport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); return crc; } static u32 hash_random(struct ct_key *key) { return (u32) random(); } struct hash_def { char *name; u32 (*hash_f)(struct ct_key *); }; #define NR_HASHES (sizeof(hashes)/sizeof(hashes[0])) static struct hash_def hashes[] = { { "original", hash_conntrack }, { "random", hash_random }, { "crc32", hash_crc32 }, }; static u32 htable_size = 7168; static u32 *occupation; static struct hash_def *test_hash = hashes; static void do_init(void) { u32 idx; occupation = malloc(sizeof(*occupation) * htable_size); for (idx=0; idx<htable_size; idx++) occupation[idx] = 0; } static int display_target = -1; static int do_count(struct ct_key *key) { u32 idx = test_hash->hash_f(key) % htable_size; occupation[idx]++; return ((int)idx) == display_target; } static u32 parse_ip(char *t) { char *p = strchr(t, '='); if (!p++) return 0; return inet_addr(p); } static u16 parse_port(char *t) { char *p = strchr(t, '='); if (!p++) return 0; return (u16) strtoul(p, 0, 10); } static u32 ignore_tcp_timeout_bound = 0; static int ignore_by_timeout(struct ct_key *key, char *tok) { return key->proto == 6 && strtoul(tok, 0, 10) < ignore_tcp_timeout_bound; } static int unidir; static int process(char *p) { struct ct_key key[1]; char *tok; int res; strtok(p, " \t\n"); /* prot text */ tok = strtok(0, " \t\n"); key->proto = (u8) strtoul(tok, 0, 10); /* prot num */ tok = strtok(0, " \t\n"); /* timeout */ if (ignore_by_timeout(key, tok)) return 0; switch (key->proto) { case 6: strtok(0, " \t\n"); /* tcp state */ case 17: break; default: return 0; } /* forward */ tok = strtok(0, " \t\n"); key->sip = parse_ip(tok); /* src ip */ tok = strtok(0, " \t\n"); key->dip = parse_ip(tok); /* dst ip */ tok = strtok(0, " \t\n"); key->sport = parse_port(tok); /* src port */ tok = strtok(0, " \t\n"); key->dport = parse_port(tok); /* dst port */ res = do_count(key); if (unidir) return res; /* reverse */ tok = strtok(0, " \t\n"); key->sip = parse_ip(tok); /* src ip */ tok = strtok(0, " \t\n"); key->dip = parse_ip(tok); /* dst ip */ tok = strtok(0, " \t\n"); key->sport = parse_port(tok); /* src port */ tok = strtok(0, " \t\n"); key->dport = parse_port(tok); /* dst port */ return res | do_count(key); } static int occupation_target = -1; static void show_occupation(void) { u32 idx, maxlen, total, partial, *agg; for (maxlen=0, total=0, idx=0; idx<htable_size; idx++) { total += occupation[idx]; if (occupation[idx] > maxlen) maxlen = occupation[idx]; } printf("hash function: %s\n", test_hash->name); printf("maximum bucket list length: %lu\n", maxlen); printf("total number of entries: %lu\n", total); agg = malloc(sizeof(*agg) * (maxlen+1)); for (idx=0; idx<=maxlen; idx++) agg[idx] = 0; for (idx=0; idx<htable_size; idx++) { agg[occupation[idx]]++; if (((int)occupation[idx]) == occupation_target) { printf("occupation target %d at %lu\n", occupation_target, idx); occupation_target = -1; } } printf("Count\tLength\tCT\n"); for (partial=0, idx=maxlen; idx>0; idx--) { double d; if (agg[idx] == 0) continue; partial += idx * agg[idx]; d = (100.0 * partial) / (1.0 * total); printf("%lu\t%lu\t%.02f%%\n", agg[idx], idx, d); } printf("%lu\t0\t100.00%%\n", agg[idx]); } static struct hash_def *find_hash(char *name) { u32 idx; for (idx=0; idx<NR_HASHES; idx++) if (0 == strcmp(name, hashes[idx].name)) return hashes+idx; fprintf(stderr, "no hash function named '%s'\n", name); return 0; } static void usage(void) { fprintf(stderr, "Usage: not written\n"); exit(1); } int main(int argc, char **argv) { int c; char buf[512], copy[512]; char *filename = "/proc/net/ip_conntrack"; FILE *fp; while (EOF != (c = getopt(argc, argv, "f:us:h:i:t:b:"))) switch(c) { case 'f': filename = optarg; break; case 'u': unidir = 1; break; case 's': htable_size = strtoul(optarg, 0, 10); break; case 'h': test_hash = find_hash(optarg); break; case 'i': display_target = strtol(optarg, 0, 10); break; case 't': occupation_target = strtol(optarg, 0, 10); break; case 'b': ignore_tcp_timeout_bound = strtoul(optarg, 0, 10); break; default: usage(); } if (!test_hash) return 1; fp = (0 == strcmp(filename, "-")) ? stdin : fopen(filename, "r"); if (!fp) { fprintf(stderr, "%s: %s\n", filename, strerror(errno)); exit(1); } printf("filename: %s\n", filename); if (ignore_tcp_timeout_bound) printf("ignore TCP conntracks with timeout less than %lu\n", ignore_tcp_timeout_bound); do_init(); while (fgets(buf, sizeof(buf), fp)) { if (display_target != -1) strcpy(copy, buf); if (process(buf)) printf("[%d]\t%s", display_target, copy); } if (fp != stdin) fclose(fp); show_occupation(); return 0; } [-- Attachment #3: crc32.c --] [-- Type: text/plain, Size: 4400 bytes --] /* crc32.c -- compute the CRC-32 of a data stream * Copyright (C) 1995-2002 Mark Adler * stolen from zlib CVS */ /* @(#) $Id: crc32.c,v 1.2.2.1 2002/03/12 09:34:29 werner Exp $ */ /* ======================================================================== * Table of CRC-32's of all single-byte values (made by make_crc_table) */ static const u32 crc_table[256] = { 0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL, 0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L, 0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL, 0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L, 0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L, 0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L, 0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L, 0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL, 0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L, 0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL }; /* ========================================================================= */ #define DO1(buf) crc = crc_table[((int)crc ^ (*buf++)) & 0xff] ^ (crc >> 8); #define DO2(buf) DO1(buf); DO1(buf); #define DO4(buf) DO2(buf); DO2(buf); #define DO8(buf) DO4(buf); DO4(buf); /* ========================================================================= */ u32 crc32(crc, buf, len) u32 crc; const char *buf; int len; { crc = crc ^ 0xffffffffL; while (len >= 8) { DO8(buf); len -= 8; } if (len) do { DO1(buf); } while (--len); return crc ^ 0xffffffffL; } ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-29 12:34 ` Patrick Schaaf @ 2002-06-30 8:31 ` Patrick Schaaf 2002-06-30 19:40 ` Don Cohen 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-06-30 8:31 UTC (permalink / raw) To: don-nf; +Cc: netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1682 bytes --] Hi all, here is the next version of my cttest.c conntrack test program. Thanks go to Don Cohen for noticing that my parser was fucked up. I did not notice that we sometimes have the [XXX] state marker between the two tuples, and sometimes we have it after both tuples. How ugly... This is now fixed by skipping a field when the second tuple starts with '['. Incidentally, the occasional overly long chains (140 entry lists etc.) are now gone gone gone. They were an artifact of the misparsing... A small fix I also did, was to (correctly) parse the ports into network byte order. Now the "original" hash function should really match what's in the kernel. We now see a nice "spread" of list lengths for both the original hash function, as well as the crc32 one, without strange outliers. For my bucket count of 7168 (default for a 1GB RAM non-highmem machine...), the original hash is still much worse than the crc32 one. At this point, I implemented some new options to the cttest program: Use the new '-P' flag (no arguments) to automatically choose a prime number of hash buckets, just a bit larger than the number you selected with '-s NUMBER'. When I do that, the large difference between the original and crc32 hashes almost vanishes - sometimes our original hash even looks _better_ (by chance)! This is a strong indication that we want the "prime select" in the kernel! You can now use '-c NUMBER' to set a seed for the CRC32 hash. You can now use '-C' (without parameter) to set a random seed for the CRC32 hash. These options are for testing the idea of "attack compensation by inserting local variance". I'd like to hear opinions on this. best regards Patrick [-- Attachment #2: cttest.c --] [-- Type: text/plain, Size: 7297 bytes --] /* cttest.c - parse /proc/net/ip_conntrack, compute bucket occupation * * This is how lines look: * * udp 17 0 src=213.6.73.215 dst=62.104.191.241 sport=1858 dport=53 src=62.104.191.241 dst=213.6.73.215 sport=53 dport=1858 use=1 * tcp 6 114 TIME_WAIT src=62.104.211.64 dst=212.7.144.66 sport=34685 dport=80 src=212.7.144.66 dst=62.104.211.64 sport=80 dport=34685 [ASSURED] use=1 */ #include <stdio.h> #include <string.h> #include <malloc.h> #include <stdlib.h> #include <getopt.h> #include <errno.h> #include <time.h> #include <netinet/in.h> #include <arpa/inet.h> typedef unsigned long u32; typedef unsigned short u16; typedef unsigned char u8; struct ct_key { u32 sip; u32 dip; u16 sport; u16 dport; u8 proto; }; static u32 hash_conntrack(struct ct_key *key) { return ntohl(key->sip + key->dip + key->sport + key->dport + key->proto) + ntohs(key->sport); } #include "crc32.c" static u32 crc32_seed; static u32 hash_crc32(struct ct_key *key) { u32 crc = crc32_seed; u32 v32; u16 v16; crc = crc32(crc, (const char *) &key->proto, sizeof(key->proto)); v32 = ntohl(key->sip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v32 = ntohl(key->dip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v16 = ntohs(key->sport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); v16 = ntohs(key->dport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); return crc; } static u32 hash_random(struct ct_key *key) { return (u32) random(); } struct hash_def { char *name; u32 (*hash_f)(struct ct_key *); }; #define NR_HASHES (sizeof(hashes)/sizeof(hashes[0])) static struct hash_def hashes[] = { { "original", hash_conntrack }, { "random", hash_random }, { "crc32", hash_crc32 }, }; static u32 htable_size = 7168; static u32 *occupation; static struct hash_def *test_hash = hashes; static void do_init(void) { u32 idx; occupation = malloc(sizeof(*occupation) * htable_size); for (idx=0; idx<htable_size; idx++) occupation[idx] = 0; } static int display_target = -1; static int do_count(struct ct_key *key) { u32 idx = test_hash->hash_f(key) % htable_size; occupation[idx]++; return ((int)idx) == display_target; } static u32 parse_ip(char *t) { char *p = strchr(t, '='); if (!p++) return 0; return inet_addr(p); } static u16 parse_port(char *t) { char *p = strchr(t, '='); if (!p++) return 0; return htons((u16) strtoul(p, 0, 10)); } static u32 ignore_tcp_timeout_bound = 0; static int ignore_by_timeout(struct ct_key *key, char *tok) { return key->proto == 6 && strtoul(tok, 0, 10) < ignore_tcp_timeout_bound; } static int unidir; static int process(char *p) { struct ct_key key[1]; char *tok; int res; strtok(p, " \t\n"); /* prot text */ tok = strtok(0, " \t\n"); key->proto = (u8) strtoul(tok, 0, 10); /* prot num */ tok = strtok(0, " \t\n"); /* timeout */ if (ignore_by_timeout(key, tok)) return 0; switch (key->proto) { case 6: strtok(0, " \t\n"); /* tcp state */ case 17: break; default: return 0; } /* forward */ tok = strtok(0, " \t\n"); key->sip = parse_ip(tok); /* src ip */ tok = strtok(0, " \t\n"); key->dip = parse_ip(tok); /* dst ip */ tok = strtok(0, " \t\n"); key->sport = parse_port(tok); /* src port */ tok = strtok(0, " \t\n"); key->dport = parse_port(tok); /* dst port */ res = do_count(key); if (unidir) return res; /* reverse */ tok = strtok(0, " \t\n"); if (*tok == '[') tok = strtok(0, " \t\n"); key->sip = parse_ip(tok); /* src ip */ tok = strtok(0, " \t\n"); key->dip = parse_ip(tok); /* dst ip */ tok = strtok(0, " \t\n"); key->sport = parse_port(tok); /* src port */ tok = strtok(0, " \t\n"); key->dport = parse_port(tok); /* dst port */ return res | do_count(key); } static int occupation_target = -1; static void show_occupation(void) { u32 idx, maxlen, total, partial, *agg; for (maxlen=0, total=0, idx=0; idx<htable_size; idx++) { total += occupation[idx]; if (occupation[idx] > maxlen) maxlen = occupation[idx]; } printf("maximum bucket list length: %lu\n", maxlen); printf("total number of entries: %lu\n", total); agg = malloc(sizeof(*agg) * (maxlen+1)); for (idx=0; idx<=maxlen; idx++) agg[idx] = 0; for (idx=0; idx<htable_size; idx++) { agg[occupation[idx]]++; if (((int)occupation[idx]) == occupation_target) { printf("occupation target %d at %lu\n", occupation_target, idx); occupation_target = -1; } } printf("Count\tLength\tCT\n"); for (partial=0, idx=maxlen; idx>0; idx--) { double d; if (agg[idx] == 0) continue; partial += idx * agg[idx]; d = (100.0 * partial) / (1.0 * total); printf("%lu\t%lu\t%.02f%%\n", agg[idx], idx, d); } printf("%lu\t0\t100.00%%\n", agg[idx]); } static struct hash_def *find_hash(char *name) { u32 idx; for (idx=0; idx<NR_HASHES; idx++) if (0 == strcmp(name, hashes[idx].name)) return hashes+idx; fprintf(stderr, "no hash function named '%s'\n", name); return 0; } static void usage(void) { fprintf(stderr, "Usage: not written\n"); exit(1); } static u32 prime_select(u32); int main(int argc, char **argv) { int c; char buf[512], copy[512]; char *filename = "/proc/net/ip_conntrack"; FILE *fp; int select_prime = 0; while (EOF!=(c=getopt(argc, argv, "f:s:h:i:t:b:c:CPu"))) switch(c) { case 'f': filename = optarg; break; case 'u': unidir = 1; break; case 's': htable_size = strtoul(optarg, 0, 10); break; case 'P': select_prime = 1; break; case 'h': test_hash = find_hash(optarg); break; case 'i': display_target = strtol(optarg, 0, 10); break; case 't': occupation_target = strtol(optarg, 0, 10); break; case 'b': ignore_tcp_timeout_bound = strtoul(optarg, 0, 10); break; case 'c': crc32_seed = strtoul(optarg, 0, 10); break; case 'C': srandom(time(0)); crc32_seed = random(); break; default: usage(); } if (!test_hash) return 1; fp = (0 == strcmp(filename, "-")) ? stdin : fopen(filename, "r"); if (!fp) { fprintf(stderr, "%s: %s\n", filename, strerror(errno)); exit(1); } if (select_prime) htable_size = prime_select(htable_size); printf("filename: %s\n", filename); printf("hash function: %s\n", test_hash->name); printf("number of buckets: %lu\n", htable_size); if (ignore_tcp_timeout_bound) printf("ignore TCP conntracks with timeout less than %lu\n", ignore_tcp_timeout_bound); if (crc32_seed) printf("CRC32 seed %lu\n", crc32_seed); do_init(); while (fgets(buf, sizeof(buf), fp)) { if (display_target != -1) strcpy(copy, buf); if (process(buf)) printf("[%d]\t%s", display_target, copy); } if (fp != stdin) fclose(fp); show_occupation(); return 0; } static u32 prime_sizes[] = { 3, /* 2^2-1 */ 7, /* 2^3-1 */ 13, 31, /* 2^5-1 */ 61, 127, /* 2^7-1 */ 251, 509, 1021, 2039, 4093, 8191, /* 2^13-1 */ 16381, 32749, 65521, 131071, /* 2^17-1 */ 262139, 524287, /* 2^19-1 */ 1048571, 2097157, 4194301, 8388581, }; #define NR_PRIME_SIZES (sizeof(prime_sizes)/sizeof(prime_sizes[0])) static u32 prime_select(u32 val) { u32 idx; for (idx=0; idx<NR_PRIME_SIZES; idx++) if (val < prime_sizes[idx]) return prime_sizes[idx]; return prime_sizes[NR_PRIME_SIZES-1]; } [-- Attachment #3: crc32.c --] [-- Type: text/plain, Size: 4400 bytes --] /* crc32.c -- compute the CRC-32 of a data stream * Copyright (C) 1995-2002 Mark Adler * stolen from zlib CVS */ /* @(#) $Id: crc32.c,v 1.2.2.1 2002/03/12 09:34:29 werner Exp $ */ /* ======================================================================== * Table of CRC-32's of all single-byte values (made by make_crc_table) */ static const u32 crc_table[256] = { 0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL, 0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L, 0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL, 0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L, 0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L, 0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L, 0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L, 0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL, 0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L, 0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL }; /* ========================================================================= */ #define DO1(buf) crc = crc_table[((int)crc ^ (*buf++)) & 0xff] ^ (crc >> 8); #define DO2(buf) DO1(buf); DO1(buf); #define DO4(buf) DO2(buf); DO2(buf); #define DO8(buf) DO4(buf); DO4(buf); /* ========================================================================= */ u32 crc32(crc, buf, len) u32 crc; const char *buf; int len; { crc = crc ^ 0xffffffffL; while (len >= 8) { DO8(buf); len -= 8; } if (len) do { DO1(buf); } while (--len); return crc ^ 0xffffffffL; } ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-30 8:31 ` Patrick Schaaf @ 2002-06-30 19:40 ` Don Cohen 2002-07-01 8:07 ` Henrik Nordstrom 2002-07-02 14:55 ` Harald Welte 0 siblings, 2 replies; 55+ messages in thread From: Don Cohen @ 2002-06-30 19:40 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Patrick Schaaf writes: > Use the new '-P' flag (no arguments) to automatically choose a prime > number of hash buckets, just a bit larger than the number you selected > with '-s NUMBER'. When I do that, the large difference between the > original and crc32 hashes almost vanishes - sometimes our original > hash even looks _better_ (by chance)! This is a strong indication > that we want the "prime select" in the kernel! Actually, I believe that this is overkill. All we really need is for the components that go into the data to have sizes relatively prime to the modulus, and since all the data has sizes of form 2^n it should be sufficient to make the modulus odd. That's why I suggested the modulus be "if table size odd then table size otherwise one less". You might try that to convince yourself that it works. > You can now use '-c NUMBER' to set a seed for the CRC32 hash. You can now > use '-C' (without parameter) to set a random seed for the CRC32 hash. > These options are for testing the idea of "attack compensation by > inserting local variance". I'd like to hear opinions on this. I'm sorry I don't know enough about CRC to know whether this will help. How does the -c argument affect the answer? My first test is to try specifying -c and not, and I'm encouraged that I get different distributions. At a higher level, though, I don't see much value in preventing an attacker from filling a single bucket when he can almost as easily do much more harm by filling the entire conntrack database! The reason for limiting the size of each bucket is to prevent any such attack from killing performance by consuming time, and it does that, but it doesn't prevent a resource consumption attack on space. Clearly one easy defense against one easy attack (as was mentioned in private communication) is that whenever you want to add to a bucket that is full, you should feel free to throw out the oldest UNREPLIED connection in that bucket. If you're interested in preventing attacks that consume connection table space devoted to "real" connections I have some ideas, but that's another large discussion. If lots of people out there want to hear it I'll send to the list, otherwise perhaps it's better off the list. On a related subject, I'm worried that UNREPLIED might not mean what I think it does. Your data contains things like: tcp 6 387070 ESTABLISHED src=9.163.211.64 dst=165.130.71.38 sport=3228 dport=1301 [UNREPLIED] src=165.130.71.38 dst=9.163.211.64 sport=1301 dport=3228 use=1 How can one half of the connection be established while the other half is unreplied? Is this cause in one direction a syn was sent and a syn-ack was received, and that makes it established, while in the other direction a syn-ack was sent and nothing more was received so it's unreplied? That doesn't make much sense. They should both be established only if the entire handshake is complete, right? You were describing unreplied connections that hang around for 5 days. When I send some random syn's I see stuff like cat /proc/net/ip_conntrack tcp 6 113 SYN_SENT src=10.0.3.2 dst=10.0.2.3 sport=4096 ... I'm guessing the 113 is the timeout, and that it was originally set to 120, which still seems way too long to me, but a lot better than 5 days. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-30 19:40 ` Don Cohen @ 2002-07-01 8:07 ` Henrik Nordstrom 2002-07-01 17:49 ` Don Cohen [not found] ` <15652.38084.704660.234319@isis.cs3-inc.com> 2002-07-02 14:55 ` Harald Welte 1 sibling, 2 replies; 55+ messages in thread From: Henrik Nordstrom @ 2002-07-01 8:07 UTC (permalink / raw) To: Don Cohen, Patrick Schaaf; +Cc: netfilter-devel Don Cohen wrote: > On a related subject, I'm worried that UNREPLIED might not mean > what I think it does. Your data contains things like: > tcp 6 387070 ESTABLISHED src=9.163.211.64 dst=165.130.71.38 sport=3228 > dport=1301 [UNREPLIED] src=165.130.71.38 dst=9.163.211.64 sport=1301 > dport=3228 use=1 > How can one half of the connection be established while the other half > is unreplied? The ESTABLISHED indicates the TCP state, UNREPLIED indicates the conntrack state. This is a TCP session that has only seen ACK in one direction, no packets in the other. Almost related note: The connection is not ASSURED. Regards Henrik ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-01 8:07 ` Henrik Nordstrom @ 2002-07-01 17:49 ` Don Cohen 2002-07-02 7:58 ` Henrik Nordstrom [not found] ` <15652.38084.704660.234319@isis.cs3-inc.com> 1 sibling, 1 reply; 55+ messages in thread From: Don Cohen @ 2002-07-01 17:49 UTC (permalink / raw) To: Henrik Nordstrom; +Cc: netfilter-devel Henrik Nordstrom writes: > Don Cohen wrote: > > > On a related subject, I'm worried that UNREPLIED might not mean > > what I think it does. Your data contains things like: > > tcp 6 387070 ESTABLISHED src=9.163.211.64 dst=165.130.71.38 sport=3228 > > dport=1301 [UNREPLIED] src=165.130.71.38 dst=9.163.211.64 sport=1301 > > dport=3228 use=1 > > How can one half of the connection be established while the other half > > is unreplied? > > The ESTABLISHED indicates the TCP state, UNREPLIED indicates the conntrack > state. This is a TCP session that has only seen ACK in one direction, no > packets in the other. > > Almost related note: The connection is not ASSURED. I'm having trouble making sense of your explanation above. This line is supposed to describe a single connection, right? Established as a tcp state means the three packet handshake is complete? But that seems to contradict the unreplied. Is there any doc for stuff like this? - how to read the lines above - what exactly these things (unreplied, assured, established ...) mean - can I match on ASSURED ? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-01 17:49 ` Don Cohen @ 2002-07-02 7:58 ` Henrik Nordstrom 0 siblings, 0 replies; 55+ messages in thread From: Henrik Nordstrom @ 2002-07-02 7:58 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel On Monday 01 July 2002 19.49, Don Cohen wrote: > > The ESTABLISHED indicates the TCP state, UNREPLIED indicates the > > conntrack state. This is a TCP session that has only seen ACK in > > one direction, no packets in the other. > > > > Almost related note: The connection is not ASSURED. > > I'm having trouble making sense of your explanation above. > This line is supposed to describe a single connection, right? > Established as a tcp state means the three packet handshake is > complete? But that seems to contradict the unreplied. See the archives. This was discussed to death some days ago. Summary in short: TCP state only indicates what kind of packets are currently seen on the connection. This can be derived from a single packet due to "connection pickup". > Is there any doc for stuff like this? > - how to read the lines above > - what exactly these things (unreplied, assured, established ...) > mean - can I match on ASSURED ? ASSURED can be matched using the new conntrack match found in patch-o-matic. Normally this flag is only used by conntrack to garbagecollect invalid entries in case of a DoS attempt. There isn't really much use of matching it in rulesets. Regards Henrik ^ permalink raw reply [flat|nested] 55+ messages in thread
[parent not found: <15652.38084.704660.234319@isis.cs3-inc.com>]
* Re: conntrack performance/DoS formula [not found] ` <15652.38084.704660.234319@isis.cs3-inc.com> @ 2002-07-04 21:53 ` Henrik Nordstrom 2002-07-05 7:08 ` Don Cohen 0 siblings, 1 reply; 55+ messages in thread From: Henrik Nordstrom @ 2002-07-04 21:53 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel On Thursday 04 July 2002 20.32, Don Cohen wrote: > [off list, but feel free to put it there if you think appropriate] It has been discussed to death on the list. Try reading the archives some weeks back (always a healthy exercise). You are welcome to forward this response to the netfilter-users mailinglist where it really belongs. But here we go again.. > My current understanding is that there are two matching conntrack > records here. The ESTABLISHED is the conntrack state of one, not > the tcp state. If it were the tcp state then this would be assured > and the other conntrack state would not be unreplied but also > established, right? > What's use, btw? ESTABLISHED is the TCP tracking state, not the conntrack state. The TCP tracking state ESTABLISHED is derived from the last packet seen on the connection (ACK). This is not to be confused with the conntrack "state" ESTABLISHED as used in the state match which is a actually packet state describing how THIS packet relates to the connection, not a connection state as such. What you have in conntrack for a TCP session is a) A set of conntrack state bits, describing the state of the connection: - SEEN_REPLY - ASSURED - EXPECTED b) A TCP tracking state, SYN_RECV, SYN_SENT, ESTABLISHED, TIME_WAIT and many more. Also keeps track of TCP window information and other crucial information for tracking a TCP session properly. The TCP tracking state mainly controls the timeouts, but also has some influence on what kind of packets conntrack expects to see on the connection. The TCP tracking state is not visible in your rulesets. The conntrack state bit SEEN_REPLY is pretty obvious what is represents. ASSURED is a little trickier. What it represents is that conntrack are sure this connection is for real. For TCP this requires that conntrack has seen the whole SYN->SYNACK->ACK handshake. EXPECTED is simply that this connection was expected by a conntrack protocol helper, for example FTP data channels expected due to commands on their FTP control channel. Now, if we take a look at the "state" match, it indicates four different packet states - INVALID - NEW - ESTABLISHED - RELATED These states has nothing to do with TCP, only packet flows. If conntrack cannot identify a session key for the packet or otherwise fails to establish connection tracking (i.e. OOM) then the paket state is INVALID. If the packet is the first packet conntrack has seen for a connection then it is a NEW (or RELATED where applicable). Note that this may well be an ACK or even a FIN, conntrack does not care, it only cares that it did not know about the connection before seeing the packet. If there has been any reply packets (including the first reply packet) then the packet state is ESTABLISHED. The ASSURED bit deserves a little more attention. As you may see from the above description of the four "state" types this bit is not used in any rulesets. The ASSURED bit is used by conntrack to defeat almost all DoS attempts. When conntrack runs out of entries it will throw away not ASSURED connections to make room for new connections. This very effectively stops almost all DoS attempts in filling the connection table. This is the last time I will answer this question. If anyone feels like using any of this in a FAQ then you are welcome. If I see much more chatter about these fundamental aspects of netfilter connection tracking that really SHOULD be common knowledge to all users (note: I am not talking about developers here.. developers looking at conntrack MUST understand these aspects) then I don't know what I will do, but I suspect I will stop following the netfilter-devel traffic. I am damn tired that so few people seem to understand what "iptables -m state --state NEW" or "--state ESTABLISED" means. Regards Henrik ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-04 21:53 ` Henrik Nordstrom @ 2002-07-05 7:08 ` Don Cohen 2002-07-05 11:41 ` Henrik Nordstrom 0 siblings, 1 reply; 55+ messages in thread From: Don Cohen @ 2002-07-05 7:08 UTC (permalink / raw) To: Henrik Nordstrom; +Cc: netfilter-devel Henrik Nordstrom writes: > On Thursday 04 July 2002 20.32, Don Cohen wrote: > > [off list, but feel free to put it there if you think appropriate] > > It has been discussed to death on the list. Try reading the archives > some weeks back (always a healthy exercise). You are welcome to > forward this response to the netfilter-users mailinglist where it > really belongs. But here we go again.. I did read that, and it lead me to my mistaken conclusion. > ESTABLISHED is the TCP tracking state, not the conntrack state. I believe that now that I see the code: ip_conntrack_standalone.c static unsigned int print_conntrack(char *buffer, const struct ip_conntrack *conntrack) { ... proto name, proto number, timeout, len += print_tuple(buffer + len, &conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple, proto); The > TCP tracking state ESTABLISHED is derived from the last packet seen > on the connection (ACK). This is not to be confused with the > conntrack "state" ESTABLISHED as used in the state match which is a > actually packet state describing how THIS packet relates to the > connection, not a connection state as such. This is what I was missing. I had seen and understood that conntrack states and tcp states were different. I had also understood, but evidently *incorrectly* that TCP states were the states as described in rfc793. The term "established" there means something different from what you describe above, more like what you call assured below. > What you have in conntrack for a TCP session is > > a) A set of conntrack state bits, describing the state of the > connection: > > - SEEN_REPLY > - ASSURED > - EXPECTED > > b) A TCP tracking state, SYN_RECV, SYN_SENT, ESTABLISHED, TIME_WAIT > and many more. Also keeps track of TCP window information and other > crucial information for tracking a TCP session properly. And these are the same names (or very similar) that show up in rfc793, which is why I thought they meant the same thing. If they don't mean the same thing it would be nice to know what they do mean. Especially since they are displayed from /proc. > The TCP tracking state mainly controls the timeouts, but also has some > influence on what kind of packets conntrack expects to see on the > connection. The TCP tracking state is not visible in your rulesets. > > The conntrack state bit SEEN_REPLY is pretty obvious what is > represents. > > ASSURED is a little trickier. What it represents is that conntrack are > sure this connection is for real. For TCP this requires that > conntrack has seen the whole SYN->SYNACK->ACK handshake. This is what I thought the tcp state ESTABLISHED meant ! > EXPECTED is simply that this connection was expected by a conntrack > protocol helper, for example FTP data channels expected due to > commands on their FTP control channel. > > > Now, if we take a look at the "state" match, it indicates four > different packet states > > - INVALID > - NEW > - ESTABLISHED > - RELATED > > These states has nothing to do with TCP, only packet flows. If > conntrack cannot identify a session key for the packet or otherwise > fails to establish connection tracking (i.e. OOM) then the paket > state is INVALID. > > If the packet is the first packet conntrack has seen for a connection > then it is a NEW (or RELATED where applicable). Note that this may > well be an ACK or even a FIN, conntrack does not care, it only cares > that it did not know about the connection before seeing the packet. > > If there has been any reply packets (including the first reply packet) > then the packet state is ESTABLISHED. > > > The ASSURED bit deserves a little more attention. As you may see from > the above description of the four "state" types this bit is not used > in any rulesets. The ASSURED bit is used by conntrack to defeat > almost all DoS attempts. When conntrack runs out of entries it will > throw away not ASSURED connections to make room for new connections. > This very effectively stops almost all DoS attempts in filling the > connection table. > > > This is the last time I will answer this question. If anyone feels > like using any of this in a FAQ then you are welcome. If I see much > more chatter about these fundamental aspects of netfilter connection > tracking that really SHOULD be common knowledge to all users (note: I > am not talking about developers here.. developers looking at > conntrack MUST understand these aspects) then I don't know what I > will do, but I suspect I will stop following the netfilter-devel > traffic. I am damn tired that so few people seem to understand what > "iptables -m state --state NEW" or "--state ESTABLISED" means. Clearly this stuff should be in the manual. That's what I read when I was trying to understand it. Perhaps in http://netfilter.samba.org/documentation/HOWTO//packet-filtering-HOWTO-7.html#ss7.3 ? So, returning to the original question: tcp 6 341427 ESTABLISHED src=165.130.74.240 dst=194.243.69.137 sport=1255 dport=80 [UNREPLIED] src=9.163.211.64 dst=165.130.74.240 sport=3128 dport=1255 use=1 This evidently means that only one packet in this connection has been seen, but it had the TCP ACK bit on. Hence "established". And it was also NATed, since the reply tuple is not just the reverse of the original. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-05 7:08 ` Don Cohen @ 2002-07-05 11:41 ` Henrik Nordstrom 2002-07-06 2:49 ` Don Cohen 0 siblings, 1 reply; 55+ messages in thread From: Henrik Nordstrom @ 2002-07-05 11:41 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel On Friday 05 July 2002 09.08, Don Cohen wrote: > > TCP tracking state ESTABLISHED is derived from the last packet > > seen on the connection (ACK). This is not to be confused with > > the conntrack "state" ESTABLISHED as used in the state match > > which is a actually packet state describing how THIS packet > > relates to the connection, not a connection state as such. > > This is what I was missing. I had seen and understood that > conntrack states and tcp states were different. I had also > understood, but evidently *incorrectly* that TCP states were the > states as described in rfc793. The term "established" there means > something different from what you describe above, more like what > you call assured below. The TCP tracking states are approximations of RFC793. However, conntrack_tcp does not implement TCP, it only tries to derive the states of the involved TCP endpoints by looking at the transmitted packets. > > b) A TCP tracking state, SYN_RECV, SYN_SENT, ESTABLISHED, > > TIME_WAIT and many more. Also keeps track of TCP window > > information and other crucial information for tracking a TCP > > session properly. > > And these are the same names (or very similar) that show up in > rfc793, which is why I thought they meant the same thing. If they > don't mean the same thing it would be nice to know what they do > mean. Especially since they are displayed from /proc. They mean mostly the same thing, except for the small fact that conntrack is not a TCP endpoint so it has to make some assumptions and guesses, and a great deal of simplifications in the state transitions... and as it is tracking both TCP endpoints at the same time the states may differ slightly.. As conntrack is looking at packets it has to make state transitions is based on the packets it sees rather than actions taken by the application, and selects the state closest matching the packets it sees. To be on the safe side (in terms of tracking) conntrack does not enforce a strict TCP state diagram but just tries to gess "If I see this kind of packet, what can the current TCP state be". > So, returning to the original question: > tcp 6 341427 ESTABLISHED src=165.130.74.240 dst=194.243.69.137 > sport=1255 dport=80 [UNREPLIED] src=9.163.211.64 dst=165.130.74.240 > sport=3128 dport=1255 use=1 > > This evidently means that only one packet in this connection has > been seen, but it had the TCP ACK bit on. Hence "established". > And it was also NATed, since the reply tuple is not just the > reverse of the original. Correct, except for the minor detail that it can only have been a normal ACK packet (ACK,!SYN,!FIN,!RST). As the connection is not ASSURED it will be thrown away if there is a shortage of conntrack entries.. but as long as there is no shortage of conntrack entries it will stay around for a while in case there really is a TCP connection. Based on the port numbers used the NAT rule probably is a REDIRECT of HTTP to Squid.. Regards Henrik ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-05 11:41 ` Henrik Nordstrom @ 2002-07-06 2:49 ` Don Cohen 0 siblings, 0 replies; 55+ messages in thread From: Don Cohen @ 2002-07-06 2:49 UTC (permalink / raw) To: Henrik Nordstrom; +Cc: netfilter-devel Henrik Nordstrom writes: > The TCP tracking states are approximations of RFC793. However, > conntrack_tcp does not implement TCP, it only tries to derive the > states of the involved TCP endpoints by looking at the transmitted > packets. I understand that there are limits to what conntrack can do. However, someone has taken the trouble to compute assured, and this seems like a *much* better approximation to tcp established than what is actually presented as the intended approximation. I guess now that you can match on assured the right functionality is there, but the current tcp established still seems like false advertising. While I'm at it, so what happens in case of NAT? The tuples are not the same in this case... Yep, that's what I realized later. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-30 19:40 ` Don Cohen 2002-07-01 8:07 ` Henrik Nordstrom @ 2002-07-02 14:55 ` Harald Welte 1 sibling, 0 replies; 55+ messages in thread From: Harald Welte @ 2002-07-02 14:55 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, netfilter-devel On Sun, Jun 30, 2002 at 12:40:09PM -0700, Don Cohen wrote: > Clearly one easy defense against one easy attack (as was mentioned in > private communication) is that whenever you want to add to a bucket > that is full, you should feel free to throw out the oldest UNREPLIED > connection in that bucket. if we're going to change the overall strategy into limiting the number of list-entries per bucket, this is definitely the way to go. just deleting unreplied in the case ip_conntrack_max is reached doesn't make sense anymore. > If you're interested in preventing attacks that consume connection > table space devoted to "real" connections I have some ideas, but > that's another large discussion. If lots of people out there want to > hear it I'll send to the list, otherwise perhaps it's better off the > list. please send it to the list, this way people can later read it for reference purpose, and you encourage more people to discuss it. > On a related subject, I'm worried that UNREPLIED might not mean > what I think it does. Your data contains things like: > tcp 6 387070 ESTABLISHED src=9.163.211.64 dst=165.130.71.38 sport=3228 > dport=1301 [UNREPLIED] src=165.130.71.38 dst=9.163.211.64 sport=1301 > dport=3228 use=1 > How can one half of the connection be established while the other half > is unreplied? Is this cause in one direction a syn was sent and a > syn-ack was received, and that makes it established, while in the > other direction a syn-ack was sent and nothing more was received so > it's unreplied? That doesn't make much sense. They should both be > established only if the entire handshake is complete, right? The problem here is most likely that cat /proc/net/ip_conntrack does not read out the conntrack entries atomically. We really should have ctnetlink for gathering the contents of the conntrack hash table... I'll try to have it ready soon. > You were describing unreplied connections that hang around for 5 days. > When I send some random syn's I see stuff like > cat /proc/net/ip_conntrack > tcp 6 113 SYN_SENT src=10.0.3.2 dst=10.0.2.3 sport=4096 ... > I'm guessing the 113 is the timeout, and that it was originally set to > 120, which still seems way too long to me, but a lot better than > 5 days. so now send a single ACK packet and see what happens... This is needed for connection pickup. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-28 19:03 ` Don Cohen 2002-06-28 19:35 ` Patrick Schaaf @ 2002-07-02 14:40 ` Harald Welte 2002-07-02 16:32 ` Patrick Schaaf 1 sibling, 1 reply; 55+ messages in thread From: Harald Welte @ 2002-07-02 14:40 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, netfilter-devel On Fri, Jun 28, 2002 at 12:03:38PM -0700, Don Cohen wrote: > Patrick Schaaf writes: > > I have real data from an IRC server (one of the german IRCnet hubs), and > > from several boxen providing transparent proxy service to dialup customers, > > 3000 customers per box peak, running DNS and squid towards the Internet. > > With the peak load, the proxy boxen have up to 100000 conntrack entries > > (according to slabinfo; /proc/net/ip_conntrack is too lame to show them all). > > > > Others will certainly be able to contribute data from production > > routing firewalls. I have one of those, too, but they only protect > > an ~50 workstation LAN, so these are not that significant. > > If you can put some data on a web page that I could download I'd be > happy to write and run the program that analyzes it and compares it > to the results expected for a uniform hash function. great. I'd definitley want to encourage the two of you to do some furtherr research on the hashing function. It's just a pity that I'm currently travelling and away from my test boxes, so I won't be of much help. Good work. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-02 14:40 ` Harald Welte @ 2002-07-02 16:32 ` Patrick Schaaf 2002-07-02 16:35 ` Patrick Schaaf 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-07-02 16:32 UTC (permalink / raw) To: Harald Welte, Don Cohen, Patrick Schaaf, netfilter-devel > great. I'd definitley want to encourage the two of you to do some > furtherr research on the hashing function. It's good to hear the audience :) Everybody, have a look at http://bei.bof.de/ex1/. There's a bunch of pictures, which may become legible together with reading my recent mails on this thread. The "cost" listed for each line in each picture, is the sum of the squares of the bucket list lengths. It's a lame attempt at getting a single number out of the spread. The links below the pictures lead you to the bucket statistics in ASCII text form (output of my cttest program). The data these pictures are based on, is my sample from some days ago. Hope this is interesting to somebody :) best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-02 16:32 ` Patrick Schaaf @ 2002-07-02 16:35 ` Patrick Schaaf 2002-07-02 16:53 ` Henrik Nordstrom 2002-07-02 17:48 ` Don Cohen 0 siblings, 2 replies; 55+ messages in thread From: Patrick Schaaf @ 2002-07-02 16:35 UTC (permalink / raw) To: Patrick Schaaf; +Cc: Harald Welte, Don Cohen, netfilter-devel > Everybody, have a look at http://bei.bof.de/ex1/. There's a bunch of > pictures, which may become legible together with reading my recent > mails on this thread. Ah - maybe I should at least note that the x-axis is bucket length, on a log-2 scale, The y-axis is the number of occurences, log-10. later Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-02 16:35 ` Patrick Schaaf @ 2002-07-02 16:53 ` Henrik Nordstrom 2002-07-02 17:48 ` Don Cohen 1 sibling, 0 replies; 55+ messages in thread From: Henrik Nordstrom @ 2002-07-02 16:53 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Patrick Schaaf wrote: > > Everybody, have a look at http://bei.bof.de/ex1/. There's a bunch of > > pictures, which may become legible together with reading my recent > > mails on this thread. > > Ah - maybe I should at least note that the x-axis is bucket length, > on a log-2 scale, The y-axis is the number of occurences, log-10. Any explanation for the odd tails with very long bucket chains in the original hash? In many cases the rest of the graph looks quite smooth not making one expect such tail, so I wonder if these may be an artefect of tomething else (bug, or incorrect input data). Regards Henrik ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-02 16:35 ` Patrick Schaaf 2002-07-02 16:53 ` Henrik Nordstrom @ 2002-07-02 17:48 ` Don Cohen 2002-07-02 18:31 ` Patrick Schaaf 1 sibling, 1 reply; 55+ messages in thread From: Don Cohen @ 2002-07-02 17:48 UTC (permalink / raw) To: Patrick Schaaf; +Cc: Harald Welte, netfilter-devel Patrick Schaaf writes: > > Everybody, have a look at http://bei.bof.de/ex1/. There's a bunch of > > pictures, which may become legible together with reading my recent > > mails on this thread. The only thing that looks really suspicious to me is the lump in original for large table sizes. I tried this myself: # time ./cttest -u -f ../ip_conntrack.2 -s 131073 filename: ../ip_conntrack.2 hash function: original number of buckets: 131073 maximum bucket list length: 24 total number of entries: 32866 Count Length CT 1 24 0.07% 1 19 0.13% 1 14 0.17% 2 8 0.22% 13 7 0.50% 31 6 1.06% 130 5 3.04% 276 4 6.40% 821 3 13.90% 3392 2 34.54% 21515 1 100.00% 104890 0 100.00% BTW 5 is already at the low end of the probability, as you see for the other two hashes. Then, as usual, show the data for the longest bucket, read it into my lisp program and take differences ... > (loop for x in data do (print (loop for z in (first data) as y in x collect (- z y)))) (0 0 0 0 0) (0 0 0 2 2) (0 0 0 4 4) (0 0 0 6 6) (0 0 0 8 8) (0 0 0 10 10) (0 0 0 12 12) (0 0 0 14 14) (0 -498354813 738840907 37187 38280) (0 0 0 -6 -6) (0 0 0 -20 -20) (0 0 0 -27 -27) (0 0 0 -2 -2) (0 0 0 -16 -16) (0 0 0 -37 -37) (0 0 0 -12 -12) (0 0 0 -33 -33) (0 0 0 -8 -8) (0 0 0 -29 -29) (0 0 0 -18 -18) (0 0 0 -25 -25) (0 0 0 -4 -4) (0 0 0 -35 -35) (0 0 0 -31 -31) I also changed cttest to print hash data (all hex): sip 100ce87 dip 100ce87 sport d995 dport d895 proto 6 result 3e4f98db sip 100ce87 dip 100ce87 sport d795 dport d695 proto 6 result 3e4b98d9 sip 100ce87 dip 100ce87 sport d595 dport d495 proto 6 result 3e4798d7 sip 100ce87 dip 100ce87 sport d395 dport d295 proto 6 result 3e4398d5 sip 100ce87 dip 100ce87 sport d195 dport d095 proto 6 result 3e3f98d3 sip 100ce87 dip 100ce87 sport cf95 dport ce95 proto 6 result 3e3b98d1 sip 100ce87 dip 100ce87 sport cd95 dport cc95 proto 6 result 3e3798cf sip 100ce87 dip 100ce87 sport cb95 dport ca95 proto 6 result 3e3398cd sip 7e4a82a5 dip b62ec45b sport 9604 dport 5000 proto 6 result a2d7eca sip 100ce87 dip 100ce87 sport df95 dport de95 proto 6 result 3e5b98e1 sip 100ce87 dip 100ce87 sport ed95 dport ec95 proto 6 result 3e7798ef sip 100ce87 dip 100ce87 sport f495 dport f395 proto 6 result 3e8598f6 sip 100ce87 dip 100ce87 sport db95 dport da95 proto 6 result 3e5398dd sip 100ce87 dip 100ce87 sport e995 dport e895 proto 6 result 3e6f98eb sip 100ce87 dip 100ce87 sport fe95 dport fd95 proto 6 result 3e999900 sip 100ce87 dip 100ce87 sport e595 dport e495 proto 6 result 3e6798e7 sip 100ce87 dip 100ce87 sport fa95 dport f995 proto 6 result 3e9198fc sip 100ce87 dip 100ce87 sport e195 dport e095 proto 6 result 3e5f98e3 sip 100ce87 dip 100ce87 sport f695 dport f595 proto 6 result 3e8998f8 sip 100ce87 dip 100ce87 sport eb95 dport ea95 proto 6 result 3e7398ed sip 100ce87 dip 100ce87 sport f295 dport f195 proto 6 result 3e8198f4 sip 100ce87 dip 100ce87 sport dd95 dport dc95 proto 6 result 3e5798df sip 100ce87 dip 100ce87 sport fc95 dport fb95 proto 6 result 3e9598fe sip 100ce87 dip 100ce87 sport f895 dport f795 proto 6 result 3e8d98fa It seems that adding n to the two ports (which is what happens almost every time here) gives a difference in the hash function of n*131073 !! static u32 hash_conntrack(struct ct_key *key) { u32 ans; ans= ntohl(key->sip + key->dip + key->sport + key->dport + key->proto) + ntohs(key->sport); printf("sip %x dip %x sport %x dport %x proto %x result %x\n", key->sip, key->dip, key->sport, key->dport, key->proto, ans); return ans; } I think this is related to the ntohl and ntohs. If all the data were 0 the sum would be 0. When you add 1 to sport and dport you get something like 2^16 + 2 ? which is 2 x our unlucky table size, right? Or something like that. And similar things happen with table size 2^n-1. Basically the problem with this sort of hash function is that you get collisions when you have a lot of IP and port pairs that are close to each other with the wrong sort of correlation. Have you tried the ABCD suggestion I posted? Just as a quick test: static u32 hash_conntrack(struct ct_key *key) { return 0x47441DFB * key->sip + 0x57655A7D * key->dip + 0x1C7F1D2D * key->sport + 0xDF91D738 * key->dport + key->proto; coefficients chosen at random from [0, 2^32) # time ./cttest -u -f ../ip_conntrack.2 -s 131073 filename: ../ip_conntrack.2 hash function: original number of buckets: 131073 maximum bucket list length: 5 total number of entries: 32866 Count Length CT 1 5 0.02% 15 4 0.20% 253 3 2.51% 3180 2 21.86% 25682 1 100.00% 101942 0 100.00% Longest bucket with 131071 is also 5, with 131072 is 36 (as advertised!). ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-02 17:48 ` Don Cohen @ 2002-07-02 18:31 ` Patrick Schaaf 2002-07-02 21:52 ` cttest-0.1 Patrick Schaaf 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-07-02 18:31 UTC (permalink / raw) To: Don Cohen; +Cc: Patrick Schaaf, Harald Welte, netfilter-devel > > > Everybody, have a look at http://bei.bof.de/ex1/. There's a bunch of > > > pictures, which may become legible together with reading my recent > > > mails on this thread. > > The only thing that looks really suspicious to me is the lump in > original for large table sizes. Note that due to the y-axis log-10 scale, the tail is more visible; the far out tail does not consist of many buckets. I agree with your correlation observation for the original hash. portscans, again? With an overcrowded bucket table (the ~7000 entry cases), there is almost no difference between the original hash, and the alternatives. They are all equally bad (because of the overcrowding). With a wide bucket table (130000 and 260000 buckets for 65000 entries), the general badness of the original hash clearly shows in the shape of the curve. > Have you tried the ABCD suggestion I posted? > Just as a quick test: > static u32 hash_conntrack(struct ct_key *key) > { > return > 0x47441DFB * key->sip > + 0x57655A7D * key->dip > + 0x1C7F1D2D * key->sport > + 0xDF91D738 * key->dport > + key->proto; I now added this to cttest (new hash called 'abcd'), and updated the pictures on the website accordingly. Looks good for odd table sizes, and detoriates badly for even table sizes - like you said. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* cttest-0.1 2002-07-02 18:31 ` Patrick Schaaf @ 2002-07-02 21:52 ` Patrick Schaaf 2002-07-03 4:15 ` cttest-0.1 Joakim Axelsson 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-07-02 21:52 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Hi all, I have put a tarball at http://bei.bof.de/cttest-0.1.tar.gz Unpack, look at README, and reproduce the gnuplot pictures I have mentioned earlier today (at http://bei.bof.de/ex1/) I would love to see results from other kinds of workloads. thanks in advance Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-02 21:52 ` cttest-0.1 Patrick Schaaf @ 2002-07-03 4:15 ` Joakim Axelsson 2002-07-05 15:37 ` cttest-0.1 Martin Josefsson 2002-07-05 16:10 ` cttest-0.1 Joakim Axelsson 0 siblings, 2 replies; 55+ messages in thread From: Joakim Axelsson @ 2002-07-03 4:15 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel 2002-07-02 23:52:06+0200, Patrick Schaaf <bof@bof.de> -> > Hi all, > > I have put a tarball at http://bei.bof.de/cttest-0.1.tar.gz > Unpack, look at README, and reproduce the gnuplot pictures I have > mentioned earlier today (at http://bei.bof.de/ex1/) > > I would love to see results from other kinds of workloads. > > thanks in advance > Patrick I've tested it on our router data. The router holds one mayor DALnet server (~7000K users) and one minor EFnet-server (~700 users) behind. Also about 1000 student computers. The DALnet-server was attack today and yesterday with 70K pps icmp reply. However I filtered that (in mangle PREROUTING) fast and I don't know how much impact is show in the conntrack. I also have limiters on state NEW to prevent attacks to fill the conntrack. Graphs at: http://aaricia.hemmet.chalmers.se/~gozem/cttest/ Stats for the actuall running conntrack is: $MODPROBE ip_conntrack hashsize=131072 echo 262144 > /proc/sys/net/ipv4/ip_conntrack_max (Yes meaning average bucket-length to be 2 not 8). Conntrack was using about 50K slabs according to /proc/slabinfo However looking at the stats, I will change hashsize to a primenumber fast :-) I collected this data 5am during absolut low time. I'll try again later at primetime :-) One comment: In your script ctplot you have an absolute path to gnuplot which I guess not everyone has the same as you :-) -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-03 4:15 ` cttest-0.1 Joakim Axelsson @ 2002-07-05 15:37 ` Martin Josefsson 2002-07-05 16:10 ` cttest-0.1 Joakim Axelsson 1 sibling, 0 replies; 55+ messages in thread From: Martin Josefsson @ 2002-07-05 15:37 UTC (permalink / raw) To: Joakim Axelsson; +Cc: Patrick Schaaf, netfilter-devel Hi, I've also done some plotting. about 400 student-machines behind router, it's summer now so there's not as much traffic as the rest of the year, only ~28k entries. hashsize is 16384 with default ip_conntrack_max of 131072. http://gandalf.hjorten.nu/kna-gw/ I added 16384 as size in ctreport as this is what's currently used. There's a huge diffrence between 16384 and 1638[35], from a maximum length of 181 down to 9. -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-03 4:15 ` cttest-0.1 Joakim Axelsson 2002-07-05 15:37 ` cttest-0.1 Martin Josefsson @ 2002-07-05 16:10 ` Joakim Axelsson 2002-07-05 16:54 ` cttest-0.1 Patrick Schaaf 1 sibling, 1 reply; 55+ messages in thread From: Joakim Axelsson @ 2002-07-05 16:10 UTC (permalink / raw) To: Patrick Schaaf, netfilter-devel 2002-07-03 06:15:02+0200, Joakim Axelsson <gozem@aaricia.hemmet.chalmers.se> -> > > I collected this data 5am during absolut low time. I'll try again later > at primetime :-) > Here is a new stat with about 85K entries: http://aaricia.hemmet.chalmers.se/~gozem/cttest-2002-07-05_1739/ Look at the 131072 original hash (which we are running until i dare reboot the router :-) "maximum bucket list length: 9471" -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-05 16:10 ` cttest-0.1 Joakim Axelsson @ 2002-07-05 16:54 ` Patrick Schaaf 2002-07-05 16:53 ` cttest-0.1 Joakim Axelsson 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-07-05 16:54 UTC (permalink / raw) To: netfilter-devel Martin, Joakim, thank you for your cttest runs. They both provide new data points, coming from a router instead of a server. They confirm that we should at least strongly avoid even hash bucket counts. In all examples we have now, both the original hash, and Don's abcd hash detoriate badly. The interesting new theme, when comparing my "transproxy server" statistics with your router statistics, is that for the odd bucket sizes, the original hash does a lot better in the router case than it does in the server case. Today, I wrote a new comparison, using rdtsc() to time the CPU usage of the hash calculation within cttest. I'll make a cttest-0.2 soon, so everybody can reproduce. However, here's the preliminary result: on a 1Ghz P-III machine, compiled with march=i686, the original hash function takes 55ns per call. The abcd hash is better, at 43ns per call. After inlining the crc32() function, the crc32 hash 185ns per call. My experience is that a one-longer hash list, when scanned, needs one more memory roundtrip at about 200ns latency. Maybe it's double that with the current ip_conntrack layout, but I'll whine about this at another time. It is clear that even the cost of the crc32 calculation is negligible if we can save one on the list length on average. Overall, these results seem to call for this set of actions: - make the hash bucket count at least individible by 2. This should go as a strong suggestion into the documentation, and should be implemented in the default initialization code. Anybody volunteering for one or the other? I'll do the code part, but I won't do the docs. - given the clear advantage of the abcd hash over the original hash in the server case, make the abcd hash at least a compile time option, or better yet, a module load time option. I can do that, too. For the code, would one or two patch-o-matic type patches against 2.4.18-rc1 be the desired deliverable? best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-05 16:54 ` cttest-0.1 Patrick Schaaf @ 2002-07-05 16:53 ` Joakim Axelsson 2002-07-06 6:10 ` cttest-0.1 Andrew Smith 0 siblings, 1 reply; 55+ messages in thread From: Joakim Axelsson @ 2002-07-05 16:53 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel 2002-07-05 18:54:15+0200, Patrick Schaaf <bof@bof.de> -> > > - make the hash bucket count at least individible by 2. This should go > as a strong suggestion into the documentation, and should be implemented > in the default initialization code. Anybody volunteering for one or > the other? I'll do the code part, but I won't do the docs. Well, I think we even can force people to use an odd count. if (hashsize%2 == 0) hashsize--; -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-05 16:53 ` cttest-0.1 Joakim Axelsson @ 2002-07-06 6:10 ` Andrew Smith 2002-07-06 7:12 ` cttest-0.1 Patrick Schaaf 0 siblings, 1 reply; 55+ messages in thread From: Andrew Smith @ 2002-07-06 6:10 UTC (permalink / raw) To: netfilter-devel > 2002-07-05 18:54:15+0200, Patrick Schaaf <bof@bof.de> -> >> >> - make the hash bucket count at least individible by 2. This should go >> as a strong suggestion into the documentation, and should be >> implemented in the default initialization code. Anybody volunteering >> for one or the other? I'll do the code part, but I won't do the >> docs. > > Well, I think we even can force people to use an odd count. > > if (hashsize%2 == 0) > hashsize--; > > > -- > /Joakim Axelsson A.K.A Gozem@EFnet & OPN Just so you don't underestimate a requirement ... :-) if ((hashsize & 1) == 0) hashsize++; (assume we are checking positive numbers ;-) -- -Cheers -Andrew MS ... if only he hadn't been hang gliding! ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-06 6:10 ` cttest-0.1 Andrew Smith @ 2002-07-06 7:12 ` Patrick Schaaf 2002-07-06 15:23 ` cttest-0.1 Patrick Schaaf 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-07-06 7:12 UTC (permalink / raw) To: Andrew Smith; +Cc: netfilter-devel > > if (hashsize%2 == 0) > > hashsize--; > > Just so you don't underestimate a requirement ... :-) Underestimating is desirable here. User gives upper bound - observe it! > (assume we are checking positive numbers ;-) We are, the variable in question, is unsigned. Actually, I now have this nice <128-byte-icache snippet which computes the next-smaller-prime, given any 32 bit unsigned. And it's fast. If the core team doesn't puke, that's what I would like to do at conntrack module load time. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-06 7:12 ` cttest-0.1 Patrick Schaaf @ 2002-07-06 15:23 ` Patrick Schaaf 2002-07-06 21:14 ` cttest-0.1 Joakim Axelsson 0 siblings, 1 reply; 55+ messages in thread From: Patrick Schaaf @ 2002-07-06 15:23 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel Hi all, I have released http://bei.bof.de/cttest-0.2.tar.gz. It is now easier to modify the list of sizes and hashes that should be plotted, and there's timing code for the hash functions, together with average used list length analysis. > Actually, I now have this nice <128-byte-icache snippet which computes the > next-smaller-prime, given any 32 bit unsigned. And it's fast. If somebody wants to review this code, have a look at cttest-0.2/prime.c The pictures at http://bei.bof.de/ex1/ have been updated to the produce of cttest-0.2. I'd like to see some runs from Athlon and Pentium4 systems, to compare the hash function timings. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-06 15:23 ` cttest-0.1 Patrick Schaaf @ 2002-07-06 21:14 ` Joakim Axelsson 2002-07-06 22:41 ` cttest-0.1 Joakim Axelsson 2002-07-06 22:54 ` cttest-0.1 Joakim Axelsson 0 siblings, 2 replies; 55+ messages in thread From: Joakim Axelsson @ 2002-07-06 21:14 UTC (permalink / raw) To: Patrick Schaaf; +Cc: netfilter-devel 2002-07-06 17:23:42+0200, Patrick Schaaf <bof@bof.de> -> > Hi all, > > I have released http://bei.bof.de/cttest-0.2.tar.gz. It is now easier > to modify the list of sizes and hashes that should be plotted, and > there's timing code for the hash functions, together with average > used list length analysis. > Me and Martin Josefsson has tested the new cttest-0.2. Martin also took the idea of using the hash-function that the routingcache uses in Linux: res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); res ^= key->sip ^ key->tos; res ^= (res >> 16); res ^= (res >> 8);); Together we tried to make one that fitted for conntrack. Since we don't understand completly why the things are done as they are, there might be some more tuning where can be done to this function: static u32 hash_rt(struct ct_key *key) { u32 res; PER_HASH_TIMER_1( res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); res ^= key->sip ^ key->proto; res ^= key->dport ^ key->sport; res ^= (res >> 16); res ^= (res >> 8); ); return res; } Also with this idea of using power of (^), we tested the abcd hash using power of instead of just adding the values up: static u32 hash_abcd_power(struct ct_key *key) { u32 res; PER_HASH_TIMER_1( res = (0x47441DFB * key->sip) ^ (0x57655A7D * key->dip) ^ (0x1C7F1D2D * key->sport) ^ (0xDF91D738 * key->dport) ^ key->proto; ); return res; } Results can be viewed at the following url with my 85K entry conntrack: http://aaricia.hemmet.chalmers.se/~gozem/cttest-0.2/rt/ Conculsions are that the rt-hash is just as fast as abcd and has a good distribusion, even on the 2^n hash-sizes. This concludes us and with the fact that we are going to use prime-hashsizes that this rt-hash is the best alternitive we have right now. rt uses about 21 cycles on my CPU and abcd about 22. crc32 is up on 134. The tests was computed on my (Dual) Athlon MP 1800+ (1533Mhz), with ECC/REG DDR266 memory. Ofcouse only using one CPU. For the test of your prime.c I can't say more than its not even measureable on my computer. But I got a very fast CPU. >time ./prime 3457675589 3457675579 0.000u 0.000s 0:00.00 0.0% 0+0k 0+0io 87pf+0w -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-06 21:14 ` cttest-0.1 Joakim Axelsson @ 2002-07-06 22:41 ` Joakim Axelsson 2002-07-06 23:16 ` cttest-0.1 Joakim Axelsson 2002-07-06 22:54 ` cttest-0.1 Joakim Axelsson 1 sibling, 1 reply; 55+ messages in thread From: Joakim Axelsson @ 2002-07-06 22:41 UTC (permalink / raw) To: Patrick Schaaf, netfilter-devel 2002-07-06 23:14:51+0200, Joakim Axelsson <gozem@aaricia.hemmet.chalmers.se> -> > > Me and Martin Josefsson has tested the new cttest-0.2. Martin also took the > idea of using the hash-function that the routingcache uses in Linux: > Some more on this issue. Martin discovered that changing "res ^= (res >> 16);" into "res ^= (res >> 24);" gave a much better result. Actually better than any of the hash-functions we have. To the smalest calculation price. The somewhat bad distribution on 2^n hashsizes has now vanished. Now to make it even more impossible to attack I added a A,B factor to it: "res ^= A ^ B;" Same A and B as in abcd* static u32 hash_rt_ab(struct ct_key *key) { u32 res; PER_HASH_TIMER_1( res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); res ^= key->sip ^ key->proto; res ^= key->dport ^ key->sport; res ^= 0x47441DFB ^ 0x57655A7D; res ^= (res >> 24); res ^= (res >> 8); ); return res; } This addition of A,B only adds 2 more cycles on my CPU (from about 20 to about 22). In my believe is this hashfunction what we are looking for. Two random constanst makes sure that a studied attack can't fill just any bucket, not knowing which one as might be possible with only one random constant. I also tested Don's abcdef. Conclusion is that abcdef is about the same as abcd or worse and takes longer time to caluclate. From about 22 to about 25. On my test-data. It still has a bad distribution on 2^n hashsizes. Results are from my calculations: http://aaricia.hemmet.chalmers.se/~gozem/cttest-0.2/rt_ab/ And from Martin who did test alot of attacking tools against his lab-router: Look at each index.html to what type of attack it was. http://gandalf.hjorten.nu/cttest/kna-gw-rt24-rt_ab-abcdef/ http://gandalf.hjorten.nu/cttest/labbrouter-172000-randip-randport-rt24-rt_ab-abcdef/ http://gandalf.hjorten.nu/cttest/labbrouter-172000-same-rt24-rt_ab-abcdef/ The modified cttest.c we have used can be found at: http://aaricia.hemmet.chalmers.se/~gozem/cttest-0.2/cttest.c -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-06 22:41 ` cttest-0.1 Joakim Axelsson @ 2002-07-06 23:16 ` Joakim Axelsson 2002-07-07 2:30 ` cttest-0.1 Svenning Sorensen 0 siblings, 1 reply; 55+ messages in thread From: Joakim Axelsson @ 2002-07-06 23:16 UTC (permalink / raw) To: Patrick Schaaf, netfilter-devel 2002-07-07 00:41:09+0200, Joakim Axelsson <gozem@aaricia.hemmet.chalmers.se> -> > > Now to make it even more impossible to attack I added a A,B factor to it: > "res ^= A ^ B;" Same A and B as in abcd* > > static u32 hash_rt_ab(struct ct_key *key) > { > u32 res; > > PER_HASH_TIMER_1( > res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); > res ^= key->sip ^ key->proto; > res ^= key->dport ^ key->sport; > res ^= 0x47441DFB ^ 0x57655A7D; > res ^= (res >> 24); > res ^= (res >> 8); > ); > > return res; > } > I guess you all are begining to get a little tired of my mails :-). Anyhow on our little misstake what ^ really does in C (should have known better :-). I guess I seldom use xor in my c-code.) res ^= 0x47441DFB ^ 0x57655A7D is kinda of useless then. So I changed it into: res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); res ^= key->sip ^ key->proto; res ^= key->dport ^ key->sport; res ^= 0x47441DFB; res ^= (res >> 24); res ^= (res >> 8); res ^= 0x57655A7D; Results: http://aaricia.hemmet.chalmers.se/~gozem/cttest-0.2/rt_ab_xor/ New cttest.c (same url as before): http://aaricia.hemmet.chalmers.se/~gozem/cttest-0.2/cttest.c -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-06 23:16 ` cttest-0.1 Joakim Axelsson @ 2002-07-07 2:30 ` Svenning Sorensen 2002-07-07 4:23 ` cttest-0.1 Joakim Axelsson 0 siblings, 1 reply; 55+ messages in thread From: Svenning Sorensen @ 2002-07-07 2:30 UTC (permalink / raw) To: Joakim Axelsson, Patrick Schaaf, netfilter-devel At 01:16 07-07-2002, Joakim Axelsson wrote: >I guess you all are begining to get a little tired of my mails :-). Anyhow >on our little misstake what ^ really does in C (should have known better >:-). I guess I seldom use xor in my c-code.) res ^= 0x47441DFB ^ 0x57655A7D >is kinda of useless then. So I changed it into: > >res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); >res ^= key->sip ^ key->proto; >res ^= key->dport ^ key->sport; >res ^= 0x47441DFB; >res ^= (res >> 24); >res ^= (res >> 8); >res ^= 0x57655A7D; Heh, still as useless... :) Two random 32 bit values xor'ed gives, well, 32 random bits. And you don't get more randomness into the first value by shifting and xor'ing with itself. But actually, I don't see how it buys you anything to xor with a random value. It doesn't effect the distribution of the hash values, so an attacker can still attack a single bucket. He just doesn't know which one, but I bet he doesn't care :) A multiplication would be slightly better (and more expensive, cpu-wise), since that would make it harder to predict which input bits change which output bits. Furthermore, you need to perform the two prf functions on different parts of the input, so any variation in one part of the input can't be easily nullified by the inverse variation in the other part. Maybe it would also make sense to distribute the source & destination ports across all 32 bits, and similarly mix the high and low parts better, like this (completely untested): res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); res ^= (res * secret_value_a); res ^= key->sip ^ key->proto; res ^= (key->dport << 16) ^ key->sport; res ^= (res * secret_value_b); res ^= ((res >> 25) | (res << 7)); The compiler should optimize the two last shifts into a single rotate instruction. Svenning ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-07 2:30 ` cttest-0.1 Svenning Sorensen @ 2002-07-07 4:23 ` Joakim Axelsson 2002-07-07 5:46 ` cttest-0.1 Joakim Axelsson 0 siblings, 1 reply; 55+ messages in thread From: Joakim Axelsson @ 2002-07-07 4:23 UTC (permalink / raw) To: Svenning Sorensen; +Cc: Patrick Schaaf, netfilter-devel 2002-07-07 04:30:00+0200, Svenning Sorensen <sss@sss.dnsalias.net> -> > At 01:16 07-07-2002, Joakim Axelsson wrote: > >I guess you all are begining to get a little tired of my mails :-). Anyhow > >on our little misstake what ^ really does in C (should have known better > >:-). I guess I seldom use xor in my c-code.) res ^= 0x47441DFB ^ 0x57655A7D > >is kinda of useless then. So I changed it into: > > > >res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); > >res ^= key->sip ^ key->proto; > >res ^= key->dport ^ key->sport; > >res ^= 0x47441DFB; > >res ^= (res >> 24); > >res ^= (res >> 8); > >res ^= 0x57655A7D; > > Heh, still as useless... :) > Two random 32 bit values xor'ed gives, well, 32 random bits. > And you don't get more randomness into the first value by shifting and > xor'ing with itself. > Thx for the input! > But actually, I don't see how it buys you anything to xor with a random value. > It doesn't effect the distribution of the hash values, so an attacker can > still attack a single bucket. He just doesn't know which one, but I bet he > doesn't care :) > Well, since the value is later % with the hashsize it might matters somewhat, but not as much as a multiplication. Anyhow, you are right. > A multiplication would be slightly better (and more expensive, cpu-wise), > since that would make it harder to predict which input bits change which > output bits. Furthermore, you need to perform the two prf functions on > different parts of the input, so any variation in one part of the input > can't be easily nullified by the inverse variation in the other part. > > Maybe it would also make sense to distribute the source & destination ports > across all 32 bits, and similarly mix the high and low parts better, > like this (completely untested): > Yepp, the spread of ports can be good. > res = ((key->dip & 0xF0F0F0F0) >> 4) | ((key->dip & 0x0F0F0F0F) << 4); > res ^= (res * secret_value_a); > res ^= key->sip ^ key->proto; > res ^= (key->dport << 16) ^ key->sport; > res ^= (res * secret_value_b); > res ^= ((res >> 25) | (res << 7)); > > The compiler should optimize the two last shifts into a single rotate instruction. > Why this rotate? I tested your code and concluded two things: 1. res ^= (res * value) is not good. You can't bring "parts" of the res it self to xor with. A plain "res *= value" was better. 2. res ^= ((res >> 25) | (res << 7)); did not do the same fine thing as our >> 24 followed by >> 8. It gave a second "hill" as a tail. Not giving a fine line. I also noticed that its using dip in the first shifts. Now we (me and Martin) only got router conntrack-data where dip is a wide range. But on an end machine/server there will to 99% only be one dip present. Considering packets coming TO the machine. It will take more packets than it gives on an attack. Remember also the orignal routing cache wants to know where the packet are heading. We want to know where it came from. If anyone has more ideas on this one I'll happly hear them. So, I swaped placed on dip and sip. The swaping didn't seams to affect anything on my input data. Ending in this function: res = ((key->sip & 0xF0F0F0F0) >> 4) | ((key->sip & 0x0F0F0F0F) << 4); res *= 0x47441DFB; res ^= key->dip ^ key->proto; res ^= (key->dport << 16) ^ key->sport; res *= 0x57655A7D; res ^= (res >> 24); res ^= (res >> 8); I think this solution will make the two random constanst enough independent to make it impossible to attack one, any, bucket. Better solutions on the two constants are welcome. This hash takes about 25 cycles on my CPU compared to the plain first rt-hash which takes about 20 cycles. The first rt_ab I made takes about 22 cyles. Results on: http://aaricia.hemmet.chalmers.se/~gozem/cttest-0.2/rt_new I called your idea rt_sven, my new idea rt_new. I only displayed the 4 rt-hashes and random to get some resolutions in the grafs. abcd and original only kills them with long tails. -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-07 4:23 ` cttest-0.1 Joakim Axelsson @ 2002-07-07 5:46 ` Joakim Axelsson 2002-07-07 11:00 ` cttest-0.1 Henrik Nordstrom 0 siblings, 1 reply; 55+ messages in thread From: Joakim Axelsson @ 2002-07-07 5:46 UTC (permalink / raw) To: Svenning Sorensen, Patrick Schaaf, netfilter-devel 2002-07-07 06:23:41+0200, Joakim Axelsson <gozem@aaricia.hemmet.chalmers.se> -> > > res = ((key->sip & 0xF0F0F0F0) >> 4) | ((key->sip & 0x0F0F0F0F) << 4); > res *= 0x47441DFB; > res ^= key->dip ^ key->proto; > res ^= (key->dport << 16) ^ key->sport; > res *= 0x57655A7D; > res ^= (res >> 24); > res ^= (res >> 8); > > I think this solution will make the two random constanst enough independent > to make it impossible to attack one, any, bucket. Better solutions on the > two constants are welcome. > It can still be calculated backwards to attack i think. However leting one constant multiply (the first) and one xor should make it harder to attack one any bucket. It even gives a slightly better distribution and removing one muliplication is good. I also made an or of the ports. Makes no difference but in reading. res = ((key->sip & 0xF0F0F0F0) >> 4) | ((key->sip & 0x0F0F0F0F) << 4); res *= 0x47441DFB; res ^= key->dip ^ key->proto; res ^= ((key->dport << 16) | key->sport); res ^= 0x57655A7D; res ^= (res >> 24); res ^= (res >> 8); Now someone else have to take a look at it. I digged my head into a corner. -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-07 5:46 ` cttest-0.1 Joakim Axelsson @ 2002-07-07 11:00 ` Henrik Nordstrom 0 siblings, 0 replies; 55+ messages in thread From: Henrik Nordstrom @ 2002-07-07 11:00 UTC (permalink / raw) To: Joakim Axelsson, netfilter-devel On Sunday 07 July 2002 07.46, Joakim Axelsson wrote: > res = ((key->sip & 0xF0F0F0F0) >> 4) | ((key->sip & 0x0F0F0F0F) << > 4); res *= 0x47441DFB; > res ^= key->dip ^ key->proto; > res ^= ((key->dport << 16) | key->sport); > res ^= 0x57655A7D; > res ^= (res >> 24); > res ^= (res >> 8); > > > Now someone else have to take a look at it. I digged my head into a > corner. Hmm... > res ^= key->dip ^ key->proto; > res ^= ((key->dport << 16) | key->sport); can easily nullify itself.. here you are just xor:ing the destination IP with the source and destination ports, which makes a highly predictable hash pattern.. To always hit the same hash bucket you just need to keep the source IP constant, and the result of the above XOR constant.. > res ^= 0x57655A7D; > res ^= (res >> 24); > res ^= (res >> 8); And this is twofold.. it both obfuscates the bits and probably makes it likelier an attacker can find bits that nullify eachother, especially so if the hash size is not prime (or at a minimum odd).. These 5 xor statements xor:s quite many sources into the lower bits, always in the same order.. I am pretty sure you will find very interesting iterference patterns if you unwind these xors. Regards Henrik ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: cttest-0.1 2002-07-06 21:14 ` cttest-0.1 Joakim Axelsson 2002-07-06 22:41 ` cttest-0.1 Joakim Axelsson @ 2002-07-06 22:54 ` Joakim Axelsson 1 sibling, 0 replies; 55+ messages in thread From: Joakim Axelsson @ 2002-07-06 22:54 UTC (permalink / raw) To: Patrick Schaaf, netfilter-devel 2002-07-06 23:14:51+0200, Joakim Axelsson <gozem@aaricia.hemmet.chalmers.se> -> > > Also with this idea of using power of (^), we tested the abcd hash using > power of instead of just adding the values up: > > static u32 hash_abcd_power(struct ct_key *key) > { > u32 res; > > PER_HASH_TIMER_1( > res = (0x47441DFB * key->sip) > ^ (0x57655A7D * key->dip) > ^ (0x1C7F1D2D * key->sport) > ^ (0xDF91D738 * key->dport) > ^ key->proto; > ); > return res; > } > Oops. ^ is xor in C, not "power of". Hehe, thinking in too much mathemaical terms here. Guess that tells you that we really dont know what we're doing. Anyhow. Our rt_ab seams to be very good, thats what matters :-) -- /Joakim Axelsson A.K.A Gozem@EFnet & OPN ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-06-27 20:46 conntrack performance/DoS formula Don Cohen 2002-06-28 6:23 ` Patrick Schaaf @ 2002-07-02 14:38 ` Harald Welte 1 sibling, 0 replies; 55+ messages in thread From: Harald Welte @ 2002-07-02 14:38 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel On Thu, Jun 27, 2002 at 01:46:28PM -0700, Don Cohen wrote: > > I had mentioned that I was hoping to remember/find the appropriate > formula for the probability of a hash bucket reaching its limit. > I got out my prob-stat book from I-won't-mention-how-many years ago. [thanks for your detailad analysis, I will try to look at all the details ASAP]. > My guess is that the amount of time required to do 20 iterations > comparing connection data with elements of a list in a bucket is > very small compared to the amount of time otherwise spent on a > packet. On the other hand, a hash bucket is also pretty cheap. > The data for a connection takes much more space than 2 hash buckets, > or even 10. And with 10L buckets the chance of overflow at M=10 > is only 1E-19, for M=20 it's somewhere around 1E-40. ok the question is now to test this in the real world, which can be fairly easily done, as the hashsize is configurable. > So, there you have it. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) ^ permalink raw reply [flat|nested] 55+ messages in thread
[parent not found: <20020701121404.B78724512@lists.samba.org>]
* Re: conntrack performance/DoS formula [not found] <20020701121404.B78724512@lists.samba.org> @ 2002-07-01 21:30 ` Don Cohen 2002-07-02 6:05 ` Patrick Schaaf 0 siblings, 1 reply; 55+ messages in thread From: Don Cohen @ 2002-07-01 21:30 UTC (permalink / raw) To: netfilter-devel, bof Patrick Schaaf writes: > > You can now use '-c NUMBER' to set a seed for the CRC32 hash. You can now > > use '-C' (without parameter) to set a random seed for the CRC32 hash. > > These options are for testing the idea of "attack compensation by > > inserting local variance". I'd like to hear opinions on this. > At a higher level, though, I don't see much value in preventing an > attacker from filling a single bucket when he can almost as easily do > much more harm by filling the entire conntrack database! On the other hand, if we do have a way to prevent an attacker from filling the conntrack database, this does become interesting again. Assuming, of course, the attacker can't skew the crc hash results by feeding you particular data without knowing your seed. I still hope to hear from some experts out there on this subject. And now, since I think this is a useful approach it's worth while to mention the following: static u32 hash_crc32(struct ct_key *key) { u32 crc = crc32_seed; u32 v32; u16 v16; crc = crc32(crc, (const char *) &key->proto, sizeof(key->proto)); v32 = ntohl(key->sip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v32 = ntohl(key->dip); crc = crc32(crc, (const char *) &v32, sizeof(v32)); v16 = ntohs(key->sport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); v16 = ntohs(key->dport); crc = crc32(crc, (const char *) &v16, sizeof(v16)); return crc; } Isn't that the same (modulo order of data) as this? return crc32(crc, (const char *) key, sizeof(struct ct_key)); On my 200MHz machine your crc my change hash_conntrack 1.5usec 1.1usec .2usec gcc -o cttest cttest.c 1.1usec .53usec .14usec gcc -O9 -o cttest cttest.c ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: conntrack performance/DoS formula 2002-07-01 21:30 ` Don Cohen @ 2002-07-02 6:05 ` Patrick Schaaf 0 siblings, 0 replies; 55+ messages in thread From: Patrick Schaaf @ 2002-07-02 6:05 UTC (permalink / raw) To: Don Cohen; +Cc: netfilter-devel Hi Don, > And now, since I think this is a useful approach it's worth while to > mention the following: I'm doing some gnuplot pictures right now, and crc32 really does come out as preferrable regarding list length distributions for varying bucket sizes. I'll post a link to a presentation of that analysis later today. [regarding the hash_crc32() in my test program] > > Isn't that the same (modulo order of data) as this? > return crc32(crc, (const char *) key, sizeof(struct ct_key)); The problem with this is not so much order of data, which would be the same for all calls on a local machine (i.e. at least consistent). The problem is possible padding in the structure. While such padding in the real in-kernel tuple data structure would be space wastage, it _is_ possible in principle for a C compiler to introduce such padding. > On my 200MHz machine > your crc my change hash_conntrack > 1.5usec 1.1usec .2usec gcc -o cttest cttest.c > 1.1usec .53usec .14usec gcc -O9 -o cttest cttest.c I'm sure that there's much more speed to be had over the crc32() function I borrowed from zlibc. That one is a general N-byte crc32, and here we always have a fixed amount of bytes. Thus, for production, we would look into completely unrolling the crc32 loop. I'll look into microoptimizating this if and only if it is decided to have crc32 as a (possibly alternative) hash function for conntrack. It's not worth the effort for the test program alone. Thanks nevertheless for bringing this up; it is an important issue when we want to deploy crc32. best regards Patrick ^ permalink raw reply [flat|nested] 55+ messages in thread
end of thread, other threads:[~2002-07-07 11:00 UTC | newest]
Thread overview: 55+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-06-27 20:46 conntrack performance/DoS formula Don Cohen
2002-06-28 6:23 ` Patrick Schaaf
2002-06-28 17:53 ` Don Cohen
2002-06-28 18:36 ` Patrick Schaaf
2002-06-28 19:03 ` Don Cohen
2002-06-28 19:35 ` Patrick Schaaf
2002-06-28 19:39 ` Patrick Schaaf
2002-06-28 21:10 ` Don Cohen
2002-06-28 21:28 ` Patrick Schaaf
2002-06-28 21:49 ` Don Cohen
2002-06-28 22:30 ` Don Cohen
2002-06-29 9:03 ` Patrick Schaaf
2002-06-29 16:48 ` Don Cohen
2002-06-29 17:22 ` Patrick Schaaf
2002-07-05 13:47 ` Harald Welte
2002-06-29 17:33 ` Patrick Schaaf
2002-06-29 9:29 ` Patrick Schaaf
2002-06-29 12:07 ` Patrick Schaaf
2002-06-29 12:34 ` Patrick Schaaf
2002-06-30 8:31 ` Patrick Schaaf
2002-06-30 19:40 ` Don Cohen
2002-07-01 8:07 ` Henrik Nordstrom
2002-07-01 17:49 ` Don Cohen
2002-07-02 7:58 ` Henrik Nordstrom
[not found] ` <15652.38084.704660.234319@isis.cs3-inc.com>
2002-07-04 21:53 ` Henrik Nordstrom
2002-07-05 7:08 ` Don Cohen
2002-07-05 11:41 ` Henrik Nordstrom
2002-07-06 2:49 ` Don Cohen
2002-07-02 14:55 ` Harald Welte
2002-07-02 14:40 ` Harald Welte
2002-07-02 16:32 ` Patrick Schaaf
2002-07-02 16:35 ` Patrick Schaaf
2002-07-02 16:53 ` Henrik Nordstrom
2002-07-02 17:48 ` Don Cohen
2002-07-02 18:31 ` Patrick Schaaf
2002-07-02 21:52 ` cttest-0.1 Patrick Schaaf
2002-07-03 4:15 ` cttest-0.1 Joakim Axelsson
2002-07-05 15:37 ` cttest-0.1 Martin Josefsson
2002-07-05 16:10 ` cttest-0.1 Joakim Axelsson
2002-07-05 16:54 ` cttest-0.1 Patrick Schaaf
2002-07-05 16:53 ` cttest-0.1 Joakim Axelsson
2002-07-06 6:10 ` cttest-0.1 Andrew Smith
2002-07-06 7:12 ` cttest-0.1 Patrick Schaaf
2002-07-06 15:23 ` cttest-0.1 Patrick Schaaf
2002-07-06 21:14 ` cttest-0.1 Joakim Axelsson
2002-07-06 22:41 ` cttest-0.1 Joakim Axelsson
2002-07-06 23:16 ` cttest-0.1 Joakim Axelsson
2002-07-07 2:30 ` cttest-0.1 Svenning Sorensen
2002-07-07 4:23 ` cttest-0.1 Joakim Axelsson
2002-07-07 5:46 ` cttest-0.1 Joakim Axelsson
2002-07-07 11:00 ` cttest-0.1 Henrik Nordstrom
2002-07-06 22:54 ` cttest-0.1 Joakim Axelsson
2002-07-02 14:38 ` conntrack performance/DoS formula Harald Welte
[not found] <20020701121404.B78724512@lists.samba.org>
2002-07-01 21:30 ` Don Cohen
2002-07-02 6:05 ` Patrick Schaaf
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.