* Algorithmic Complexity Attacks and 2.4.20 the dcache code
@ 2003-05-29 20:42 Scott A Crosby
2003-05-30 3:57 ` David S. Miller
` (3 more replies)
0 siblings, 4 replies; 29+ messages in thread
From: Scott A Crosby @ 2003-05-29 20:42 UTC (permalink / raw)
To: linux-kernel
Hello. We have analyzed this software to determine its vulnerability
to a new class of DoS attacks related to our recent paper, ''Denial
of Service via Algorithmic Complexity Attacks.''
This paper discusses a new class of denial of service attacks that
work by exploiting the difference between average case performance and
worst-case performance. In an adversarial environment, the data
structures used by an application may be forced to experience their
worst-case performance. For instance, hash tables are usually thought
of as offering constant-time operations, but with large numbers of
collisions they degrade to linked lists, which may lead to a
100-10,000x performance degradation. Because of the widespread use of
hash tables, the potential for attack is very broad. Fortunately, in
many cases, other constraints on the system limit the impact of these
attacks.
To be attackable, an application must have a deterministic or
predictable hash function and accept untrusted input. In general, for
the attack to be significant, the application must be willing and
able to accept hundreds to tens of thousands of 'attack
inputs'. Because of that requirement, it is difficult to judge the
impact of these attacks without knowing the source code extremely
well, and knowing all the ways in which a program is used.
As part of this project, one of the applications examined was the
Linux kernel 2.4.20: both the networking stack (the subject of
another email) and, in this email, the dcache.
I have confirmed via an actual attack that it is possible to force the
dcache to experience a 200x performance degradation if the attacker
can control filenames. On a 1.8GHz P4, the time to list a directory of
10,000 files is 18 seconds instead of 0.1 seconds.
The solution for these attacks on hash tables is to make the hash
function unpredictable via a technique known as universal
hashing. Universal hashing is a keyed hash function where, based on
the key, one of a large set of hash functions is chosen. When
benchmarking, we observe that for short or medium-length inputs, it is
comparable in performance to simple predictable hash functions such as
the ones in Python or Perl. Our paper has graphs and charts of our
benchmarked performance.
I highly advise using a universal hashing library, either our own or
someone else's. As history shows, it is very easy to make silly
mistakes when attempting to implement your own 'secure' algorithm.
The abstract, the paper, and a library implementing universal hashing
are available at http://www.cs.rice.edu/~scrosby/hash/.
Scott
^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-30  3:57 UTC (permalink / raw)
To: Scott A Crosby; +Cc: linux-kernel

On Thu, 2003-05-29 at 13:42, Scott A Crosby wrote:

> I highly advise using a universal hashing library, either our own or
> someone else's. As history shows, it is very easy to make silly
> mistakes when attempting to implement your own 'secure' algorithm.

Why are you recommending this when, after two days of going back and
forth in emails with me, you came to the conclusion that for
performance-critical paths such as the hashes in the kernel, the
Jenkins hash was an acceptable choice?

It is unacceptably costly to use a universal hash: it makes a multiply
operation for every byte of key input, plus a modulo operation at the
end of the hash computation, all of which can be extremely expensive
on some architectures. I showed and backed this up for you with
benchmarks comparing your universal hashing code and Jenkins.

Some embedded folks will have your head on a platter if we end up
using a universal hash function for the DCACHE solely based upon your
advice. :-)

--
David S. Miller <davem@redhat.com>
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Ingo Molnar @ 2003-05-30  4:29 UTC (permalink / raw)
To: David S. Miller; +Cc: Scott A Crosby, linux-kernel

On 29 May 2003, David S. Miller wrote:

> > I highly advise using a universal hashing library, either our own or
> > someone else's. As history shows, it is very easy to make silly
> > mistakes when attempting to implement your own 'secure' algorithm.
>
> Why are you recommending this when after 2 days of going back
> and forth in emails with me you came to the conclusion that for
> performance critical paths such as the hashes in the kernel the Jenkins
> hash was an acceptable choice?
>
> It is unacceptably costly to use a universal hash, it makes a multiply
> operation for every byte of key input plus a modulo operation at the end
> of the hash computation. All of which can be extremely expensive on
> some architectures.
>
> I showed and backed this up for you with benchmarks comparing your
> universal hashing code and Jenkins.

i'd suggest using the jhash for the following additional kernel
entities: the pagecache hash, the inode hash and the vcache hash.

the buffer-cache hash and the pidhash should be hard (impossible?) to
attack locally.

	Ingo
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Timothy Miller @ 2003-05-30 18:16 UTC (permalink / raw)
To: Ingo Molnar; +Cc: David S. Miller, Scott A Crosby, linux-kernel

Ingo Molnar wrote:

> i'd suggest using the jhash for the following additional kernel
> entities: the pagecache hash, the inode hash and the vcache hash.
>
> the buffer-cache hash and the pidhash should be hard (impossible?) to
> attack locally.

Maybe this is what someone is saying, and I just missed it, but...

Coming late into this discussion (once again), I am making some
assumptions here. A while back, someone came up with a method for
breaking encryption (narrowing the brute-force computations) by
measuring how long it took a given host to compute encryption keys or
hashes or something.

If you have a situation where a lot of hashes are to be computed, and
you want to decouple the computation time from the response time, then
do this:

1) A kernel thread requests a hash to be computed. That hash is
   computed and put into a queue. The kernel thread is put into the
   task queue.
2) Other kernel threads do the same.
3) Threads get woken up only when their hash is pulled off the queue.

This way, the only added overhead is the kernel context switch from
one thread to another. The algorithm doesn't waste CPU time trying to
hide the complexity of the computation, because the time between
request and receipt of a hash depends on whatever other random hashes
are also being computed. That is, receipt is much later than request,
but you haven't wasted time.

The only case where this doesn't deal with the problem in a low-cost
way is when the queue starts out empty when you make a request, and
the request is the only one on the queue when it's pulled off. In that
case, you do have to waste time.

However, given that the kernel thread will go to sleep anyhow, the
time between sleeping and waking up is somewhat random due to whatever
other kernel and user threads might get executed in the meantime. In
fact, one solution to this problem would be for the hash-computing
function to just yield the CPU before or after the computation of the
hash. Even in a system with absolutely no other load, the fact that we
have to go into and out of the scheduler will add some randomness to
the computation time, perhaps.

Have I just completely left the topic here? :)
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Scott A Crosby @ 2003-05-30 18:53 UTC (permalink / raw)
To: Timothy Miller; +Cc: Ingo Molnar, David S. Miller, linux-kernel

On Fri, 30 May 2003 14:16:09 -0400, Timothy Miller <miller@techsource.com> writes:

> Coming late into this discussion (once again), I am making some
> assumptions here. A while back, someone came up with a method for
> breaking encryption (narrowing the brute force computations) by
> determining how long it took a given host to compute encryption keys
> or hashes or something.

Yes. That's also being presented at Usenix Security 2003:

  Remote Timing Attacks Are Practical
  David Brumley and Dan Boneh, Stanford University
  http://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf

However, that paper was dealing with public-key operations, not hash
collisions. Thus, the temporal dependence on the key (the 'hidden
channel') will probably be orders of magnitude smaller. At that, the
paper only works well on a local network, and the result of the attack
is far more interesting: the server's private key.

In *theory* such an attack is possible and could be used to determine
the secret key used in the Jenkins hash in the networking stack,
dcache, or other places in the kernel. Assuming that collision versus
non-collision could be uniquely distinguished with $k$ trials, and the
hash table has $n$ buckets, the attack would require about $5*k*n$ or
so queries. The idea is to determine if inputs $i$ and $j$ collide.
This has a chance of $1/n$ of occurring. If I can uniquely distinguish
this, which requires $k$ trials, then on average after $k*n$ trials
I'll have one collision. I repeat this $32/log_2(n)$, say, 5 times,
and I have 5 such pairs of known collisions $i,j$. I then enumerate
all $2^{32}$ keys and see which of them are consistent with the
collision evidence observed. This will usually result in only a few
possible keys.

All of this is in theory, of course. In practice, for the bits of the
kernel we're discussing, this sort of attack is completely irrelevant.

I also note that universal hashing is not immune to these attacks
either. To defend, we'd have to go to cryptographic techniques.

Scott
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Scott A Crosby @ 2003-05-30  5:04 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel

On 29 May 2003 20:57:47 -0700, "David S. Miller" <davem@redhat.com> writes:

> Why are you recommending this when after 2 days of going back
> and forth in emails with me you came to the conclusion that for
> performance critical paths such as the hashes in the kernel the Jenkins
> hash was an acceptable choice?

That was a boilerplate paragraph in all the other vulnerability
reports I shipped out today. Perhaps I should have stripped it out or
replaced it.

> It is unacceptably costly to use a universal hash,

Yup, the Jenkins hash is about 5 times faster than our CW construction
on SPARC, and thus likely at least as much faster on a wide variety of
other CPUs. I do not know if the dcache hash is performance critical,
nor do I know if there exist other, more efficient universal hash
functions. In any case, the attacks I describe are strong in relative
terms, but rather weak in terms of actual impact, so fixing it may not
matter much.

> Some embedded folks will have your head on a platter if we end up using
> a universal hash function for the DCACHE solely based upon your advice.
> :-)

Have you seen the current dcache function?

	/* Linux dcache */
	#define HASH_3(hi,ho,c) ho=(hi + (c << 4) + (c >> 4)) * 11

Scott
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-30  6:24 UTC (permalink / raw)
To: scrosby; +Cc: linux-kernel

From: Scott A Crosby <scrosby@cs.rice.edu>
Date: 30 May 2003 00:04:24 -0500

   Have you seen the current dcache function?

   /* Linux dcache */
   #define HASH_3(hi,ho,c) ho=(hi + (c << 4) + (c >> 4)) * 11

Awesome, moving to the Jenkins hash will actually save us some
cycles :-)
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Scott A Crosby @ 2003-05-30  6:46 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel

On Thu, 29 May 2003 23:24:40 -0700 (PDT), "David S. Miller" <davem@redhat.com> writes:

> Awesome, moving to the Jenkins hash will actually save us some
> cycles :-)

Tricky, though, because what if you want to hash more than 64 bits?
You have to somehow chain Jenkins mixes together.

Let H(a,b,c) be a Jenkins hash that does ``mix(a,b,c); return a''.
Let a,b,c,d,e,f be inputs to be hashed, and R,S,T,U be random keys.

It's not safe to do anything like

  H(H(a,b,c),H(d,e,f),R)

because an attacker can brute-force to find tuples (a,b,c),
(a',b',c'), ... so that H(a,b,c) == H(a',b',c') == ...

A better approach (which I offer with no formal analysis of its
quality) might be to construct this:

  H(a,b,R) ^ H(c,d,S) ^ H(e,f,T)

Perhaps the best approach is to visit Usenix Security 2003 this August
and ask the experts there.

Scott
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-30  6:56 UTC (permalink / raw)
To: scrosby; +Cc: linux-kernel

From: Scott A Crosby <scrosby@cs.rice.edu>
Date: 30 May 2003 01:46:12 -0500

   It's not safe to do anything like

One thing that helps here is that we don't need to provide protection
outside the realm of a single name. This is because the hash function
takes the pointer of the dentry of the directory the name is in (the
parent), and contributes this into the hash.

Back to the basic problem: using Jenkins for hashing names. You could
simply shuffle the bytes into a set of 3 32-bit words; every time
you've contributed 12 bytes (the 3 words are full) or you've finished
the string, you run a __jhash_mix(a,b,c) pass. And you can make
init_name_hash() insert the initial random value (chosen using
get_random_bytes() right before we mount root). After all, a string is
just a variable number of u32 words.

Actually, since we can say something about the alignment of the string
pointer in the dentry case, we can simply feed it as a u32 pointer
straight into the Jenkins hash. We know the length of the string, so
this is pretty easy. Actually, the most generic version, "jhash()",
handles arbitrary byte lengths and alignments.
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Alex Riesen @ 2003-05-30  8:59 UTC (permalink / raw)
To: David S. Miller; +Cc: scrosby, linux-kernel

David S. Miller, Fri, May 30, 2003 08:24:40 +0200:

> > Have you seen the current dcache function?
> >
> >	/* Linux dcache */
> >	#define HASH_3(hi,ho,c) ho=(hi + (c << 4) + (c >> 4)) * 11
>
> Awesome, moving to the Jenkins hash will actually save us some
> cycles :-)

static
int hash_3(int hi, int c)
{
	return (hi + (c << 4) + (c >> 4)) * 11;
}

gcc-3.2.1 -O2 -march=pentium:

hash_3:
	pushl	%ebp
	movl	%esp, %ebp
	movl	12(%ebp), %eax
	movl	8(%ebp), %ecx
	movl	%eax, %edx
	popl	%ebp
	sall	$4, %edx
	sarl	$4, %eax
	addl	%ecx, %edx
	addl	%eax, %edx
	leal	(%edx,%edx,4), %eax
	leal	(%edx,%eax,2), %eax
	ret

It is not guaranteed to be this way on all architectures, of course.
But still: no multiplications.
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-30  9:00 UTC (permalink / raw)
To: alexander.riesen; +Cc: scrosby, linux-kernel

From: Alex Riesen <alexander.riesen@synopsys.COM>
Date: Fri, 30 May 2003 10:59:01 +0200

   static
   int hash_3(int hi, int c)
   {
   	return (hi + (c << 4) + (c >> 4)) * 11;
   }

   gcc-3.2.1 -O2 -march=pentium
   ...
   It is not guaranteed to be this way on all architectures, of course.
   But still - no multiplications.

Indeed, I'd missed this. GCC will emit the constant-multiply
expansion unless the multiply cost is set VERY low.
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Scott A Crosby @ 2003-05-30 15:05 UTC (permalink / raw)
To: David S. Miller; +Cc: alexander.riesen, linux-kernel

On Fri, 30 May 2003 02:00:40 -0700 (PDT), "David S. Miller" <davem@redhat.com> writes:

> Indeed, I'd missed this. GCC will emit the constant multiply
> expansion unless the multiply cost is set VERY low.

It may still be a win. This does a bit under a dozen instructions per
byte; the Jenkins hash, however, processes many bytes at a time.

Scott
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-31  6:18 UTC (permalink / raw)
To: scrosby; +Cc: alexander.riesen, linux-kernel

From: Scott A Crosby <scrosby@cs.rice.edu>
Date: 30 May 2003 10:05:51 -0500

   It may still be a win. This does a bit under a dozen instructions
   per byte; the Jenkins hash, however, processes many bytes at a time.

It turns out not to be the case at all. There is too much work
involved in the main loop just maintaining the 3-word + curbyte
state.

It needs to be optimized a bit.
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Willy TARREAU @ 2003-05-31  8:02 UTC (permalink / raw)
To: David S. Miller; +Cc: scrosby, alexander.riesen, linux-kernel, marcelo

Hi David,

On Fri, May 30, 2003 at 11:18:13PM -0700, David S. Miller wrote:

> It turns out not to be the case at all. There is too much work
> involved in the main loop just maintaining the 3-word + curbyte
> state.
>
> It needs to be optimized a bit.

With this simple change, jhash_mix is *exactly* three times faster for
me on athlon-xp, whatever gcc I use (2.95.3 or 3.2.3), on the
following do_hash() function, and about 40% faster when used on local
variables. I think that gcc is not always smart enough to avoid
assignments, and consumes memory bandwidth and perhaps prevents better
optimizations.

void do_hash(unsigned *a, unsigned *b, unsigned *c)
{
	__jhash_mix(*a, *b, *c);
}

This function is 189 bytes long and takes about 72 cycles to complete
with the original macro; it is now 130 bytes long for about 24 cycles,
which means about 1.5 operations/cycle... not bad :-)

I've just tested on other architectures, and it's even more interesting:

 - On a Sparc Ultra2/400, 100 million hashes drop from 38.8 seconds
   to 8.28 s (4.68x)
 - And the best win: on Alpha EV6/466, it goes from 51.5 secs to
   5.75 s (8.96x) !!!

I've checked that the results are the same on every arch, before and
after the modification. I believe it's trivial enough to go into
2.4.21, don't you think ?

Regards,
Willy

--- linux-2.4/include/linux/jhash.h	Sat May 10 11:36:02 2003
+++ /tmp/jhash.h	Sat May 31 09:38:17 2003
@@ -23,15 +23,15 @@
 /* NOTE: Arguments are modified. */
 #define __jhash_mix(a, b, c) \
 { \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<<8); \
-  c -= a; c -= b; c ^= (b>>13); \
-  a -= b; a -= c; a ^= (c>>12); \
-  b -= c; b -= a; b ^= (a<<16); \
-  c -= a; c -= b; c ^= (b>>5); \
-  a -= b; a -= c; a ^= (c>>3); \
-  b -= c; b -= a; b ^= (a<<10); \
-  c -= a; c -= b; c ^= (b>>15); \
+  a = (a - b - c) ^ (c>>13); \
+  b = (b - c - a) ^ (a<<8); \
+  c = (c - a - b) ^ (b>>13); \
+  a = (a - b - c) ^ (c>>12); \
+  b = (b - c - a) ^ (a<<16); \
+  c = (c - a - b) ^ (b>>5); \
+  a = (a - b - c) ^ (c>>3); \
+  b = (b - c - a) ^ (a<<10); \
+  c = (c - a - b) ^ (b>>15); \
 }
 
 /* The golden ration: an arbitrary value */
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-31  8:12 UTC (permalink / raw)
To: willy; +Cc: scrosby, alexander.riesen, linux-kernel, marcelo

From: Willy TARREAU <willy@w.ods.org>
Date: Sat, 31 May 2003 10:02:05 +0200

   With this simple change, jhash_mix is *exactly* three times faster
   for me on athlon-xp, whatever gcc I use (2.95.3 or 3.2.3), on the
   following do_hash() function, and about 40% faster when used on
   local variables.

Interesting :-)

   This function is 189 bytes long, and takes about 72 cycles to
   complete with the original macro, and is now 130 bytes long for
   about 24 cycles, which means about 1.5 operation/cycle... not bad :-)

__jhash_mix takes ~23 cycles on sparc64 in the original version for
me. I get the same measurement for your version. Maybe your gcc
version just stinks :-(

Oh wait, yes, it's the memory operations it can't eliminate. It can't
do that because it has no idea whether certain pointers alias or not
(i.e. it doesn't know whether 'a' and 'b' point to the same value).

Since all the networking versions work on local variables, in 2.4.x it
shouldn't matter performance-wise. You'll note that my updated dcache
Jenkins patch for 2.5.x brought the hash->words[] variables into
locals before running __jhash_mix() on them. So it shouldn't matter
there either.
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: Willy Tarreau @ 2003-05-31  8:56 UTC (permalink / raw)
To: David S. Miller; +Cc: willy, scrosby, alexander.riesen, linux-kernel, marcelo

On Sat, May 31, 2003 at 01:12:10AM -0700, David S. Miller wrote:

> > With this simple change, jhash_mix is *exactly* three times faster
> > for me on athlon-xp, whatever gcc I use (2.95.3 or 3.2.3), on the
> > following do_hash() function, and about 40% faster when used on
> > local variables.
>
> Interesting :-)

Not that much, finally, because I cannot reproduce the slowdown I
initially observed with the original function on local variables. I
think I may have had a wrong type somewhere or a particular operation
which prevented gcc from fully optimizing the macro :-/

So at the moment, both the original and my version are the same speed
on local variables on athlon-xp. BTW, I don't understand why, but I've
just noticed that the Alpha is 25% slower with my version when used on
local variables! So my version is only interesting when working on
indirect variables, which is never the case in the kernel... Never
mind, that was just a try.

For info, here's what I measure here on the original version:

 - athlon-xp : 24 cycles, whatever -march/-mcpu
 - alpha ev6 : -mcpu=ev5        : 22 cycles
               -mcpu=ev6        : 24 cycles
 - sparc64   : default          : 26 cycles
               -mcpu=v9         : 24 cycles
               -mcpu=ultrasparc : 19 cycles (18 for my version here)

Considering that the Alpha and the UltraSparc can issue up to 4
instructions per cycle, I wonder whether it would be worth trying to
implement such a hash in assembler, optimized for each processor, with
a default fall-back to the C function for archs that have not been
implemented.

Cheers,
Willy
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-31  8:58 UTC (permalink / raw)
To: willy; +Cc: scrosby, alexander.riesen, linux-kernel, marcelo

From: Willy Tarreau <willy@w.ods.org>
Date: Sat, 31 May 2003 10:56:12 +0200

   Considering that the Alpha and the UltraSparc can issue up to 4
   instructions per cycle,

UltraSPARC only has 2 integer units, so it really can only do 2
integer operations per cycle. GCC is giving it an optimal schedule for
-mtune=ultrasparc; I know because I wrote that instruction scheduler
:-) You can get 4-issue if you're doing floating point stuff.

I believe the current-generation Alpha has 3 integer units. GCC should
be doing a good job there too.
* RE: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David Schwartz @ 2003-05-31  8:58 UTC (permalink / raw)
To: David S. Miller, willy; +Cc: linux-kernel

> __jhash_mix takes ~23 cycles on sparc64 in the original version for
> me. I get the same measurement for your version. Maybe your gcc
> version just stinks :-(
>
> Oh wait, yes, it's the memory operations it can't eliminate.
> It can't do that because it has no idea whether certain pointers
> alias or not. (ie. it doesn't know whether 'a' and 'b' point
> to the same value)

It's a macro, and the way it inlines, it should be obvious in most
cases that 'a', 'b', and 'c' can't be in the same place in memory.

I've been messing with optimizing the Jenkins hash lately, and I've
found two interesting optimizations. One is to check whether the input
pointer happens to be aligned on a dword boundary, and if so optimize
the inner loop to use dword fetches. Most strings, assuming you hash
from the beginning, will be aligned on a dword boundary. For very
short strings this has no effect; the longer the string, the more of
an effect it has. Basically, you change this:

	// handle most of the key
	while(len >= 12)
	{
		a += ((unsigned)ptr[0] + ((unsigned)ptr[1]<<8) +
		      ((unsigned)ptr[2]<<16) + ((unsigned)ptr[3]<<24));
		b += ((unsigned)ptr[4] + ((unsigned)ptr[5]<<8) +
		      ((unsigned)ptr[6]<<16) + ((unsigned)ptr[7]<<24));
		c += ((unsigned)ptr[8] + ((unsigned)ptr[9]<<8) +
		      ((unsigned)ptr[10]<<16) + ((unsigned)ptr[11]<<24));
		J_MIX(a, b, c);
		ptr += 12; len -= 12;
	}

to:

	// handle most of the key
	if( (((unsigned)data)&3) == 0)
	{
		while(len >= 12)
		{
			a += *(unsigned *) ptr;
			b += *(((unsigned *) ptr)+1);
			c += *(((unsigned *) ptr)+2);
			J_MIX(a, b, c);
			ptr += 12; len -= 12;
		}
	}
	else
	{
		while(len >= 12)
		{
			a += ((unsigned)ptr[0] + ((unsigned)ptr[1]<<8) +
			      ((unsigned)ptr[2]<<16) + ((unsigned)ptr[3]<<24));
			b += ((unsigned)ptr[4] + ((unsigned)ptr[5]<<8) +
			      ((unsigned)ptr[6]<<16) + ((unsigned)ptr[7]<<24));
			c += ((unsigned)ptr[8] + ((unsigned)ptr[9]<<8) +
			      ((unsigned)ptr[10]<<16) + ((unsigned)ptr[11]<<24));
			J_MIX(a, b, c);
			ptr += 12; len -= 12;
		}
	}

How much this helps depends upon how often your inputs are dword
aligned. If you're hashing strings from the beginning, they almost
always are.

The other is to rework the switch/case statement into a tree of
'if(len>0) { len--; ...'. You have to reverse the order of the
additions, basically changing:

	switch(len)	// all the case statements fall through
	{
		case 11: c+=((unsigned)ptr[10]<<24);
		case 10: c+=((unsigned)ptr[9]<<16);
		case 9 : c+=((unsigned)ptr[8]<<8);
			/* the first byte of c is reserved for the length */
		case 8 : b+=((unsigned)ptr[7]<<24);
		case 7 : b+=((unsigned)ptr[6]<<16);
		case 6 : b+=((unsigned)ptr[5]<<8);
		case 5 : b+=ptr[4];
		case 4 : a+=((unsigned)ptr[3]<<24);
		case 3 : a+=((unsigned)ptr[2]<<16);
		case 2 : a+=((unsigned)ptr[1]<<8);
		case 1 : a+=ptr[0];
		// case 0 : // nothing left to add
	}

to:

	if(len!=0) { len--; a+=ptr[0];
	 if(len!=0) { len--; a+=((unsigned)ptr[1]<<8);
	  if(len!=0) { len--; a+=((unsigned)ptr[2]<<16);
	   if(len!=0) { len--; a+=((unsigned)ptr[3]<<24);
	    if(len!=0) { len--; b+=ptr[4];
	     if(len!=0) { len--; b+=((unsigned)ptr[5]<<8);
	      if(len!=0) { len--; b+=((unsigned)ptr[6]<<16);
	       if(len!=0) { len--; b+=((unsigned)ptr[7]<<24);
	        if(len!=0) { len--; c+=((unsigned)ptr[8]<<8);
	         if(len!=0) { len--; c+=((unsigned)ptr[9]<<16);
	          if(len!=0) { len--; c+=((unsigned)ptr[10]<<24);
	} } } } } } } } } } }

Yeah, it's uglier, and you'd think the extra conditionals would slow
things down, but actually the branches predict better and the
comparisons are cheaper. The original switch/case statement translates
into a subtract/compare (to see if len>=12) followed by a totally
unpredictable indirect jump. With my version, the most frequently
executed comparisons are the easiest to predict (assuming a random
distribution of lengths).

I've confirmed by benchmark that both of these are in fact
optimizations (on my processor/compiler/options/etc.). YMMV.

DS
* Re: Algorithmic Complexity Attacks and 2.4.20 the dcache code
From: David S. Miller @ 2003-05-31  9:01 UTC (permalink / raw)
To: davids; +Cc: willy, linux-kernel

From: "David Schwartz" <davids@webmaster.com>
Date: Sat, 31 May 2003 01:58:09 -0700

   It's a macro, and the way it inlines, it should be obvious in most
   cases that 'a', 'b', and 'c' can't be in the same place in memory.

Not true at all in Willy's test case, which was:

void test(u32 *a, u32 *b, u32 *c)
{
	__jhash_mix(*a, *b, *c);
}

And here you certainly have absolutely NO idea where a, b, and c might
point to.

   One is to check if the input pointer happens to be aligned on a
   double-word boundary,

The most generic jhash() frankly isn't very interesting for kernel
applications; 99% of uses there can use the u32-optimized version.
Even for dcache strings we can't use it, because for each character we
have to test it against some terminating value or translate it
somehow.

I wouldn't waste any time at all on this thing.
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-30  9:00     ` David S. Miller
  2003-05-30 15:05       ` Scott A Crosby
@ 2003-05-31  6:30       ` William Lee Irwin III
  2003-05-31  6:33         ` David S. Miller
  1 sibling, 1 reply; 29+ messages in thread
From: William Lee Irwin III @ 2003-05-31 6:30 UTC (permalink / raw)
To: David S. Miller; +Cc: alexander.riesen, scrosby, linux-kernel

From: Alex Riesen <alexander.riesen@synopsys.COM>
Date: Fri, 30 May 2003 10:59:01 +0200
> static
> int hash_3(int hi, int c)
> {
> 	return (hi + (c << 4) + (c >> 4)) * 11;
> }
> gcc-3.2.1 -O2 -march=pentium
> ...
> It is not guaranteed to be this way on all architectures, of course.
> But still - no multiplications.

On Fri, May 30, 2003 at 02:00:40AM -0700, David S. Miller wrote:
> Indeed, I'd missed this.  GCC will emit the constant multiply
> expansion unless the multiply cost is set VERY low.

If the strength reduction situation changes to being properly handled
by gcc for most/all 64-bit arches, include/linux/hash.h can lose a
#ifdef.

-- wli
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-31  6:30       ` William Lee Irwin III
@ 2003-05-31  6:33         ` David S. Miller
  2003-05-31  6:41           ` William Lee Irwin III
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2003-05-31 6:33 UTC (permalink / raw)
To: wli; +Cc: alexander.riesen, scrosby, linux-kernel

   From: William Lee Irwin III <wli@holomorphy.com>
   Date: Fri, 30 May 2003 23:30:40 -0700

   If the strength reduction situation changes to being properly
   handled by gcc for most/all 64-bit arches, include/linux/hash.h
   can lose a #ifdef.

It's not a strength reduction issue, it's about not setting the
multiply cost properly in the machine description.
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-31  6:33         ` David S. Miller
@ 2003-05-31  6:41           ` William Lee Irwin III
  2003-05-31  6:45             ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: William Lee Irwin III @ 2003-05-31 6:41 UTC (permalink / raw)
To: David S. Miller; +Cc: alexander.riesen, scrosby, linux-kernel

From: William Lee Irwin III <wli@holomorphy.com>
Date: Fri, 30 May 2003 23:30:40 -0700
> If the strength reduction situation changes to being properly handled
> by gcc for most/all 64-bit arches, include/linux/hash.h can lose a #ifdef.

On Fri, May 30, 2003 at 11:33:53PM -0700, David S. Miller wrote:
> It's not a strength reduction issue, it's about not setting the
> multiply cost properly in the machine description.

If it's literally that trivial I'll put digging around the machine
descriptions on my TODO list.

-- wli
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-31  6:41           ` William Lee Irwin III
@ 2003-05-31  6:45             ` David S. Miller
  2003-05-31 18:40               ` Aaron Lehmann
  0 siblings, 1 reply; 29+ messages in thread
From: David S. Miller @ 2003-05-31 6:45 UTC (permalink / raw)
To: wli; +Cc: alexander.riesen, scrosby, linux-kernel

   From: William Lee Irwin III <wli@holomorphy.com>
   Date: Fri, 30 May 2003 23:41:38 -0700

   If it's literally that trivial I'll put digging around the machine
   descriptions on my TODO list.

Look at TARGET_RTX_COSTS, that's where all of this happens.
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-31  6:45             ` David S. Miller
@ 2003-05-31 18:40               ` Aaron Lehmann
  0 siblings, 0 replies; 29+ messages in thread
From: Aaron Lehmann @ 2003-05-31 18:40 UTC (permalink / raw)
To: David S. Miller; +Cc: wli, alexander.riesen, scrosby, linux-kernel

On Fri, May 30, 2003 at 11:45:29PM -0700, David S. Miller wrote:
>    From: William Lee Irwin III <wli@holomorphy.com>
>    Date: Fri, 30 May 2003 23:41:38 -0700
>
>    If it's literally that trivial I'll put digging around the machine
>    descriptions on my TODO list.
>
> Look at TARGET_RTX_COSTS, that's where all of this happens.

Reading the code that handles this stuff (expmed.c) always cracks me up.

	/* We might want to refine this now that we have
	   division-by-constant optimization.  Since expand_mult_highpart
	   tries so many variants, it is not straightforward to
	   generalize this.  Maybe we should make an array of possible
	   modes in init_expmed?  Save this for GCC 2.7.  */

	/* We could just as easily deal with negative constants here,
	   but it does not seem worth the trouble for GCC 2.6.  */

	/* This is extremely similar to the code for the unsigned case
	   above.  For 2.7 we should merge these variants, but for 2.6.1
	   I don't want to touch the code for unsigned since that get
	   used in C.  The signed case will only be used by other
	   languages (Ada).  */

Sometimes I wish the gcc code was tame enough for me to work on.
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-29 20:42 Algoritmic Complexity Attacks and 2.4.20 the dcache code Scott A Crosby
  2003-05-30  3:57 ` David S. Miller
@ 2003-05-30  4:02 ` Ingo Molnar
  2003-05-30  4:42   ` Scott A Crosby
  2003-05-30 13:48 ` Nikita Danilov
  2003-06-01  1:15 ` Daniel Phillips
  3 siblings, 1 reply; 29+ messages in thread
From: Ingo Molnar @ 2003-05-30 4:02 UTC (permalink / raw)
To: Scott A Crosby; +Cc: linux-kernel

On 29 May 2003, Scott A Crosby wrote:

> I have confirmed via an actual attack that it is possible to force the
> dcache to experience a 200x performance degradation if the attacker can
> control filenames. On a P4-1.8ghz, the time to list a directory of
> 10,000 files is 18 seconds instead of .1 seconds.

are you sure this is a big issue? Kernel 2.0 (maybe even 2.2) lists
10,000 files at roughly the same speed (18 seconds) without any attack
pattern used for filenames - still it's a kernel being used.

the network hash collision was a much more serious issue, because it
could be triggered externally, could be maintained with a relatively
low input packet flow, and affected all users of the network stack.

also, directories with 10,000 files are not quite common on systems
where there is a trust problem between users. So a typical directory
with say 100 files will be listed in 0.18 seconds - it's slower, but
does not make the system unusable.

also, dcache flushes happen quite frequently under VM pressure - and
especially when using many files you get VM pressure. So it would take
a really specialized attack to keep the dcache size at the critical
level and trigger the slowdown.

also, any local user who can create thousands of files can cause much
more havoc by simply overloading the system. You might as well use that
CPU time to really bog down the system by making it swap heavily - this
will cause a _much_ heavier slowdown.

	Ingo
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-30  4:02 ` Ingo Molnar
@ 2003-05-30  4:42   ` Scott A Crosby
  2003-05-30  5:01     ` David S. Miller
  0 siblings, 1 reply; 29+ messages in thread
From: Scott A Crosby @ 2003-05-30 4:42 UTC (permalink / raw)
To: Ingo Molnar; +Cc: linux-kernel

On Fri, 30 May 2003 06:02:18 +0200 (CEST), Ingo Molnar <mingo@elte.hu> writes:

> On 29 May 2003, Scott A Crosby wrote:
>
> > I have confirmed via an actual attack that it is possible to force the
> > dcache to experience a 200x performance degradation if the attacker can
> > control filenames. On a P4-1.8ghz, the time to list a directory of
> > 10,000 files is 18 seconds instead of .1 seconds.
>
> are you sure this is a big issue? Kernel 2.0 (maybe even 2.2) lists 10,000
> files at roughly the same speed (18 seconds) without any attack pattern
> used for filenames - still it's a kernel being used.

No. It's not that severe, but it does exist, and it is noticeable even
with a quarter that number of files. I did it because it was an
interesting illustrative example, and it only took 30 seconds or so of
coding to put the hash function into the collision-generating program.

> So it would take a really specialized attack to keep the dcache size
> at the critical level and trigger the slowdown.

Yup. It is probably a very unusual configuration, but that doesn't mean
that somebody won't experience it. :)

Scott
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-30  4:42   ` Scott A Crosby
@ 2003-05-30  5:01     ` David S. Miller
  0 siblings, 0 replies; 29+ messages in thread
From: David S. Miller @ 2003-05-30 5:01 UTC (permalink / raw)
To: Scott A Crosby; +Cc: Ingo Molnar, linux-kernel

On Thu, 2003-05-29 at 21:42, Scott A Crosby wrote:
> It is probably a very unusual configuration,

It is worth noting that it might even be possible to go after this
remotely. Consider a mailserver that you can somehow influence the
queued mail file names for.

-- 
David S. Miller <davem@redhat.com>
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-29 20:42 Algoritmic Complexity Attacks and 2.4.20 the dcache code Scott A Crosby
  2003-05-30  3:57 ` David S. Miller
  2003-05-30  4:02 ` Ingo Molnar
@ 2003-05-30 13:48 ` Nikita Danilov
  2003-06-01  1:15 ` Daniel Phillips
  3 siblings, 0 replies; 29+ messages in thread
From: Nikita Danilov @ 2003-05-30 13:48 UTC (permalink / raw)
To: Scott A Crosby; +Cc: linux-kernel, Alexander Viro

Scott A Crosby writes:

> Hello. We have analyzed this software to determine its vulnerability
> to a new class of DoS attacks that related to a recent paper. ''Denial
> of Service via Algorithmic Complexity Attacks.''
>
> This paper discusses a new class of denial of service attacks that
> work by exploiting the difference between average case performance and
> worst-case performance. In an adversarial environment, the data
> structures used by an application may be forced to experience their
> worst case performance. For instance, hash tables are usually thought
> of as being constant time operations, but with large numbers of
> collisions will degrade to a linked list and may lead to a 100-10,000
> times performance degradation.

Another nice way to experience "worst case performance" is to create a
deeply nested directory structure, like

	0/1/2/3/4/.../99999/100000

then try to unmount and see how shrink_dcache_parent/prune_dcache
consume 100% of CPU without allowing preemption. Not recommended on a
single processor machine.

> Because of the widespread use of hash
> tables, the potential for attack is extremely widespread. Fortunately,
> in many cases, other limits on the system limit the impact of these
> attacks.
>
[...]
>
> Scott

Nikita.
* Re: Algoritmic Complexity Attacks and 2.4.20 the dcache code
  2003-05-29 20:42 Algoritmic Complexity Attacks and 2.4.20 the dcache code Scott A Crosby
  ` (2 preceding siblings ...)
  2003-05-30 13:48 ` Nikita Danilov
@ 2003-06-01  1:15 ` Daniel Phillips
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel Phillips @ 2003-06-01 1:15 UTC (permalink / raw)
To: Scott A Crosby, linux-kernel

On Thursday 29 May 2003 22:42, Scott A Crosby wrote:
> The solution for these attacks on hash tables is to make the hash
> function unpredictable via a technique known as universal
> hashing. Universal hashing is a keyed hash function where, based on
> the key, one of a large set of hash functions is chosen. When
> benchmarking, we observe that for short or medium length inputs, it is
> comparable in performance to simple predictable hash functions such as
> the ones in Python or Perl. Our paper has graphs and charts of our
> benchmarked performance.

You should have said "a solution", not "the solution", above.

For the Ext2/3 HTree index, we found a rather nice solution that varies
the hash dispersion pattern on a per-volume basis, in a way that's
difficult for a DOSer to reconstruct (please feel free to analyze this
approach and find a hole, if there is one). This is much simpler than
what you propose. As we are talking core kernel here, adding an extra
spiderweb of complexity in the form of multiple hash algorithms really
isn't appealing, if it can be avoided. Not to mention the overhead of
indexing into the correct hash algorithm on each lookup.

> I highly advise using a universal hashing library, either our own or
> someone else's. As is historically seen, it is very easy to make silly
> mistakes when attempting to implement your own 'secure' algorithm.

Translation: adding bloat is often the easy way out for the lazy.

Anyway, thanks for your analysis, but I disagree with your
recommendation.

Regards,

Daniel
end of thread, other threads:[~2003-06-01  1:01 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-29 20:42 Algoritmic Complexity Attacks and 2.4.20 the dcache code Scott A Crosby
2003-05-30  3:57 ` David S. Miller
2003-05-30  4:29   ` Ingo Molnar
2003-05-30 18:16     ` Timothy Miller
2003-05-30 18:53       ` Scott A Crosby
2003-05-30  5:04   ` Scott A Crosby
2003-05-30  6:24     ` David S. Miller
2003-05-30  6:46       ` Scott A Crosby
2003-05-30  6:56         ` David S. Miller
2003-05-30  8:59   ` Alex Riesen
2003-05-30  9:00     ` David S. Miller
2003-05-30 15:05       ` Scott A Crosby
2003-05-31  6:18         ` David S. Miller
2003-05-31  8:02           ` Willy TARREAU
2003-05-31  8:12             ` David S. Miller
2003-05-31  8:56               ` Willy Tarreau
2003-05-31  8:58                 ` David S. Miller
2003-05-31  8:58           ` David Schwartz
2003-05-31  9:01             ` David S. Miller
2003-05-31  6:30       ` William Lee Irwin III
2003-05-31  6:33         ` David S. Miller
2003-05-31  6:41           ` William Lee Irwin III
2003-05-31  6:45             ` David S. Miller
2003-05-31 18:40               ` Aaron Lehmann
2003-05-30  4:02 ` Ingo Molnar
2003-05-30  4:42   ` Scott A Crosby
2003-05-30  5:01     ` David S. Miller
2003-05-30 13:48 ` Nikita Danilov
2003-06-01  1:15 ` Daniel Phillips