* [PATCH 0/2] New jhash function @ 2010-11-25 13:15 Jozsef Kadlecsik 2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik 0 siblings, 1 reply; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-11-25 13:15 UTC (permalink / raw) To: linux-kernel Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik Hi, Please consider applying the following patch: The current jhash.h implements the lookup2() hash function by Bob Jenkins. However, lookup2() is outdated as Bob wrote a new hash function called lookup3(). The new hash function - mixes better than lookup2(): it passes the check that every input bit changes every output bit 50% of the time, while lookup2() failed it. - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster than lookup2() depending on the key length. You can read a longer comparison of the two and other hash functions at http://burtleburtle.net/bob/hash/doobs.html. jhash is widely used in the kernel and because the functions are inlined, the cost in size is significant. The new functions are slightly larger than the previous ones so the new implementation uses non-inlined fucntions. (See http://lkml.indiana.edu/hypermail//linux/kernel/0902.2/01149.html). Therefore the first patch replaces the calls to the internal macros of jhash with the jhash function calls and the second patch contains the implementation of the new jhash functions. Jozsef Kadlecsik (2): Prepare the tree for un-inlined jhash. The new jhash implementation include/linux/jhash.h | 136 ++------------------------------- lib/Makefile | 2 +- lib/jhash.c | 153 ++++++++++++++++++++++++++++++++++++++ net/ipv6/inet6_connection_sock.c | 18 ++--- net/ipv6/reassembly.c | 36 ++++----- 5 files changed, 186 insertions(+), 159 deletions(-) create mode 100644 lib/jhash.c ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/2] Prepare the tree for un-inlined jhash. 2010-11-25 13:15 [PATCH 0/2] New jhash function Jozsef Kadlecsik @ 2010-11-25 13:15 ` Jozsef Kadlecsik 2010-11-25 13:15 ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik 2010-11-25 13:39 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt 0 siblings, 2 replies; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-11-25 13:15 UTC (permalink / raw) To: linux-kernel Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik jhash is widely used in the kernel and because the functions are inlined, the cost in size is significant. Also, the new jhash functions are slightly larger than the previous ones so better un-inline. As a preparation step, the calls to the internal macros are replaced with the plain jhash function calls. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> --- net/ipv6/inet6_connection_sock.c | 18 ++++++++---------- net/ipv6/reassembly.c | 36 ++++++++++++++++-------------------- 2 files changed, 24 insertions(+), 30 deletions(-) diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 8a16280..861d252 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -60,18 +60,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict); static u32 inet6_synq_hash(const struct in6_addr *raddr, const __be16 rport, const u32 rnd, const u16 synq_hsize) { - u32 a = (__force u32)raddr->s6_addr32[0]; - u32 b = (__force u32)raddr->s6_addr32[1]; - u32 c = (__force u32)raddr->s6_addr32[2]; - - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; - c += rnd; - __jhash_mix(a, b, c); - - a += (__force u32)raddr->s6_addr32[3]; - b += (__force u32)rport; - __jhash_mix(a, b, c); + u32 c; + + c = jhash_3words((__force u32)raddr->s6_addr32[0], + (__force u32)raddr->s6_addr32[1], + (__force u32)raddr->s6_addr32[2], + rnd); + + c = jhash_2words((__force u32)raddr->s6_addr32[3], + (__force u32)rport, + c); return c & (synq_hsize - 1); } diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index c7ba314..5e57490 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -104,26 +104,22 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr, const struct in6_addr *daddr, u32 rnd) { - u32 a, b, c; - - a = (__force u32)saddr->s6_addr32[0]; - b = (__force u32)saddr->s6_addr32[1]; - c = (__force u32)saddr->s6_addr32[2]; - - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; - c += rnd; - __jhash_mix(a, b, c); - - a += (__force u32)saddr->s6_addr32[3]; - b += (__force u32)daddr->s6_addr32[0]; - c += (__force u32)daddr->s6_addr32[1]; - __jhash_mix(a, b, c); - - a += (__force u32)daddr->s6_addr32[2]; - b += (__force u32)daddr->s6_addr32[3]; - c += (__force u32)id; - __jhash_mix(a, b, c); + u32 c; + + c = jhash_3words((__force u32)saddr->s6_addr32[0], + (__force u32)saddr->s6_addr32[1], + (__force u32)saddr->s6_addr32[2], + rnd); + + c = jhash_3words((__force u32)saddr->s6_addr32[3], + (__force u32)daddr->s6_addr32[0], + (__force u32)daddr->s6_addr32[1], + c); + + c = jhash_3words((__force u32)daddr->s6_addr32[2], + (__force u32)daddr->s6_addr32[3], + (__force u32)id, + c); return c & (INETFRAGS_HASHSZ - 1); } -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/2] The new jhash implementation 2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik @ 2010-11-25 13:15 ` Jozsef Kadlecsik 2010-11-25 13:49 ` Eric Dumazet 2010-11-27 3:31 ` Rusty Russell 2010-11-25 13:39 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt 1 sibling, 2 replies; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-11-25 13:15 UTC (permalink / raw) To: linux-kernel Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik The current jhash.h implements the lookup2() hash function by Bob Jenkins. However, lookup2() is outdated as Bob wrote a new hash function called lookup3(). The new hash function - mixes better than lookup2(): it passes the check that every input bit changes every output bit 50% of the time, while lookup2() failed it. - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster than lookup2() depending on the key length. The patch replaces the lookup2() implementation of the 'jhash*' functions with that of lookup3(). You can read a longer comparison of the two and other hash functions at http://burtleburtle.net/bob/hash/doobs.html. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> --- include/linux/jhash.h | 136 +++----------------------------------------- lib/Makefile | 2 +- lib/jhash.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 162 insertions(+), 129 deletions(-) create mode 100644 lib/jhash.c diff --git a/include/linux/jhash.h b/include/linux/jhash.h index ced1159..ca69ac3 100644 --- a/include/linux/jhash.h +++ b/include/linux/jhash.h @@ -1,134 +1,14 @@ #ifndef _LINUX_JHASH_H #define _LINUX_JHASH_H -/* jhash.h: Jenkins hash support. - * - * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net) - * - * http://burtleburtle.net/bob/hash/ - * - * These are the credits from Bob's sources: - * - * lookup2.c, by Bob Jenkins, December 1996, Public Domain. - * hash(), hash2(), hash3, and mix() are externally useful functions. - * Routines to test the hash are included if SELF_TEST is defined. - * You can use this free for any purpose. It has no warranty. - * - * Copyright (C) 2003 David S. Miller (davem@redhat.com) - * - * I've modified Bob's hash to be useful in the Linux kernel, and - * any bugs present are surely my fault. -DaveM - */ - -/* NOTE: Arguments are modified. */ -#define __jhash_mix(a, b, c) \ -{ \ - a -= b; a -= c; a ^= (c>>13); \ - b -= c; b -= a; b ^= (a<<8); \ - c -= a; c -= b; c ^= (b>>13); \ - a -= b; a -= c; a ^= (c>>12); \ - b -= c; b -= a; b ^= (a<<16); \ - c -= a; c -= b; c ^= (b>>5); \ - a -= b; a -= c; a ^= (c>>3); \ - b -= c; b -= a; b ^= (a<<10); \ - c -= a; c -= b; c ^= (b>>15); \ -} - -/* The golden ration: an arbitrary value */ -#define JHASH_GOLDEN_RATIO 0x9e3779b9 - -/* The most generic version, hashes an arbitrary sequence - * of bytes. No alignment or length assumptions are made about - * the input key. - */ -static inline u32 jhash(const void *key, u32 length, u32 initval) -{ - u32 a, b, c, len; - const u8 *k = key; - - len = length; - a = b = JHASH_GOLDEN_RATIO; - c = initval; - - while (len >= 12) { - a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24)); - b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24)); - c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24)); - - __jhash_mix(a,b,c); - - k += 12; - len -= 12; - } - - c += length; - switch (len) { - case 11: c += ((u32)k[10]<<24); - case 10: c += ((u32)k[9]<<16); - case 9 : c += ((u32)k[8]<<8); - case 8 : b += ((u32)k[7]<<24); - case 7 : b += ((u32)k[6]<<16); - case 6 : b += ((u32)k[5]<<8); - case 5 : b += k[4]; - case 4 : a += ((u32)k[3]<<24); - case 3 : a += ((u32)k[2]<<16); - case 2 : a += ((u32)k[1]<<8); - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); - - return c; -} - -/* A special optimized version that handles 1 or more of u32s. - * The length parameter here is the number of u32s in the key. - */ -static inline u32 jhash2(const u32 *k, u32 length, u32 initval) -{ - u32 a, b, c, len; - - a = b = JHASH_GOLDEN_RATIO; - c = initval; - len = length; - - while (len >= 3) { - a += k[0]; - b += k[1]; - c += k[2]; - __jhash_mix(a, b, c); - k += 3; len -= 3; - } - - c += length * 4; - - switch (len) { - case 2 : b += k[1]; - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); - - return c; -} - - -/* A special ultra-optimized versions that knows they are hashing exactly - * 3, 2 or 1 word(s). - * - * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally - * done at the end is not done here. - */ -static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) -{ - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; - c += initval; - - __jhash_mix(a, b, c); - - return c; -} +/* Best hash sizes are of power of two */ +#define jhash_size(n) ((u32)1<<(n)) +/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */ +#define jhash_mask(n) (jhash_size(n)-1) + +extern u32 jhash(const void *key, u32 length, u32 initval); +extern u32 jhash2(const u32 *k, u32 length, u32 initval); +extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval); static inline u32 jhash_2words(u32 a, u32 b, u32 initval) { diff --git a/lib/Makefile b/lib/Makefile index e6a3763..a1a4932 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -10,7 +10,7 @@ endif lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o dump_stack.o \ idr.o int_sqrt.o extable.o prio_tree.o \ - sha1.o irq_regs.o reciprocal_div.o argv_split.o \ + jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \ proportions.o prio_heap.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o flex_array.o diff --git a/lib/jhash.c b/lib/jhash.c new file mode 100644 index 0000000..593cc77 --- /dev/null +++ b/lib/jhash.c @@ -0,0 +1,153 @@ +/* jhash.c: Jenkins hash support. + * + * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net) + * + * http://burtleburtle.net/bob/hash/ + * + * These are the credits from Bob's sources: + * + * lookup3.c, by Bob Jenkins, May 2006, Public Domain. + * + * These are functions for producing 32-bit hashes for hash table lookup. + * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final() + * are externally useful functions. Routines to test the hash are included + * if SELF_TEST is defined. You can use this free for any purpose. It's in + * the public domain. It has no warranty. + * + * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu) + * + * I've modified Bob's hash to be useful in the Linux kernel, and + * any bugs present are my fault. + * Jozsef + */ +#include <linux/bitops.h> +#include <linux/module.h> +#include <linux/jhash.h> + +/* __jhash_mix -- mix 3 32-bit values reversibly. */ +#define __jhash_mix(a, b, c) \ +{ \ + a -= c; a ^= rol32(c, 4); c += b; \ + b -= a; b ^= rol32(a, 6); a += c; \ + c -= b; c ^= rol32(b, 8); b += a; \ + a -= c; a ^= rol32(c, 16); c += b; \ + b -= a; b ^= rol32(a, 19); a += c; \ + c -= b; c ^= rol32(b, 4); b += a; \ +} + +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ +#define __jhash_final(a, b, c) \ +{ \ + c ^= b; c -= rol32(b, 14); \ + a ^= c; a -= rol32(c, 11); \ + b ^= a; b -= rol32(a, 25); \ + c ^= b; c -= rol32(b, 16); \ + a ^= c; a -= rol32(c, 4); \ + b ^= a; b -= rol32(a, 14); \ + c ^= b; c -= rol32(b, 24); \ +} + +/* An arbitrary initial parameter */ +#define JHASH_INITVAL 0xdeadbeef + +/* jhash - hash an arbitrary key + * @k: sequence of bytes as key + * @length: the length of the key + * @initval: the previous hash, or an arbitray value + * + * The generic version, hashes an arbitrary sequence of bytes. + * No alignment or length assumptions are made about the input key. + * + * Returns the hash value of the key. The result depends on endianness. + */ +u32 jhash(const void *key, u32 length, u32 initval) +{ + u32 a, b, c; + const u8 *k = key; + + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + length + initval; + + /* All but the last block: affect some 32 bits of (a,b,c) */ + while (length > 12) { + a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24); + b += k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24); + c += k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24); + __jhash_mix(a, b, c); + length -= 12; + k += 12; + } + /* Last block: affect all 32 bits of (c) */ + /* All the case statements fall through */ + switch (length) { + case 12: c += (u32)k[11]<<24; + case 11: c += (u32)k[10]<<16; + case 10: c += (u32)k[9]<<8; + case 9: c += k[8]; + case 8: b += (u32)k[7]<<24; + case 7: b += (u32)k[6]<<16; + case 6: b += (u32)k[5]<<8; + case 5: b += k[4]; + case 4: a += (u32)k[3]<<24; + case 3: a += (u32)k[2]<<16; + case 2: a += (u32)k[1]<<8; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } + + return c; +} +EXPORT_SYMBOL(jhash); + +/* jhash2 - hash an array of u32's + * @k: the key which must be an array of u32's + * @length: the number of u32's in the key + * @initval: the previous hash, or an arbitray value + * + * Returns the hash value of the key. + */ +u32 jhash2(const u32 *k, u32 length, u32 initval) +{ + u32 a, b, c; + + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + (length<<2) + initval; + + /* Handle most of the key */ + while (length > 3) { + a += k[0]; + b += k[1]; + c += k[2]; + __jhash_mix(a, b, c); + length -= 3; + k += 3; + } + + /* Handle the last 3 u32's: all the case statements fall through */ + switch (length) { + case 3: c += k[2]; + case 2: b += k[1]; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } + + return c; +} +EXPORT_SYMBOL(jhash2); + +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */ +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) +{ + a += JHASH_INITVAL; + b += JHASH_INITVAL; + c += initval; + + __jhash_mix(a, b, c); + + return c; +} +EXPORT_SYMBOL(jhash_3words); -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] The new jhash implementation 2010-11-25 13:15 ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik @ 2010-11-25 13:49 ` Eric Dumazet 2010-11-25 13:55 ` Changli Gao 2010-11-25 14:41 ` Jozsef Kadlecsik 2010-11-27 3:31 ` Rusty Russell 1 sibling, 2 replies; 18+ messages in thread From: Eric Dumazet @ 2010-11-25 13:49 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell Le jeudi 25 novembre 2010 à 14:15 +0100, Jozsef Kadlecsik a écrit : > The current jhash.h implements the lookup2() hash function by Bob Jenkins. > However, lookup2() is outdated as Bob wrote a new hash function called > lookup3(). The new hash function > > - mixes better than lookup2(): it passes the check that every input bit > changes every output bit 50% of the time, while lookup2() failed it. > - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster > than lookup2() depending on the key length. > > The patch replaces the lookup2() implementation of the 'jhash*' > functions with that of lookup3(). > > You can read a longer comparison of the two and other hash functions at > http://burtleburtle.net/bob/hash/doobs.html. > > Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> > --- > include/linux/jhash.h | 136 +++----------------------------------------- > lib/Makefile | 2 +- > lib/jhash.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 162 insertions(+), 129 deletions(-) > create mode 100644 lib/jhash.c > ... I agree jhash() should be not be inlined. I am not sure for other variants. > +u32 jhash(const void *key, u32 length, u32 initval) > +{ > + u32 a, b, c; > + const u8 *k = key; > + > + /* Set up the internal state */ > + a = b = c = JHASH_INITVAL + length + initval; > + > + /* All but the last block: affect some 32 bits of (a,b,c) */ > + while (length > 12) { > + a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24); disassembly code on x86_32 for the previous line : 26: 66 90 xchg %ax,%ax 28: 0f b6 72 01 movzbl 0x1(%edx),%esi 2c: 0f b6 4a 02 movzbl 0x2(%edx),%ecx 30: c1 e6 08 shl $0x8,%esi 33: c1 e1 10 shl $0x10,%ecx 36: 8d 0c 0e lea (%esi,%ecx,1),%ecx 39: 0f b6 32 movzbl (%edx),%esi 3c: 8d 34 31 lea (%ecx,%esi,1),%esi 3f: 0f b6 4a 03 movzbl 0x3(%edx),%ecx 43: c1 e1 18 shl $0x18,%ecx 46: 8d 0c 0e lea (%esi,%ecx,1),%ecx or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) : 1b: 0f b6 7b 01 movzbl 0x1(%ebx),%edi 1f: c1 e7 08 shl $0x8,%edi 22: 89 7d f0 mov %edi,-0x10(%ebp) 25: 0f b6 7b 02 movzbl 0x2(%ebx),%edi 29: c1 e7 10 shl $0x10,%edi 2c: 03 7d f0 add -0x10(%ebp),%edi 2f: 89 7d f0 mov %edi,-0x10(%ebp) 32: 0f b6 3b movzbl (%ebx),%edi 35: 03 7d f0 add -0x10(%ebp),%edi 38: 89 7d f0 mov %edi,-0x10(%ebp) 3b: 0f b6 7b 03 movzbl 0x3(%ebx),%edi 3f: c1 e7 18 shl $0x18,%edi 42: 03 7d f0 add -0x10(%ebp),%edi I suggest : #include <linux/unaligned/packed_struct.h> ... a += __get_unaligned_cpu32(k); b += __get_unaligned_cpu32(k+4); c += __get_unaligned_cpu32(k+8); Fits nicely in registers. > + b += k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24); > + c += k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24); > + __jhash_mix(a, b, c); > + length -= 12; > + k += 12; > + } > + /* Last block: affect all 32 bits of (c) */ > + /* All the case statements fall through */ > + switch (length) { > + case 12: c += (u32)k[11]<<24; > + case 11: c += (u32)k[10]<<16; > + case 10: c += (u32)k[9]<<8; > + case 9: c += k[8]; > + case 8: b += (u32)k[7]<<24; > + case 7: b += (u32)k[6]<<16; > + case 6: b += (u32)k[5]<<8; > + case 5: b += k[4]; > + case 4: a += (u32)k[3]<<24; > + case 3: a += (u32)k[2]<<16; > + case 2: a += (u32)k[1]<<8; > + case 1: a += k[0]; > + __jhash_final(a, b, c); > + case 0: /* Nothing left to add */ > + break; > + } > + > + return c; > +} > +EXPORT_SYMBOL(jhash); ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] The new jhash implementation 2010-11-25 13:49 ` Eric Dumazet @ 2010-11-25 13:55 ` Changli Gao 2010-11-25 14:05 ` Eric Dumazet 2010-11-25 14:41 ` Jozsef Kadlecsik 1 sibling, 1 reply; 18+ messages in thread From: Changli Gao @ 2010-11-25 13:55 UTC (permalink / raw) To: Eric Dumazet Cc: Jozsef Kadlecsik, linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell On Thu, Nov 25, 2010 at 9:49 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Le jeudi 25 novembre 2010 à 14:15 +0100, Jozsef Kadlecsik a écrit : >> The current jhash.h implements the lookup2() hash function by Bob Jenkins. >> However, lookup2() is outdated as Bob wrote a new hash function called >> lookup3(). The new hash function >> >> - mixes better than lookup2(): it passes the check that every input bit >> changes every output bit 50% of the time, while lookup2() failed it. >> - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster >> than lookup2() depending on the key length. >> >> The patch replaces the lookup2() implementation of the 'jhash*' >> functions with that of lookup3(). >> >> You can read a longer comparison of the two and other hash functions at >> http://burtleburtle.net/bob/hash/doobs.html. >> >> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> >> --- >> include/linux/jhash.h | 136 +++----------------------------------------- >> lib/Makefile | 2 +- >> lib/jhash.c | 153 +++++++++++++++++++++++++++++++++++++++++++++++++ >> 3 files changed, 162 insertions(+), 129 deletions(-) >> create mode 100644 lib/jhash.c >> > ... > > I agree jhash() should be not be inlined. > > I am not sure for other variants. > >> +u32 jhash(const void *key, u32 length, u32 initval) >> +{ >> + u32 a, b, c; >> + const u8 *k = key; >> + >> + /* Set up the internal state */ >> + a = b = c = JHASH_INITVAL + length + initval; >> + >> + /* All but the last block: affect some 32 bits of (a,b,c) */ >> + while (length > 12) { >> + a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24); > > disassembly code on x86_32 for the previous line : > > 26: 66 90 xchg %ax,%ax > 28: 0f b6 72 01 movzbl 0x1(%edx),%esi > 2c: 0f b6 4a 02 movzbl 0x2(%edx),%ecx > 30: c1 e6 08 shl $0x8,%esi > 33: c1 e1 10 shl $0x10,%ecx > 36: 8d 0c 0e lea (%esi,%ecx,1),%ecx > 39: 0f b6 32 movzbl (%edx),%esi > 3c: 8d 34 31 lea (%ecx,%esi,1),%esi > 3f: 0f b6 4a 03 movzbl 0x3(%edx),%ecx > 43: c1 e1 18 shl $0x18,%ecx > 46: 8d 0c 0e lea (%esi,%ecx,1),%ecx > > or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) : > > 1b: 0f b6 7b 01 movzbl 0x1(%ebx),%edi > 1f: c1 e7 08 shl $0x8,%edi > 22: 89 7d f0 mov %edi,-0x10(%ebp) > 25: 0f b6 7b 02 movzbl 0x2(%ebx),%edi > 29: c1 e7 10 shl $0x10,%edi > 2c: 03 7d f0 add -0x10(%ebp),%edi > 2f: 89 7d f0 mov %edi,-0x10(%ebp) > 32: 0f b6 3b movzbl (%ebx),%edi > 35: 03 7d f0 add -0x10(%ebp),%edi > 38: 89 7d f0 mov %edi,-0x10(%ebp) > 3b: 0f b6 7b 03 movzbl 0x3(%ebx),%edi > 3f: c1 e7 18 shl $0x18,%edi > 42: 03 7d f0 add -0x10(%ebp),%edi > > > I suggest : > > #include <linux/unaligned/packed_struct.h> > ... > a += __get_unaligned_cpu32(k); > b += __get_unaligned_cpu32(k+4); > c += __get_unaligned_cpu32(k+8); > > Fits nicely in registers. > I think you mean get_unaligned_le32(). -- Regards, Changli Gao(xiaosuo@gmail.com) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] The new jhash implementation 2010-11-25 13:55 ` Changli Gao @ 2010-11-25 14:05 ` Eric Dumazet 0 siblings, 0 replies; 18+ messages in thread From: Eric Dumazet @ 2010-11-25 14:05 UTC (permalink / raw) To: Changli Gao Cc: Jozsef Kadlecsik, linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell Le jeudi 25 novembre 2010 à 21:55 +0800, Changli Gao a écrit : > > I suggest : > > > > #include <linux/unaligned/packed_struct.h> > > ... > > a += __get_unaligned_cpu32(k); > > b += __get_unaligned_cpu32(k+4); > > c += __get_unaligned_cpu32(k+8); > > > > Fits nicely in registers. > > > > I think you mean get_unaligned_le32(). > No, I meant __get_unaligned_cpu32() We do same thing in jhash2() : a += k[0]; b += k[1]; c += k[2]; We dont care of bit order of the 32bit quantity we are adding to a,b or c , as long its consistent for the current machine ;) get_unaligned_le32() would be slow on big endian arches. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] The new jhash implementation 2010-11-25 13:49 ` Eric Dumazet 2010-11-25 13:55 ` Changli Gao @ 2010-11-25 14:41 ` Jozsef Kadlecsik 2010-11-25 17:21 ` Eric Dumazet 1 sibling, 1 reply; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-11-25 14:41 UTC (permalink / raw) To: Eric Dumazet Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell On Thu, 25 Nov 2010, Eric Dumazet wrote: > I agree jhash() should be not be inlined. > > I am not sure for other variants. Yeah, I have got the same feelings. I decided to un-inline all of them because that way the internal macros are not exposed at all. > > +u32 jhash(const void *key, u32 length, u32 initval) > > +{ > > + u32 a, b, c; > > + const u8 *k = key; > > + > > + /* Set up the internal state */ > > + a = b = c = JHASH_INITVAL + length + initval; > > + > > + /* All but the last block: affect some 32 bits of (a,b,c) */ > > + while (length > 12) { > > + a += k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24); > > disassembly code on x86_32 for the previous line : > > 26: 66 90 xchg %ax,%ax > 28: 0f b6 72 01 movzbl 0x1(%edx),%esi > 2c: 0f b6 4a 02 movzbl 0x2(%edx),%ecx > 30: c1 e6 08 shl $0x8,%esi > 33: c1 e1 10 shl $0x10,%ecx > 36: 8d 0c 0e lea (%esi,%ecx,1),%ecx > 39: 0f b6 32 movzbl (%edx),%esi > 3c: 8d 34 31 lea (%ecx,%esi,1),%esi > 3f: 0f b6 4a 03 movzbl 0x3(%edx),%ecx > 43: c1 e1 18 shl $0x18,%ecx > 46: 8d 0c 0e lea (%esi,%ecx,1),%ecx > > or (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) : > > 1b: 0f b6 7b 01 movzbl 0x1(%ebx),%edi > 1f: c1 e7 08 shl $0x8,%edi > 22: 89 7d f0 mov %edi,-0x10(%ebp) > 25: 0f b6 7b 02 movzbl 0x2(%ebx),%edi > 29: c1 e7 10 shl $0x10,%edi > 2c: 03 7d f0 add -0x10(%ebp),%edi > 2f: 89 7d f0 mov %edi,-0x10(%ebp) > 32: 0f b6 3b movzbl (%ebx),%edi > 35: 03 7d f0 add -0x10(%ebp),%edi > 38: 89 7d f0 mov %edi,-0x10(%ebp) > 3b: 0f b6 7b 03 movzbl 0x3(%ebx),%edi > 3f: c1 e7 18 shl $0x18,%edi > 42: 03 7d f0 add -0x10(%ebp),%edi > > > I suggest : > > #include <linux/unaligned/packed_struct.h> > ... > a += __get_unaligned_cpu32(k); > b += __get_unaligned_cpu32(k+4); > c += __get_unaligned_cpu32(k+8); > > Fits nicely in registers. Good idea, thanks! Here follows the updated second patch: The current jhash.h implements the lookup2() hash function by Bob Jenkins. However, lookup2() is outdated as Bob wrote a new hash function called lookup3(). The new hash function - mixes better than lookup2(): it passes the check that every input bit changes every output bit 50% of the time, while lookup2() failed it. - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster than lookup2() depending on the key length. The patch replaces the lookup2() implementation of the 'jhash*' functions with that of lookup3(). You can read a longer comparison of the two and other hash functions at http://burtleburtle.net/bob/hash/doobs.html. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> --- include/linux/jhash.h | 136 +++---------------------------------------- lib/Makefile | 2 +- lib/jhash.c | 154 +++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 163 insertions(+), 129 deletions(-) create mode 100644 lib/jhash.c diff --git a/include/linux/jhash.h b/include/linux/jhash.h index ced1159..ca69ac3 100644 --- a/include/linux/jhash.h +++ b/include/linux/jhash.h @@ -1,134 +1,14 @@ #ifndef _LINUX_JHASH_H #define _LINUX_JHASH_H -/* jhash.h: Jenkins hash support. - * - * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net) - * - * http://burtleburtle.net/bob/hash/ - * - * These are the credits from Bob's sources: - * - * lookup2.c, by Bob Jenkins, December 1996, Public Domain. - * hash(), hash2(), hash3, and mix() are externally useful functions. - * Routines to test the hash are included if SELF_TEST is defined. - * You can use this free for any purpose. It has no warranty. - * - * Copyright (C) 2003 David S. Miller (davem@redhat.com) - * - * I've modified Bob's hash to be useful in the Linux kernel, and - * any bugs present are surely my fault. -DaveM - */ - -/* NOTE: Arguments are modified. */ -#define __jhash_mix(a, b, c) \ -{ \ - a -= b; a -= c; a ^= (c>>13); \ - b -= c; b -= a; b ^= (a<<8); \ - c -= a; c -= b; c ^= (b>>13); \ - a -= b; a -= c; a ^= (c>>12); \ - b -= c; b -= a; b ^= (a<<16); \ - c -= a; c -= b; c ^= (b>>5); \ - a -= b; a -= c; a ^= (c>>3); \ - b -= c; b -= a; b ^= (a<<10); \ - c -= a; c -= b; c ^= (b>>15); \ -} - -/* The golden ration: an arbitrary value */ -#define JHASH_GOLDEN_RATIO 0x9e3779b9 - -/* The most generic version, hashes an arbitrary sequence - * of bytes. No alignment or length assumptions are made about - * the input key. - */ -static inline u32 jhash(const void *key, u32 length, u32 initval) -{ - u32 a, b, c, len; - const u8 *k = key; - - len = length; - a = b = JHASH_GOLDEN_RATIO; - c = initval; - - while (len >= 12) { - a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24)); - b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24)); - c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24)); - - __jhash_mix(a,b,c); - - k += 12; - len -= 12; - } - - c += length; - switch (len) { - case 11: c += ((u32)k[10]<<24); - case 10: c += ((u32)k[9]<<16); - case 9 : c += ((u32)k[8]<<8); - case 8 : b += ((u32)k[7]<<24); - case 7 : b += ((u32)k[6]<<16); - case 6 : b += ((u32)k[5]<<8); - case 5 : b += k[4]; - case 4 : a += ((u32)k[3]<<24); - case 3 : a += ((u32)k[2]<<16); - case 2 : a += ((u32)k[1]<<8); - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); - - return c; -} - -/* A special optimized version that handles 1 or more of u32s. - * The length parameter here is the number of u32s in the key. - */ -static inline u32 jhash2(const u32 *k, u32 length, u32 initval) -{ - u32 a, b, c, len; - - a = b = JHASH_GOLDEN_RATIO; - c = initval; - len = length; - - while (len >= 3) { - a += k[0]; - b += k[1]; - c += k[2]; - __jhash_mix(a, b, c); - k += 3; len -= 3; - } - - c += length * 4; - - switch (len) { - case 2 : b += k[1]; - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); - - return c; -} - - -/* A special ultra-optimized versions that knows they are hashing exactly - * 3, 2 or 1 word(s). - * - * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally - * done at the end is not done here. - */ -static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) -{ - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; - c += initval; - - __jhash_mix(a, b, c); - - return c; -} +/* Best hash sizes are of power of two */ +#define jhash_size(n) ((u32)1<<(n)) +/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */ +#define jhash_mask(n) (jhash_size(n)-1) + +extern u32 jhash(const void *key, u32 length, u32 initval); +extern u32 jhash2(const u32 *k, u32 length, u32 initval); +extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval); static inline u32 jhash_2words(u32 a, u32 b, u32 initval) { diff --git a/lib/Makefile b/lib/Makefile index e6a3763..a1a4932 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -10,7 +10,7 @@ endif lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o dump_stack.o \ idr.o int_sqrt.o extable.o prio_tree.o \ - sha1.o irq_regs.o reciprocal_div.o argv_split.o \ + jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \ proportions.o prio_heap.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o flex_array.o diff --git a/lib/jhash.c b/lib/jhash.c new file mode 100644 index 0000000..538277b --- /dev/null +++ b/lib/jhash.c @@ -0,0 +1,154 @@ +/* jhash.c: Jenkins hash support. + * + * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net) + * + * http://burtleburtle.net/bob/hash/ + * + * These are the credits from Bob's sources: + * + * lookup3.c, by Bob Jenkins, May 2006, Public Domain. + * + * These are functions for producing 32-bit hashes for hash table lookup. + * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final() + * are externally useful functions. Routines to test the hash are included + * if SELF_TEST is defined. You can use this free for any purpose. It's in + * the public domain. It has no warranty. + * + * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu) + * + * I've modified Bob's hash to be useful in the Linux kernel, and + * any bugs present are my fault. + * Jozsef + */ +#include <linux/bitops.h> +#include <linux/module.h> +#include <linux/unaligned/packed_struct.h> +#include <linux/jhash.h> + +/* __jhash_mix -- mix 3 32-bit values reversibly. */ +#define __jhash_mix(a, b, c) \ +{ \ + a -= c; a ^= rol32(c, 4); c += b; \ + b -= a; b ^= rol32(a, 6); a += c; \ + c -= b; c ^= rol32(b, 8); b += a; \ + a -= c; a ^= rol32(c, 16); c += b; \ + b -= a; b ^= rol32(a, 19); a += c; \ + c -= b; c ^= rol32(b, 4); b += a; \ +} + +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ +#define __jhash_final(a, b, c) \ +{ \ + c ^= b; c -= rol32(b, 14); \ + a ^= c; a -= rol32(c, 11); \ + b ^= a; b -= rol32(a, 25); \ + c ^= b; c -= rol32(b, 16); \ + a ^= c; a -= rol32(c, 4); \ + b ^= a; b -= rol32(a, 14); \ + c ^= b; c -= rol32(b, 24); \ +} + +/* An arbitrary initial parameter */ +#define JHASH_INITVAL 0xdeadbeef + +/* jhash - hash an arbitrary key + * @k: sequence of bytes as key + * @length: the length of the key + * @initval: the previous hash, or an arbitray value + * + * The generic version, hashes an arbitrary sequence of bytes. + * No alignment or length assumptions are made about the input key. + * + * Returns the hash value of the key. The result depends on endianness. + */ +u32 jhash(const void *key, u32 length, u32 initval) +{ + u32 a, b, c; + const u8 *k = key; + + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + length + initval; + + /* All but the last block: affect some 32 bits of (a,b,c) */ + while (length > 12) { + a += __get_unaligned_cpu32(k); + b += __get_unaligned_cpu32(k + 4); + c += __get_unaligned_cpu32(k + 8); + __jhash_mix(a, b, c); + length -= 12; + k += 12; + } + /* Last block: affect all 32 bits of (c) */ + /* All the case statements fall through */ + switch (length) { + case 12: c += (u32)k[11]<<24; + case 11: c += (u32)k[10]<<16; + case 10: c += (u32)k[9]<<8; + case 9: c += k[8]; + case 8: b += (u32)k[7]<<24; + case 7: b += (u32)k[6]<<16; + case 6: b += (u32)k[5]<<8; + case 5: b += k[4]; + case 4: a += (u32)k[3]<<24; + case 3: a += (u32)k[2]<<16; + case 2: a += (u32)k[1]<<8; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } + + return c; +} +EXPORT_SYMBOL(jhash); + +/* jhash2 - hash an array of u32's + * @k: the key which must be an array of u32's + * @length: the number of u32's in the key + * @initval: the previous hash, or an arbitray value + * + * Returns the hash value of the key. + */ +u32 jhash2(const u32 *k, u32 length, u32 initval) +{ + u32 a, b, c; + + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + (length<<2) + initval; + + /* Handle most of the key */ + while (length > 3) { + a += k[0]; + b += k[1]; + c += k[2]; + __jhash_mix(a, b, c); + length -= 3; + k += 3; + } + + /* Handle the last 3 u32's: all the case statements fall through */ + switch (length) { + case 3: c += k[2]; + case 2: b += k[1]; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } + + return c; +} +EXPORT_SYMBOL(jhash2); + +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */ +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) +{ + a += JHASH_INITVAL; + b += JHASH_INITVAL; + c += initval; + + __jhash_mix(a, b, c); + + return c; +} +EXPORT_SYMBOL(jhash_3words); -- 1.7.0.4 Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] The new jhash implementation 2010-11-25 14:41 ` Jozsef Kadlecsik @ 2010-11-25 17:21 ` Eric Dumazet 2010-11-25 21:14 ` Jozsef Kadlecsik 0 siblings, 1 reply; 18+ messages in thread From: Eric Dumazet @ 2010-11-25 17:21 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell Le jeudi 25 novembre 2010 à 15:41 +0100, Jozsef Kadlecsik a écrit : ... > +/* __jhash_mix -- mix 3 32-bit values reversibly. */ > +#define __jhash_mix(a, b, c) \ > +{ \ > + a -= c; a ^= rol32(c, 4); c += b; \ > + b -= a; b ^= rol32(a, 6); a += c; \ > + c -= b; c ^= rol32(b, 8); b += a; \ > + a -= c; a ^= rol32(c, 16); c += b; \ > + b -= a; b ^= rol32(a, 19); a += c; \ > + c -= b; c ^= rol32(b, 4); b += a; \ > +} > + > +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ > +#define __jhash_final(a, b, c) \ > +{ \ > + c ^= b; c -= rol32(b, 14); \ > + a ^= c; a -= rol32(c, 11); \ > + b ^= a; b -= rol32(a, 25); \ > + c ^= b; c -= rol32(b, 16); \ > + a ^= c; a -= rol32(c, 4); \ > + b ^= a; b -= rol32(a, 14); \ > + c ^= b; c -= rol32(b, 24); \ > +} > + So we now have a special __jhash_final(a, b, c) thing for the last values. > +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */ > +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) > +{ > + a += JHASH_INITVAL; > + b += JHASH_INITVAL; > + c += initval; > + > + __jhash_mix(a, b, c); > + > + return c; > +} > +EXPORT_SYMBOL(jhash_3words); But you dont use it in jhash_3words(). I do think jhash_3words() should stay inlined, unless maybe CONFIG_CC_OPTIMIZE_FOR_SIZE=y We hit it several time per packet in network stack in RX path. Once in skb_get_rxhash() (unless device fills skb->rxhash) Once at least in conntrack (if used). Once in UDP or TCP stack -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] The new jhash implementation 2010-11-25 17:21 ` Eric Dumazet @ 2010-11-25 21:14 ` Jozsef Kadlecsik 0 siblings, 0 replies; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-11-25 21:14 UTC (permalink / raw) To: Eric Dumazet Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell On Thu, 25 Nov 2010, Eric Dumazet wrote: > Le jeudi 25 novembre 2010 ? 15:41 +0100, Jozsef Kadlecsik a ?crit : > > > +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ > > +#define __jhash_final(a, b, c) \ > > +{ \ > > + c ^= b; c -= rol32(b, 14); \ > > + a ^= c; a -= rol32(c, 11); \ > > + b ^= a; b -= rol32(a, 25); \ > > + c ^= b; c -= rol32(b, 16); \ > > + a ^= c; a -= rol32(c, 4); \ > > + b ^= a; b -= rol32(a, 14); \ > > + c ^= b; c -= rol32(b, 24); \ > > +} > > + > > So we now have a special __jhash_final(a, b, c) thing for the last > values. Yes, and... > > +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */ > > +u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) > > +{ > > + a += JHASH_INITVAL; > > + b += JHASH_INITVAL; > > + c += initval; > > + > > + __jhash_mix(a, b, c); > > + > > + return c; > > +} > > +EXPORT_SYMBOL(jhash_3words); > > But you dont use it in jhash_3words(). ... excellent spotting! That should be __jhash_final in jhash_3words. > I do think jhash_3words() should stay inlined, unless maybe > CONFIG_CC_OPTIMIZE_FOR_SIZE=y > > We hit it several time per packet in network stack in RX path. > > Once in skb_get_rxhash() (unless device fills skb->rxhash) > Once at least in conntrack (if used). > Once in UDP or TCP stack There follows the new version of the second patch: jhash_3words is moved back to inlined form and the bug in it fixed. Thanks for your thorough reviewing indeed! The current jhash.h implements the lookup2() hash function by Bob Jenkins. However, lookup2() is outdated as Bob wrote a new hash function called lookup3(). The new hash function - mixes better than lookup2(): it passes the check that every input bit changes every output bit 50% of the time, while lookup2() failed it. - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster than lookup2() depending on the key length. The patch replaces the lookup2() implementation of the 'jhash*' functions with that of lookup3(). You can read a longer comparison of the two and other hash functions at http://burtleburtle.net/bob/hash/doobs.html. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> --- include/linux/jhash.h | 140 ++++++++----------------------------------------- lib/Makefile | 2 +- lib/jhash.c | 127 ++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 151 insertions(+), 118 deletions(-) create mode 100644 lib/jhash.c diff --git a/include/linux/jhash.h b/include/linux/jhash.h index ced1159..963abad 100644 --- a/include/linux/jhash.h +++ b/include/linux/jhash.h @@ -1,131 +1,37 @@ #ifndef _LINUX_JHASH_H #define _LINUX_JHASH_H -/* jhash.h: Jenkins hash support. - * - * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net) - * - * http://burtleburtle.net/bob/hash/ - * - * These are the credits from Bob's sources: - * - * lookup2.c, by Bob Jenkins, December 1996, Public Domain. - * hash(), hash2(), hash3, and mix() are externally useful functions. - * Routines to test the hash are included if SELF_TEST is defined. - * You can use this free for any purpose. It has no warranty. - * - * Copyright (C) 2003 David S. Miller (davem@redhat.com) - * - * I've modified Bob's hash to be useful in the Linux kernel, and - * any bugs present are surely my fault. -DaveM - */ - -/* NOTE: Arguments are modified. */ -#define __jhash_mix(a, b, c) \ -{ \ - a -= b; a -= c; a ^= (c>>13); \ - b -= c; b -= a; b ^= (a<<8); \ - c -= a; c -= b; c ^= (b>>13); \ - a -= b; a -= c; a ^= (c>>12); \ - b -= c; b -= a; b ^= (a<<16); \ - c -= a; c -= b; c ^= (b>>5); \ - a -= b; a -= c; a ^= (c>>3); \ - b -= c; b -= a; b ^= (a<<10); \ - c -= a; c -= b; c ^= (b>>15); \ -} - -/* The golden ration: an arbitrary value */ -#define JHASH_GOLDEN_RATIO 0x9e3779b9 - -/* The most generic version, hashes an arbitrary sequence - * of bytes. No alignment or length assumptions are made about - * the input key. - */ -static inline u32 jhash(const void *key, u32 length, u32 initval) -{ - u32 a, b, c, len; - const u8 *k = key; - - len = length; - a = b = JHASH_GOLDEN_RATIO; - c = initval; - - while (len >= 12) { - a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24)); - b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24)); - c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24)); - - __jhash_mix(a,b,c); - - k += 12; - len -= 12; - } - - c += length; - switch (len) { - case 11: c += ((u32)k[10]<<24); - case 10: c += ((u32)k[9]<<16); - case 9 : c += ((u32)k[8]<<8); - case 8 : b += ((u32)k[7]<<24); - case 7 : b += ((u32)k[6]<<16); - case 6 : b += ((u32)k[5]<<8); - case 5 : b += k[4]; - case 4 : a += ((u32)k[3]<<24); - case 3 : a += ((u32)k[2]<<16); - case 2 : a += ((u32)k[1]<<8); - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); - - return c; +/* Best hash sizes are of power of two */ +#define jhash_size(n) ((u32)1<<(n)) +/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */ +#define jhash_mask(n) (jhash_size(n)-1) + +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ +#define __jhash_final(a, b, c) \ +{ \ + c ^= b; c -= rol32(b, 14); \ + a ^= c; a -= rol32(c, 11); \ + b ^= a; b -= rol32(a, 25); \ + c ^= b; c -= rol32(b, 16); \ + a ^= c; a -= rol32(c, 4); \ + b ^= a; b -= rol32(a, 14); \ + c ^= b; c -= rol32(b, 24); \ } -/* A special optimized version that handles 1 or more of u32s. - * The length parameter here is the number of u32s in the key. - */ -static inline u32 jhash2(const u32 *k, u32 length, u32 initval) -{ - u32 a, b, c, len; - - a = b = JHASH_GOLDEN_RATIO; - c = initval; - len = length; - - while (len >= 3) { - a += k[0]; - b += k[1]; - c += k[2]; - __jhash_mix(a, b, c); - k += 3; len -= 3; - } - - c += length * 4; - - switch (len) { - case 2 : b += k[1]; - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); - - return c; -} +/* An arbitrary initial parameter */ +#define JHASH_INITVAL 0xdeadbeef +extern u32 jhash(const void *key, u32 length, u32 initval); +extern u32 jhash2(const u32 *k, u32 length, u32 initval); -/* A special ultra-optimized versions that knows they are hashing exactly - * 3, 2 or 1 word(s). - * - * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally - * done at the end is not done here. - */ +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */ static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) { - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; + a += JHASH_INITVAL; + b += JHASH_INITVAL; c += initval; - __jhash_mix(a, b, c); + __jhash_final(a, b, c); return c; } diff --git a/lib/Makefile b/lib/Makefile index e6a3763..a1a4932 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -10,7 +10,7 @@ endif lib-y := ctype.o string.o vsprintf.o cmdline.o \ rbtree.o radix-tree.o dump_stack.o \ idr.o int_sqrt.o extable.o prio_tree.o \ - sha1.o irq_regs.o reciprocal_div.o argv_split.o \ + jhash.o sha1.o irq_regs.o reciprocal_div.o argv_split.o \ proportions.o prio_heap.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o flex_array.o diff --git a/lib/jhash.c b/lib/jhash.c new file mode 100644 index 0000000..e0c8d11 --- /dev/null +++ b/lib/jhash.c @@ -0,0 +1,127 @@ +/* jhash.c: Jenkins hash support. + * + * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net) + * + * http://burtleburtle.net/bob/hash/ + * + * These are the credits from Bob's sources: + * + * lookup3.c, by Bob Jenkins, May 2006, Public Domain. + * + * These are functions for producing 32-bit hashes for hash table lookup. + * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final() + * are externally useful functions. Routines to test the hash are included + * if SELF_TEST is defined. You can use this free for any purpose. It's in + * the public domain. It has no warranty. + * + * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu) + * + * I've modified Bob's hash to be useful in the Linux kernel, and + * any bugs present are my fault. + * Jozsef + */ +#include <linux/bitops.h> +#include <linux/module.h> +#include <linux/unaligned/packed_struct.h> +#include <linux/jhash.h> + +/* __jhash_mix -- mix 3 32-bit values reversibly. */ +#define __jhash_mix(a, b, c) \ +{ \ + a -= c; a ^= rol32(c, 4); c += b; \ + b -= a; b ^= rol32(a, 6); a += c; \ + c -= b; c ^= rol32(b, 8); b += a; \ + a -= c; a ^= rol32(c, 16); c += b; \ + b -= a; b ^= rol32(a, 19); a += c; \ + c -= b; c ^= rol32(b, 4); b += a; \ +} + +/* jhash - hash an arbitrary key + * @k: sequence of bytes as key + * @length: the length of the key + * @initval: the previous hash, or an arbitray value + * + * The generic version, hashes an arbitrary sequence of bytes. + * No alignment or length assumptions are made about the input key. + * + * Returns the hash value of the key. The result depends on endianness. + */ +u32 jhash(const void *key, u32 length, u32 initval) +{ + u32 a, b, c; + const u8 *k = key; + + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + length + initval; + + /* All but the last block: affect some 32 bits of (a,b,c) */ + while (length > 12) { + a += __get_unaligned_cpu32(k); + b += __get_unaligned_cpu32(k + 4); + c += __get_unaligned_cpu32(k + 8); + __jhash_mix(a, b, c); + length -= 12; + k += 12; + } + /* Last block: affect all 32 bits of (c) */ + /* All the case statements fall through */ + switch (length) { + case 12: c += (u32)k[11]<<24; + case 11: c += (u32)k[10]<<16; + case 10: c += (u32)k[9]<<8; + case 9: c += k[8]; + case 8: b += (u32)k[7]<<24; + case 7: b += (u32)k[6]<<16; + case 6: b += (u32)k[5]<<8; + case 5: b += k[4]; + case 4: a += (u32)k[3]<<24; + case 3: a += (u32)k[2]<<16; + case 2: a += (u32)k[1]<<8; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } + + return c; +} +EXPORT_SYMBOL(jhash); + +/* jhash2 - hash an array of u32's + * @k: the key which must be an array of u32's + * @length: the number of u32's in the key + * @initval: the previous hash, or an arbitray value + * + * Returns the hash value of the key. + */ +u32 jhash2(const u32 *k, u32 length, u32 initval) +{ + u32 a, b, c; + + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + (length<<2) + initval; + + /* Handle most of the key */ + while (length > 3) { + a += k[0]; + b += k[1]; + c += k[2]; + __jhash_mix(a, b, c); + length -= 3; + k += 3; + } + + /* Handle the last 3 u32's: all the case statements fall through */ + switch (length) { + case 3: c += k[2]; + case 2: b += k[1]; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } + + return c; +} +EXPORT_SYMBOL(jhash2); + -- 1.7.0.4 Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] The new jhash implementation 2010-11-25 13:15 ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik 2010-11-25 13:49 ` Eric Dumazet @ 2010-11-27 3:31 ` Rusty Russell 2010-12-03 12:38 ` [PATCH 0/2] New jhash function Jozsef Kadlecsik 1 sibling, 1 reply; 18+ messages in thread From: Rusty Russell @ 2010-11-27 3:31 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds On Thu, 25 Nov 2010 11:45:08 pm Jozsef Kadlecsik wrote: > The current jhash.h implements the lookup2() hash function by Bob Jenkins. > However, lookup2() is outdated as Bob wrote a new hash function called > lookup3(). The new hash function Oops, we discussed this ages ago, looks like it fell through the cracks. Thanks Jozsef! > +extern u32 jhash(const void *key, u32 length, u32 initval); > +extern u32 jhash2(const u32 *k, u32 length, u32 initval); > +extern u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval); This last one can be marked __attribute__((const)) for a little extra compiler help, too. Cheers, Rusty. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/2] New jhash function 2010-11-27 3:31 ` Rusty Russell @ 2010-12-03 12:38 ` Jozsef Kadlecsik 2010-12-03 12:39 ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik 2010-12-08 17:09 ` [PATCH 0/2] New jhash function David Miller 0 siblings, 2 replies; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-12-03 12:38 UTC (permalink / raw) To: linux-kernel Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik Hi, The current jhash.h implements the lookup2() hash function by Bob Jenkins. However, lookup2() is outdated as Bob wrote a new hash function called lookup3(). There is a longer comparison of those two and other hash functions at http://burtleburtle.net/bob/hash/doobs.html. Please consider applying the following patches. Speed I wrote a small benchmark program to compare jhash2 and jhash3 (you can download it from http://www.kfki.hu/~kadlec/sw/netfilter/jhash23.tgz). On a machine with Core2 Duo and compiled with -O2, the ratio of the execution time for the byte variants of the hash functions were (in parens the different key sizes): jhash2/jhash3 (4 bytes): 1.587518 jhash2/jhash3 (8 bytes): 1.282824 jhash2/jhash3 (12 bytes): 2.349628 jhash2/jhash3 (16 bytes): 1.466988 jhash2/jhash3 (32 bytes): 1.501063 jhash2/jhash3 (64 bytes): 1.587527 jhash2/jhash3 (128 bytes): 1.628037 jhash2/jhash3 (256 bytes): 1.648318 Similarly, for the word variants jhash2/jhash3 (1 words): 1.300904 jhash2/jhash3 (2 words): 1.316108 jhash2/jhash3 (3 words): 2.249708 jhash2/jhash3 (4 words): 1.460192 jhash2/jhash3 (8 words): 1.501438 jhash2/jhash3 (16 words): 1.558039 jhash2/jhash3 (32 words): 1.520082 jhash2/jhash3 (64 words): 1.528347 Sizes When compiling just the byte variants the sizes are text data bss dec hex filename 661 0 0 661 295 jhashb2.o 441 0 0 441 1b9 jhashb3.o and for the word variants text data bss dec hex filename 354 0 0 354 162 jhashw2.o 248 0 0 248 f8 jhashw3.o I compiled the kernel with "allyesconfig", in three variants: jhash2, jhash3 and jhash3 un-inlined text data bss dec hex filename 69297477 11282083 35904032 116483592 6f16608 vmlinux.jhash2 69293829 11282083 35903728 116479640 6f15698 vmlinux.jhash3 69288578 11282083 35902336 116472997 6f13ca5 vmlinux.jhash3-uninlined With jhash3 we can gain 3.6k and un-inlining shrinks the code with an additional 5.2k. In the patch I left jhash(3) inlined. Uniformity According to Bob Jenkins, lookup3() mixes better than lookup2(): it passes the check that every input bit changes every output bit 50% of the time, while lookup2() failed it. In order to verify it I added jhash3 to the cttest program, which tests hash functions with (artifical, worst case) netfilter conntrack data and calculates the statistics (chi-square, probability of longest bucket, etc). You can find the program and the results here: http://www.kfki.hu/~kadlec/sw/netfilter/ct3/ - to sum up, lookup3() produces uniform key values and no weakness could be spotted. Many thanks to Eric Dumazet for his thorough reviewing. Dave applied the first patch to net-next-2.6. Jozsef Kadlecsik (2): Remove calls to jhash internals The new jhash implementation include/linux/jhash.h | 183 ++++++++++++++++++++++---------------- net/ipv6/inet6_connection_sock.c | 18 ++-- net/ipv6/reassembly.c | 36 ++++---- 3 files changed, 129 insertions(+), 108 deletions(-) ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/2] Remove calls to jhash internals 2010-12-03 12:38 ` [PATCH 0/2] New jhash function Jozsef Kadlecsik @ 2010-12-03 12:39 ` Jozsef Kadlecsik 2010-12-03 12:39 ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik 2010-12-08 17:09 ` [PATCH 0/2] New jhash function David Miller 1 sibling, 1 reply; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-12-03 12:39 UTC (permalink / raw) To: linux-kernel Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik Because the jhash implementation changes, replace the calls to the jhash internal macros with calls to the jhash functions. --- net/ipv6/inet6_connection_sock.c | 18 ++++++++---------- net/ipv6/reassembly.c | 36 ++++++++++++++++-------------------- 2 files changed, 24 insertions(+), 30 deletions(-) diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c index 8a16280..861d252 100644 --- a/net/ipv6/inet6_connection_sock.c +++ b/net/ipv6/inet6_connection_sock.c @@ -60,18 +60,16 @@ EXPORT_SYMBOL_GPL(inet6_csk_bind_conflict); static u32 inet6_synq_hash(const struct in6_addr *raddr, const __be16 rport, const u32 rnd, const u16 synq_hsize) { - u32 a = (__force u32)raddr->s6_addr32[0]; - u32 b = (__force u32)raddr->s6_addr32[1]; - u32 c = (__force u32)raddr->s6_addr32[2]; - - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; - c += rnd; - __jhash_mix(a, b, c); - - a += (__force u32)raddr->s6_addr32[3]; - b += (__force u32)rport; - __jhash_mix(a, b, c); + u32 c; + + c = jhash_3words((__force u32)raddr->s6_addr32[0], + (__force u32)raddr->s6_addr32[1], + (__force u32)raddr->s6_addr32[2], + rnd); + + c = jhash_2words((__force u32)raddr->s6_addr32[3], + (__force u32)rport, + c); return c & (synq_hsize - 1); } diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index c7ba314..5e57490 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -104,26 +104,22 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev, unsigned int inet6_hash_frag(__be32 id, const struct in6_addr *saddr, const struct in6_addr *daddr, u32 rnd) { - u32 a, b, c; - - a = (__force u32)saddr->s6_addr32[0]; - b = (__force u32)saddr->s6_addr32[1]; - c = (__force u32)saddr->s6_addr32[2]; - - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; - c += rnd; - __jhash_mix(a, b, c); - - a += (__force u32)saddr->s6_addr32[3]; - b += (__force u32)daddr->s6_addr32[0]; - c += (__force u32)daddr->s6_addr32[1]; - __jhash_mix(a, b, c); - - a += (__force u32)daddr->s6_addr32[2]; - b += (__force u32)daddr->s6_addr32[3]; - c += (__force u32)id; - __jhash_mix(a, b, c); + u32 c; + + c = jhash_3words((__force u32)saddr->s6_addr32[0], + (__force u32)saddr->s6_addr32[1], + (__force u32)saddr->s6_addr32[2], + rnd); + + c = jhash_3words((__force u32)saddr->s6_addr32[3], + (__force u32)daddr->s6_addr32[0], + (__force u32)daddr->s6_addr32[1], + c); + + c = jhash_3words((__force u32)daddr->s6_addr32[2], + (__force u32)daddr->s6_addr32[3], + (__force u32)id, + c); return c & (INETFRAGS_HASHSZ - 1); } -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/2] The new jhash implementation 2010-12-03 12:39 ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik @ 2010-12-03 12:39 ` Jozsef Kadlecsik 0 siblings, 0 replies; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-12-03 12:39 UTC (permalink / raw) To: linux-kernel Cc: netdev, netfilter-devel, Linus Torvalds, Rusty Russell, Jozsef Kadlecsik The current jhash.h implements the lookup2() hash function by Bob Jenkins. However, lookup2() is outdated as Bob wrote a new hash function called lookup3(). The patch replaces the lookup2() implementation of the 'jhash*' functions with that of lookup3(). You can read a longer comparison of the two and other hash functions at http://burtleburtle.net/bob/hash/doobs.html. --- include/linux/jhash.h | 183 ++++++++++++++++++++++++++++--------------------- 1 files changed, 105 insertions(+), 78 deletions(-) diff --git a/include/linux/jhash.h b/include/linux/jhash.h index ced1159..47cb09e 100644 --- a/include/linux/jhash.h +++ b/include/linux/jhash.h @@ -3,129 +3,156 @@ /* jhash.h: Jenkins hash support. * - * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net) + * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net) * * http://burtleburtle.net/bob/hash/ * * These are the credits from Bob's sources: * - * lookup2.c, by Bob Jenkins, December 1996, Public Domain. - * hash(), hash2(), hash3, and mix() are externally useful functions. - * Routines to test the hash are included if SELF_TEST is defined. - * You can use this free for any purpose. It has no warranty. + * lookup3.c, by Bob Jenkins, May 2006, Public Domain. * - * Copyright (C) 2003 David S. Miller (davem@redhat.com) + * These are functions for producing 32-bit hashes for hash table lookup. + * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final() + * are externally useful functions. Routines to test the hash are included + * if SELF_TEST is defined. You can use this free for any purpose. It's in + * the public domain. It has no warranty. + * + * Copyright (C) 2009-2010 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu) * * I've modified Bob's hash to be useful in the Linux kernel, and - * any bugs present are surely my fault. -DaveM + * any bugs present are my fault. + * Jozsef */ +#include <linux/bitops.h> +#include <linux/unaligned/packed_struct.h> + +/* Best hash sizes are of power of two */ +#define jhash_size(n) ((u32)1<<(n)) +/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */ +#define jhash_mask(n) (jhash_size(n)-1) + +/* __jhash_mix -- mix 3 32-bit values reversibly. */ +#define __jhash_mix(a, b, c) \ +{ \ + a -= c; a ^= rol32(c, 4); c += b; \ + b -= a; b ^= rol32(a, 6); a += c; \ + c -= b; c ^= rol32(b, 8); b += a; \ + a -= c; a ^= rol32(c, 16); c += b; \ + b -= a; b ^= rol32(a, 19); a += c; \ + c -= b; c ^= rol32(b, 4); b += a; \ +} -/* NOTE: Arguments are modified. */ -#define __jhash_mix(a, b, c) \ -{ \ - a -= b; a -= c; a ^= (c>>13); \ - b -= c; b -= a; b ^= (a<<8); \ - c -= a; c -= b; c ^= (b>>13); \ - a -= b; a -= c; a ^= (c>>12); \ - b -= c; b -= a; b ^= (a<<16); \ - c -= a; c -= b; c ^= (b>>5); \ - a -= b; a -= c; a ^= (c>>3); \ - b -= c; b -= a; b ^= (a<<10); \ - c -= a; c -= b; c ^= (b>>15); \ +/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */ +#define __jhash_final(a, b, c) \ +{ \ + c ^= b; c -= rol32(b, 14); \ + a ^= c; a -= rol32(c, 11); \ + b ^= a; b -= rol32(a, 25); \ + c ^= b; c -= rol32(b, 16); \ + a ^= c; a -= rol32(c, 4); \ + b ^= a; b -= rol32(a, 14); \ + c ^= b; c -= rol32(b, 24); \ } -/* The golden ration: an arbitrary value */ -#define JHASH_GOLDEN_RATIO 0x9e3779b9 +/* An arbitrary initial parameter */ +#define JHASH_INITVAL 0xdeadbeef -/* The most generic version, hashes an arbitrary sequence - * of bytes. No alignment or length assumptions are made about - * the input key. +/* jhash - hash an arbitrary key + * @k: sequence of bytes as key + * @length: the length of the key + * @initval: the previous hash, or an arbitray value + * + * The generic version, hashes an arbitrary sequence of bytes. + * No alignment or length assumptions are made about the input key. + * + * Returns the hash value of the key. The result depends on endianness. */ static inline u32 jhash(const void *key, u32 length, u32 initval) { - u32 a, b, c, len; + u32 a, b, c; const u8 *k = key; - len = length; - a = b = JHASH_GOLDEN_RATIO; - c = initval; - - while (len >= 12) { - a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24)); - b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24)); - c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24)); - - __jhash_mix(a,b,c); + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + length + initval; + /* All but the last block: affect some 32 bits of (a,b,c) */ + while (length > 12) { + a += __get_unaligned_cpu32(k); + b += __get_unaligned_cpu32(k + 4); + c += __get_unaligned_cpu32(k + 8); + __jhash_mix(a, b, c); + length -= 12; k += 12; - len -= 12; } - - c += length; - switch (len) { - case 11: c += ((u32)k[10]<<24); - case 10: c += ((u32)k[9]<<16); - case 9 : c += ((u32)k[8]<<8); - case 8 : b += ((u32)k[7]<<24); - case 7 : b += ((u32)k[6]<<16); - case 6 : b += ((u32)k[5]<<8); - case 5 : b += k[4]; - case 4 : a += ((u32)k[3]<<24); - case 3 : a += ((u32)k[2]<<16); - case 2 : a += ((u32)k[1]<<8); - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); + /* Last block: affect all 32 bits of (c) */ + /* All the case statements fall through */ + switch (length) { + case 12: c += (u32)k[11]<<24; + case 11: c += (u32)k[10]<<16; + case 10: c += (u32)k[9]<<8; + case 9: c += k[8]; + case 8: b += (u32)k[7]<<24; + case 7: b += (u32)k[6]<<16; + case 6: b += (u32)k[5]<<8; + case 5: b += k[4]; + case 4: a += (u32)k[3]<<24; + case 3: a += (u32)k[2]<<16; + case 2: a += (u32)k[1]<<8; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } return c; } -/* A special optimized version that handles 1 or more of u32s. - * The length parameter here is the number of u32s in the key. +/* jhash2 - hash an array of u32's + * @k: the key which must be an array of u32's + * @length: the number of u32's in the key + * @initval: the previous hash, or an arbitray value + * + * Returns the hash value of the key. */ static inline u32 jhash2(const u32 *k, u32 length, u32 initval) { - u32 a, b, c, len; + u32 a, b, c; - a = b = JHASH_GOLDEN_RATIO; - c = initval; - len = length; + /* Set up the internal state */ + a = b = c = JHASH_INITVAL + (length<<2) + initval; - while (len >= 3) { + /* Handle most of the key */ + while (length > 3) { a += k[0]; b += k[1]; c += k[2]; __jhash_mix(a, b, c); - k += 3; len -= 3; + length -= 3; + k += 3; } - c += length * 4; - - switch (len) { - case 2 : b += k[1]; - case 1 : a += k[0]; - }; - - __jhash_mix(a,b,c); + /* Handle the last 3 u32's: all the case statements fall through */ + switch (length) { + case 3: c += k[2]; + case 2: b += k[1]; + case 1: a += k[0]; + __jhash_final(a, b, c); + case 0: /* Nothing left to add */ + break; + } return c; } -/* A special ultra-optimized versions that knows they are hashing exactly - * 3, 2 or 1 word(s). - * - * NOTE: In particular the "c += length; __jhash_mix(a,b,c);" normally - * done at the end is not done here. - */ +/* jhash_3words - hash exactly 3, 2 or 1 word(s) */ static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval) { - a += JHASH_GOLDEN_RATIO; - b += JHASH_GOLDEN_RATIO; + a += JHASH_INITVAL; + b += JHASH_INITVAL; c += initval; - __jhash_mix(a, b, c); + __jhash_final(a, b, c); return c; } -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] New jhash function 2010-12-03 12:38 ` [PATCH 0/2] New jhash function Jozsef Kadlecsik 2010-12-03 12:39 ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik @ 2010-12-08 17:09 ` David Miller 2010-12-08 21:23 ` Rusty Russell 1 sibling, 1 reply; 18+ messages in thread From: David Miller @ 2010-12-08 17:09 UTC (permalink / raw) To: kadlec; +Cc: linux-kernel, netdev, netfilter-devel, torvalds, rusty From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Date: Fri, 3 Dec 2010 13:38:59 +0100 > The current jhash.h implements the lookup2() hash function by Bob Jenkins. > However, lookup2() is outdated as Bob wrote a new hash function called > lookup3(). There is a longer comparison of those two and other hash > functions at http://burtleburtle.net/bob/hash/doobs.html. > > Please consider applying the following patches. Patch #1 is already in the net-next-2.6 tree, and as long as there are no major objections to the general crowd (including Rusty et al.) I am happy to put patch #2 into my tree as well. Rusty, does the current version of patch #2 look good to you? Thanks! ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] New jhash function 2010-12-08 17:09 ` [PATCH 0/2] New jhash function David Miller @ 2010-12-08 21:23 ` Rusty Russell 2010-12-10 4:19 ` David Miller 0 siblings, 1 reply; 18+ messages in thread From: Rusty Russell @ 2010-12-08 21:23 UTC (permalink / raw) To: David Miller; +Cc: kadlec, linux-kernel, netdev, netfilter-devel, torvalds On Thu, 9 Dec 2010 03:39:54 am David Miller wrote: > From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> > Date: Fri, 3 Dec 2010 13:38:59 +0100 > > > The current jhash.h implements the lookup2() hash function by Bob Jenkins. > > However, lookup2() is outdated as Bob wrote a new hash function called > > lookup3(). There is a longer comparison of those two and other hash > > functions at http://burtleburtle.net/bob/hash/doobs.html. > > > > Please consider applying the following patches. > > Patch #1 is already in the net-next-2.6 tree, and as long as there are > no major objections to the general crowd (including Rusty et al.) I am > happy to put patch #2 into my tree as well. > > Rusty, does the current version of patch #2 look good to you? Yes, 2/2 good. Thanks Jozsef! Acked-by: Rusty Russell <rusty@rustcorp.com.au> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/2] New jhash function 2010-12-08 21:23 ` Rusty Russell @ 2010-12-10 4:19 ` David Miller 0 siblings, 0 replies; 18+ messages in thread From: David Miller @ 2010-12-10 4:19 UTC (permalink / raw) To: rusty; +Cc: kadlec, linux-kernel, netdev, netfilter-devel, torvalds From: Rusty Russell <rusty@rustcorp.com.au> Date: Thu, 9 Dec 2010 07:53:45 +1030 > On Thu, 9 Dec 2010 03:39:54 am David Miller wrote: >> From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> >> Date: Fri, 3 Dec 2010 13:38:59 +0100 >> >> > The current jhash.h implements the lookup2() hash function by Bob Jenkins. >> > However, lookup2() is outdated as Bob wrote a new hash function called >> > lookup3(). There is a longer comparison of those two and other hash >> > functions at http://burtleburtle.net/bob/hash/doobs.html. >> > >> > Please consider applying the following patches. >> >> Patch #1 is already in the net-next-2.6 tree, and as long as there are >> no major objections to the general crowd (including Rusty et al.) I am >> happy to put patch #2 into my tree as well. >> >> Rusty, does the current version of patch #2 look good to you? > > Yes, 2/2 good. Thanks Jozsef! > > Acked-by: Rusty Russell <rusty@rustcorp.com.au> Applied, thanks everyone. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Prepare the tree for un-inlined jhash. 2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik 2010-11-25 13:15 ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik @ 2010-11-25 13:39 ` Jan Engelhardt 2010-11-25 13:57 ` Jozsef Kadlecsik 1 sibling, 1 reply; 18+ messages in thread From: Jan Engelhardt @ 2010-11-25 13:39 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell On Thursday 2010-11-25 14:15, Jozsef Kadlecsik wrote: >jhash is widely used in the kernel and because the functions >are inlined, the cost in size is significant. Also, the new jhash >functions are slightly larger than the previous ones so better un-inline. >As a preparation step, the calls to the internal macros are replaced >with the plain jhash function calls. Do you have a non-normative allyesconfig/allmodconfig build whose size(1) you can run on, to show approximately just how much it differs? thanks, Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] Prepare the tree for un-inlined jhash. 2010-11-25 13:39 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt @ 2010-11-25 13:57 ` Jozsef Kadlecsik 0 siblings, 0 replies; 18+ messages in thread From: Jozsef Kadlecsik @ 2010-11-25 13:57 UTC (permalink / raw) To: Jan Engelhardt Cc: linux-kernel, netdev, netfilter-devel, Linus Torvalds, Rusty Russell On Thu, 25 Nov 2010, Jan Engelhardt wrote: > On Thursday 2010-11-25 14:15, Jozsef Kadlecsik wrote: > > >jhash is widely used in the kernel and because the functions > >are inlined, the cost in size is significant. Also, the new jhash > >functions are slightly larger than the previous ones so better un-inline. > >As a preparation step, the calls to the internal macros are replaced > >with the plain jhash function calls. > > Do you have a non-normative allyesconfig/allmodconfig build whose > size(1) you can run on, to show approximately just how much it differs? In the cover mail I referred the link to the message from Ilpo Jarvinen: "I once looked into inlining cost and jhash functions were among the most wasteful (kernel-wide). Multiple jhash bodies were 100+ bytes, and the overall cost was 10k+." Best regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2010-12-10 4:19 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-11-25 13:15 [PATCH 0/2] New jhash function Jozsef Kadlecsik 2010-11-25 13:15 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jozsef Kadlecsik 2010-11-25 13:15 ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik 2010-11-25 13:49 ` Eric Dumazet 2010-11-25 13:55 ` Changli Gao 2010-11-25 14:05 ` Eric Dumazet 2010-11-25 14:41 ` Jozsef Kadlecsik 2010-11-25 17:21 ` Eric Dumazet 2010-11-25 21:14 ` Jozsef Kadlecsik 2010-11-27 3:31 ` Rusty Russell 2010-12-03 12:38 ` [PATCH 0/2] New jhash function Jozsef Kadlecsik 2010-12-03 12:39 ` [PATCH 1/2] Remove calls to jhash internals Jozsef Kadlecsik 2010-12-03 12:39 ` [PATCH 2/2] The new jhash implementation Jozsef Kadlecsik 2010-12-08 17:09 ` [PATCH 0/2] New jhash function David Miller 2010-12-08 21:23 ` Rusty Russell 2010-12-10 4:19 ` David Miller 2010-11-25 13:39 ` [PATCH 1/2] Prepare the tree for un-inlined jhash Jan Engelhardt 2010-11-25 13:57 ` Jozsef Kadlecsik
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).