netdev.vger.kernel.org archive mirror
* [RFC] [PATCH] Improve hash function used for full_name_hash()
@ 2010-01-04 20:09 Guenter Roeck
  2010-01-04 20:29 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Guenter Roeck @ 2010-01-04 20:09 UTC (permalink / raw)
  To: netdev

Please comment on this proposed patch. It is similar to, but more generic
than, a previously proposed change to dev_name_hash() that tried to address
the same problem.

The hash function currently used for full_name_hash() produces a large number
of collisions if hashed names are similar. This can cause performance problems
if a large number of similar names exist in the kernel (e.g., if there is
a large number of virtual interfaces).

For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
the resulting minimum hash bucket depth is 0, the maximum depth is 563,
and the standard deviation is ~136.

With this patch applied, the same test results in a minimum bucket depth
of 37, a maximum bucket depth of 42, and a standard deviation of ~1.02.

The hash factor of 41 was chosen for the following reasons:
- The resulting standard deviation is significantly better than the standard
  deviation of the original hash function for all tested hash table sizes
  (2^x, x=4..16).
- The hash function is simple.
- The resulting code does not require a multiply instruction
  (tested: x86, mips, powerpc).
- The resulting code is more efficient than the code generated for the
  original hash (x86, gcc -O2: 3 instead of 7 instructions).
- The resulting code also works well with more random strings
  (tested with all file names in a given Linux system).

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
---
 include/linux/dcache.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 30b93b2..772755d 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -53,7 +53,7 @@ extern struct dentry_stat_t dentry_stat;
 static inline unsigned long
 partial_name_hash(unsigned long c, unsigned long prevhash)
 {
-	return (prevhash + (c << 4) + (c >> 4)) * 11;
+	return (prevhash + c) * 41;
 }
 
 /*
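
The "eth0" .. "eth9999" experiment described above can be approximated in user space. What follows is only a sketch under stated assumptions, not the test program that produced the quoted figures: it assumes the bucket index is simply the low 8 bits of the hash (no further folding), and the names step_old, step_new, times41_shift_add, depth_stats and bucket_stats are made up for illustration. The helper times41_shift_add shows why a factor of 41 needs no multiply instruction (41 = 32 + 8 + 1).

```c
#include <stdio.h>
#include <string.h>

#define NBUCKETS 256	/* hash table size used in the example above */
#define NNAMES   10000	/* "eth0" .. "eth9999" */

typedef unsigned long (*step_fn)(unsigned long c, unsigned long prevhash);

/* Step function removed by the patch. */
static unsigned long step_old(unsigned long c, unsigned long prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

/* Step function added by the patch. */
static unsigned long step_new(unsigned long c, unsigned long prevhash)
{
	return (prevhash + c) * 41;
}

/* 41 = 32 + 8 + 1, so x * 41 can be formed from shifts and adds alone. */
static unsigned long times41_shift_add(unsigned long x)
{
	return (x << 5) + (x << 3) + x;
}

struct depth_stats {
	unsigned int min, max;	/* shallowest and deepest bucket */
	double variance;	/* population variance of the bucket depths */
};

/* Hash all test names with 'step' and summarise the bucket depths. */
static struct depth_stats bucket_stats(step_fn step)
{
	unsigned int depth[NBUCKETS] = { 0 };
	const double mean = (double)NNAMES / NBUCKETS;
	struct depth_stats s;
	double sq = 0.0;
	char name[16];
	int i;

	for (i = 0; i < NNAMES; i++) {
		unsigned long hash = 0;
		size_t j, len;

		snprintf(name, sizeof(name), "eth%d", i);
		len = strlen(name);
		for (j = 0; j < len; j++)
			hash = step((unsigned char)name[j], hash);
		/* Assumption: the bucket is the low 8 bits of the hash. */
		depth[hash & (NBUCKETS - 1)]++;
	}

	s.min = s.max = depth[0];
	for (i = 0; i < NBUCKETS; i++) {
		double d = depth[i] - mean;

		if (depth[i] < s.min)
			s.min = depth[i];
		if (depth[i] > s.max)
			s.max = depth[i];
		sq += d * d;
	}
	s.variance = sq / NBUCKETS;
	return s;
}
```

Calling bucket_stats(step_old) and bucket_stats(step_new) and printing min, max and the square root of variance allows the two step functions to be compared side by side. Under the masking assumption the clustering of the old function has a simple cause: for the digits '0'..'9', the low byte of (c << 4) is a multiple of 16 and (c >> 4) is always 3, so all names with the same number of digits can reach at most 16 distinct bucket indexes.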



* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
  2010-01-04 20:09 [RFC] [PATCH] Improve hash function used for full_name_hash() Guenter Roeck
@ 2010-01-04 20:29 ` Stephen Hemminger
  2010-01-04 20:47   ` David Miller
  2010-01-04 20:51   ` Guenter Roeck
  2010-01-04 20:44 ` Eric Dumazet
  2010-01-04 20:46 ` David Miller
  2 siblings, 2 replies; 8+ messages in thread
From: Stephen Hemminger @ 2010-01-04 20:29 UTC (permalink / raw)
  To: guenter.roeck; +Cc: netdev

On Mon, 04 Jan 2010 12:09:44 -0800
Guenter Roeck <guenter.roeck@ericsson.com> wrote:

> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> and the standard deviation is ~136.

The problem was a missing call to hash_32() on the result, and is
already fixed in current kernel.
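
To illustrate the difference between masking a hash and folding it with hash_32(), here is a small user-space sketch. It is not the kernel code: fold_mask and fold_hash_32 are hypothetical names, and the constant is assumed to match the kernel's GOLDEN_RATIO_PRIME_32 of that period.

```c
#include <stdint.h>

/* Assumed value of the kernel's GOLDEN_RATIO_PRIME_32 at the time. */
#define GOLDEN_RATIO_PRIME_32 0x9e370001u

/* Bucket selection by masking: keeps only the low 'bits' bits. */
static uint32_t fold_mask(uint32_t hash, unsigned int bits)
{
	return hash & ((1u << bits) - 1);
}

/*
 * Bucket selection modelled on hash_32(): multiply by a large odd
 * constant, then keep the top 'bits' bits (valid for 1 <= bits <= 31).
 */
static uint32_t fold_hash_32(uint32_t hash, unsigned int bits)
{
	return (uint32_t)(hash * GOLDEN_RATIO_PRIME_32) >> (32 - bits);
}
```

With masking, two hashes that differ only above bit 7 always share a bucket in a 256-entry table. The multiplicative fold lets every input bit influence the selected bucket, so such hashes can be separated.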



* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
  2010-01-04 20:09 [RFC] [PATCH] Improve hash function used for full_name_hash() Guenter Roeck
  2010-01-04 20:29 ` Stephen Hemminger
@ 2010-01-04 20:44 ` Eric Dumazet
  2010-01-04 20:51   ` David Miller
  2010-01-04 20:53   ` Guenter Roeck
  2010-01-04 20:46 ` David Miller
  2 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2010-01-04 20:44 UTC (permalink / raw)
  To: guenter.roeck; +Cc: netdev

On 04/01/2010 21:09, Guenter Roeck wrote:
> Please comment on this proposed patch. It is similar but more generic than 
> a previously proposed change to dev_name_hash() which tried to address 
> the same problem.
> 
> The hash function currently used for full_name_hash() produces a large number
> of collisions if hashed names are similar. This can cause performance problems
> if a large number of similar names exist in the kernel (e.g., if there is
> a large number of virtual interfaces).
> 
> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> and the standard deviation is ~136.
> 

I would be very surprised, since we worked quite a lot on this subject some months ago...

Which tree are you using?

This has not been true since commit 08e9897d512fe7a67e46209543b3815b57a36dc7
(netdev: fold name hash properly (v3))
Date:   Tue Nov 10 07:20:34 2009 +0000

Here is the actual hash distribution for eth0 .. eth9999 (256 buckets):

37 37 36 49 43 36 36 44 35 27 36 45 38 42 52 49
51 40 52 43 33 29 41 42 40 47 51 51 47 47 46 41
29 34 41 43 41 46 51 52 46 43 48 36 30 34 46 39
43 49 53 51 42 51 41 33 28 44 42 38 44 54 51 45
46 48 39 29 36 35 33 33 44 43 40 37 45 37 28 23
30 36 30 44 42 43 44 37 40 35 26 28 34 35 34 37
44 43 44 37 41 33 21 27 36 33 33 34 35 39 27 34
37 28 25 32 36 31 40 43 43 45 39 40 33 24 26 34
35 39 38 42 45 41 36 39 36 31 33 43 42 44 49 49
56 42 46 43 36 28 40 44 39 43 50 54 46 43 48 43
30 32 42 42 42 47 53 51 46 44 48 35 30 36 44 37
47 49 49 53 41 49 42 35 29 40 42 43 43 51 52 49
43 47 43 28 32 35 37 34 38 44 44 34 42 43 27 21
29 39 31 40 43 47 40 36 43 34 25 27 36 34 34 36
46 44 42 39 41 32 22 27 35 35 32 35 35 39 27 33
35 31 24 31 38 31 36 41 47 46 35 41 37 23 23 36

This seems good enough.
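
To compare this table with the minimum, maximum and standard deviation figures quoted earlier in the thread, the bucket depths can be summarised with a few lines of C. This is an illustrative sketch only; table_stats and summarise are hypothetical names, and the array is copied from the table above.

```c
#include <stddef.h>

/* Bucket depths copied from the table above (16 rows of 16). */
static const unsigned int depth[256] = {
	37, 37, 36, 49, 43, 36, 36, 44, 35, 27, 36, 45, 38, 42, 52, 49,
	51, 40, 52, 43, 33, 29, 41, 42, 40, 47, 51, 51, 47, 47, 46, 41,
	29, 34, 41, 43, 41, 46, 51, 52, 46, 43, 48, 36, 30, 34, 46, 39,
	43, 49, 53, 51, 42, 51, 41, 33, 28, 44, 42, 38, 44, 54, 51, 45,
	46, 48, 39, 29, 36, 35, 33, 33, 44, 43, 40, 37, 45, 37, 28, 23,
	30, 36, 30, 44, 42, 43, 44, 37, 40, 35, 26, 28, 34, 35, 34, 37,
	44, 43, 44, 37, 41, 33, 21, 27, 36, 33, 33, 34, 35, 39, 27, 34,
	37, 28, 25, 32, 36, 31, 40, 43, 43, 45, 39, 40, 33, 24, 26, 34,
	35, 39, 38, 42, 45, 41, 36, 39, 36, 31, 33, 43, 42, 44, 49, 49,
	56, 42, 46, 43, 36, 28, 40, 44, 39, 43, 50, 54, 46, 43, 48, 43,
	30, 32, 42, 42, 42, 47, 53, 51, 46, 44, 48, 35, 30, 36, 44, 37,
	47, 49, 49, 53, 41, 49, 42, 35, 29, 40, 42, 43, 43, 51, 52, 49,
	43, 47, 43, 28, 32, 35, 37, 34, 38, 44, 44, 34, 42, 43, 27, 21,
	29, 39, 31, 40, 43, 47, 40, 36, 43, 34, 25, 27, 36, 34, 34, 36,
	46, 44, 42, 39, 41, 32, 22, 27, 35, 35, 32, 35, 35, 39, 27, 33,
	35, 31, 24, 31, 38, 31, 36, 41, 47, 46, 35, 41, 37, 23, 23, 36,
};

struct table_stats {
	unsigned int min, max, total;
	double variance;	/* population variance of the depths */
};

/* Summarise n bucket depths; n must be at least 1. */
static struct table_stats summarise(const unsigned int *d, size_t n)
{
	struct table_stats s = { d[0], d[0], 0, 0.0 };
	double mean, sq = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (d[i] < s.min)
			s.min = d[i];
		if (d[i] > s.max)
			s.max = d[i];
		s.total += d[i];
	}
	mean = (double)s.total / n;
	for (i = 0; i < n; i++)
		sq += (d[i] - mean) * (d[i] - mean);
	s.variance = sq / n;
	return s;
}
```

summarise(depth, 256) reports a total of 10000 names, with bucket depths ranging from 21 to 56; the square root of the variance field is the standard deviation.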


* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
  2010-01-04 20:09 [RFC] [PATCH] Improve hash function used for full_name_hash() Guenter Roeck
  2010-01-04 20:29 ` Stephen Hemminger
  2010-01-04 20:44 ` Eric Dumazet
@ 2010-01-04 20:46 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2010-01-04 20:46 UTC (permalink / raw)
  To: guenter.roeck; +Cc: netdev, linux-kernel

From: Guenter Roeck <guenter.roeck@ericsson.com>
Date: Mon, 04 Jan 2010 12:09:44 -0800

> Please comment on this proposed patch. It is similar but more generic than 
> a previously proposed change to dev_name_hash() which tried to address 
> the same problem.

Since this changes a filesystem subsystem header file and affects
parts of the kernel outside of networking, you really do need to
CC: linux-kernel at a minimum.  Added...

> The hash function currently used for full_name_hash() produces a large number
> of collisions if hashed names are similar. This can cause performance problems
> if a large number of similar names exist in the kernel (e.g., if there is
> a large number of virtual interfaces).
> 
> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> and the standard deviation is ~136.
> 
> With this patch applied, the same test results in a minimum bucket depth
> of 37, a maximum bucket depth of 42, and a standard deviation of ~1.02.
> 
> The hash factor of 41 was chosen for the following reasons:
> - The resulting standard deviation is significantly better than the standard
>   deviation of the original hash function for all tested hash table sizes
>   (2^x, x=4..16).
> - The hash function is simple.
> - The resulting code does not require a multiply instruction
>   (tested: x86, mips, powerpc).
> - The resulting code is more efficient than the code generated for the
>   original hash (x86, gcc -O2: 3 instead of 7 instructions).
> - The resulting code also works well with more random strings
>   (tested with all file names in a given Linux system).
> 
> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
> ---
>  include/linux/dcache.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 30b93b2..772755d 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -53,7 +53,7 @@ extern struct dentry_stat_t dentry_stat;
>  static inline unsigned long
>  partial_name_hash(unsigned long c, unsigned long prevhash)
>  {
> -	return (prevhash + (c << 4) + (c >> 4)) * 11;
> +	return (prevhash + c) * 41;
>  }
>  
>  /*
> 


* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
  2010-01-04 20:29 ` Stephen Hemminger
@ 2010-01-04 20:47   ` David Miller
  2010-01-04 20:51   ` Guenter Roeck
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2010-01-04 20:47 UTC (permalink / raw)
  To: shemminger; +Cc: guenter.roeck, netdev, linux-kernel

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 4 Jan 2010 12:29:12 -0800

> On Mon, 04 Jan 2010 12:09:44 -0800
> Guenter Roeck <guenter.roeck@ericsson.com> wrote:
> 
>> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
>> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
>> and the standard deviation is ~136.
> 
> The problem was a missing call to hash_32() on the result, and is
> already fixed in current kernel.

Added linux-kernel CC: to followup...


* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
  2010-01-04 20:29 ` Stephen Hemminger
  2010-01-04 20:47   ` David Miller
@ 2010-01-04 20:51   ` Guenter Roeck
  1 sibling, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2010-01-04 20:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org

Ok, thanks.

Guenter

On Mon, 2010-01-04 at 15:29 -0500, Stephen Hemminger wrote:
> On Mon, 04 Jan 2010 12:09:44 -0800
> Guenter Roeck <guenter.roeck@ericsson.com> wrote:
> 
> > For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> > the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> > and the standard deviation is ~136.
> 
> The problem was a missing call to hash_32() on the result, and is
> already fixed in current kernel.
> 



* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
  2010-01-04 20:44 ` Eric Dumazet
@ 2010-01-04 20:51   ` David Miller
  2010-01-04 20:53   ` Guenter Roeck
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2010-01-04 20:51 UTC (permalink / raw)
  To: eric.dumazet; +Cc: guenter.roeck, netdev, linux-kernel

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 04 Jan 2010 21:44:10 +0100

> On 04/01/2010 21:09, Guenter Roeck wrote:
>> Please comment on this proposed patch. It is similar but more generic than 
>> a previously proposed change to dev_name_hash() which tried to address 
>> the same problem.
>> 
>> The hash function currently used for full_name_hash() produces a large number
>> of collisions if hashed names are similar. This can cause performance problems
>> if a large number of similar names exist in the kernel (e.g., if there is
>> a large number of virtual interfaces).
>> 
>> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
>> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
>> and the standard deviation is ~136.
>> 

Adding linux-kernel to CC: for followups...

> I would be very surprised, since we worked quite a lot on this subject some months ago...
> 
> Which tree are you using ?
> 
> This is not true since commit 08e9897d512fe7a67e46209543b3815b57a36dc7
> (netdev: fold name hash properly (v3))
> Date:   Tue Nov 10 07:20:34 2009 +0000
> 
> Here is actual hash distribution for (eth0 -> eth9999) 
> 
> 37 37 36 49 43 36 36 44 35 27 36 45 38 42 52 49
> 51 40 52 43 33 29 41 42 40 47 51 51 47 47 46 41
> 29 34 41 43 41 46 51 52 46 43 48 36 30 34 46 39
> 43 49 53 51 42 51 41 33 28 44 42 38 44 54 51 45
> 46 48 39 29 36 35 33 33 44 43 40 37 45 37 28 23
> 30 36 30 44 42 43 44 37 40 35 26 28 34 35 34 37
> 44 43 44 37 41 33 21 27 36 33 33 34 35 39 27 34
> 37 28 25 32 36 31 40 43 43 45 39 40 33 24 26 34
> 35 39 38 42 45 41 36 39 36 31 33 43 42 44 49 49
> 56 42 46 43 36 28 40 44 39 43 50 54 46 43 48 43
> 30 32 42 42 42 47 53 51 46 44 48 35 30 36 44 37
> 47 49 49 53 41 49 42 35 29 40 42 43 43 51 52 49
> 43 47 43 28 32 35 37 34 38 44 44 34 42 43 27 21
> 29 39 31 40 43 47 40 36 43 34 25 27 36 34 34 36
> 46 44 42 39 41 32 22 27 35 35 32 35 35 39 27 33
> 35 31 24 31 38 31 36 41 47 46 35 41 37 23 23 36
> 
> This seems good enough.


* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
  2010-01-04 20:44 ` Eric Dumazet
  2010-01-04 20:51   ` David Miller
@ 2010-01-04 20:53   ` Guenter Roeck
  1 sibling, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2010-01-04 20:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev@vger.kernel.org

Never mind - my kernel had missed the commit below. I agree, this is now
good enough.

Thanks,
Guenter

On Mon, 2010-01-04 at 15:44 -0500, Eric Dumazet wrote:
> On 04/01/2010 21:09, Guenter Roeck wrote:
> > Please comment on this proposed patch. It is similar but more generic than 
> > a previously proposed change to dev_name_hash() which tried to address 
> > the same problem.
> > 
> > The hash function currently used for full_name_hash() produces a large number
> > of collisions if hashed names are similar. This can cause performance problems
> > if a large number of similar names exist in the kernel (e.g., if there is
> > a large number of virtual interfaces).
> > 
> > For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> > the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> > and the standard deviation is ~136.
> > 
> 
> I would be very surprised, since we worked quite a lot on this subject some months ago...
> 
> Which tree are you using ?
> 
> This is not true since commit 08e9897d512fe7a67e46209543b3815b57a36dc7
> (netdev: fold name hash properly (v3))
> Date:   Tue Nov 10 07:20:34 2009 +0000
> 
> Here is actual hash distribution for (eth0 -> eth9999) 
> 
> 37 37 36 49 43 36 36 44 35 27 36 45 38 42 52 49
> 51 40 52 43 33 29 41 42 40 47 51 51 47 47 46 41
> 29 34 41 43 41 46 51 52 46 43 48 36 30 34 46 39
> 43 49 53 51 42 51 41 33 28 44 42 38 44 54 51 45
> 46 48 39 29 36 35 33 33 44 43 40 37 45 37 28 23
> 30 36 30 44 42 43 44 37 40 35 26 28 34 35 34 37
> 44 43 44 37 41 33 21 27 36 33 33 34 35 39 27 34
> 37 28 25 32 36 31 40 43 43 45 39 40 33 24 26 34
> 35 39 38 42 45 41 36 39 36 31 33 43 42 44 49 49
> 56 42 46 43 36 28 40 44 39 43 50 54 46 43 48 43
> 30 32 42 42 42 47 53 51 46 44 48 35 30 36 44 37
> 47 49 49 53 41 49 42 35 29 40 42 43 43 51 52 49
> 43 47 43 28 32 35 37 34 38 44 44 34 42 43 27 21
> 29 39 31 40 43 47 40 36 43 34 25 27 36 34 34 36
> 46 44 42 39 41 32 22 27 35 35 32 35 35 39 27 33
> 35 31 24 31 38 31 36 41 47 46 35 41 37 23 23 36
> 
> This seems good enough.


