* [RFC] [PATCH] Improve hash function used for full_name_hash()
@ 2010-01-04 20:09 Guenter Roeck
2010-01-04 20:29 ` Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Guenter Roeck @ 2010-01-04 20:09 UTC (permalink / raw)
To: netdev
Please comment on this proposed patch. It is similar to, but more generic
than, a previously proposed change to dev_name_hash() that tried to address
the same problem.
The hash function currently used for full_name_hash() produces a large number
of collisions if hashed names are similar. This can cause performance problems
if a large number of similar names exist in the kernel (e.g., if there is
a large number of virtual interfaces).
For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
the resulting minimum hash bucket depth is 0, the maximum depth is 563,
and the standard deviation is ~136.
With this patch applied, the same test results in a minimum bucket depth
of 37, a maximum bucket depth of 42, and a standard deviation of ~1.02.
The hash factor of 41 was chosen for the following reasons:
- The resulting standard deviation is significantly better than the standard
  deviation of the original hash function for all tested hash table sizes
  (2^x, x=4..16).
- The hash function is simple.
- The resulting code does not require a multiply instruction
  (tested: x86, mips, powerpc).
- The resulting code is more efficient than the code generated for the
  original hash (x86, gcc -O2: 3 instead of 7 instructions).
- The resulting code also works well with more random strings
  (tested with all file names in a given Linux system).
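
The comparison described above can be reproduced in user space with a sketch
along the following lines. It is not the test program used for the numbers
quoted here. It assumes the bucket is taken as the low 8 bits of the final
hash (no further folding), that names are hashed without a terminating NUL,
and the helper names (old_partial_hash, new_partial_hash, name_hash,
fill_buckets) are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

#define NBUCKETS 256	/* assumed table size, matching the example above */

/* Per-character step currently in include/linux/dcache.h. */
static unsigned long old_partial_hash(unsigned long c, unsigned long prev)
{
	return (prev + (c << 4) + (c >> 4)) * 11;
}

/* Per-character step proposed by this patch. */
static unsigned long new_partial_hash(unsigned long c, unsigned long prev)
{
	return (prev + c) * 41;
}

typedef unsigned long (*partial_fn)(unsigned long, unsigned long);

/* Hash a NUL-terminated name, starting from 0, one character at a time. */
static unsigned int name_hash(const char *name, partial_fn step)
{
	unsigned long hash = 0;

	while (*name)
		hash = step((unsigned char)*name++, hash);
	return (unsigned int)hash;
}

/* Fill depth[] with the bucket occupancy for "eth0" .. "eth<count-1>". */
static void fill_buckets(partial_fn step, unsigned int count,
			 unsigned int depth[NBUCKETS])
{
	char name[32];
	unsigned int i;

	memset(depth, 0, NBUCKETS * sizeof(depth[0]));
	for (i = 0; i < count; i++) {
		snprintf(name, sizeof(name), "eth%u", i);
		/* Assumption: plain mask, no extra folding of the hash. */
		depth[name_hash(name, step) & (NBUCKETS - 1)]++;
	}
}
```

Calling fill_buckets() once with each step function and 10000 names gives two
depth arrays whose minimum, maximum and spread can then be compared.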
Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
---
include/linux/dcache.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 30b93b2..772755d 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -53,7 +53,7 @@ extern struct dentry_stat_t dentry_stat;
static inline unsigned long
partial_name_hash(unsigned long c, unsigned long prevhash)
{
- return (prevhash + (c << 4) + (c >> 4)) * 11;
+ return (prevhash + c) * 41;
}
/*
* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
2010-01-04 20:09 [RFC] [PATCH] Improve hash function used for full_name_hash() Guenter Roeck
@ 2010-01-04 20:29 ` Stephen Hemminger
2010-01-04 20:47 ` David Miller
2010-01-04 20:51 ` Guenter Roeck
2010-01-04 20:44 ` Eric Dumazet
2010-01-04 20:46 ` David Miller
2 siblings, 2 replies; 8+ messages in thread
From: Stephen Hemminger @ 2010-01-04 20:29 UTC (permalink / raw)
To: guenter.roeck; +Cc: netdev
On Mon, 04 Jan 2010 12:09:44 -0800
Guenter Roeck <guenter.roeck@ericsson.com> wrote:
> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> and the standard deviation is ~136.
The problem was a missing call to hash_32() on the result, and it is
already fixed in the current kernel.
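
For readers who want to see what the missing hash_32() call changes, here is
a rough user-space sketch. It is not the kernel code itself: fold_hash_32()
mimics hash_32() as a multiply by a 32-bit golden-ratio prime followed by
taking the top bits, and the constant is, to the best of my knowledge, the
one from include/linux/hash.h in kernels of this period. HASHBITS and all
function names are assumptions for illustration.

```c
#include <stdint.h>

#define HASHBITS 8	/* assumed: 2^8 = 256 buckets */
#define GOLDEN_RATIO_PRIME_32 0x9e370001UL

/* User-space stand-in for hash_32(): multiply, then keep the top bits. */
static uint32_t fold_hash_32(uint32_t val, unsigned int bits)
{
	uint32_t hash = val * (uint32_t)GOLDEN_RATIO_PRIME_32;

	return hash >> (32 - bits);
}

/* Full name hash built from the unpatched partial_name_hash() step. */
static uint32_t name_hash_unfolded(const char *name)
{
	unsigned long hash = 0;

	while (*name) {
		unsigned long c = (unsigned char)*name++;

		hash = (hash + (c << 4) + (c >> 4)) * 11;
	}
	return (uint32_t)hash;
}

/* Bucket choice without the fold: only the low bits are used. */
static unsigned int bucket_masked(const char *name)
{
	return name_hash_unfolded(name) & ((1u << HASHBITS) - 1);
}

/* Bucket choice with the fold: the high bits of the product are used. */
static unsigned int bucket_folded(const char *name)
{
	return fold_hash_32(name_hash_unfolded(name), HASHBITS);
}
```

The masked variant depends only on the low 8 bits of the hash, while the
folded variant mixes all 32 bits before selecting a bucket.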
--
* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
2010-01-04 20:09 [RFC] [PATCH] Improve hash function used for full_name_hash() Guenter Roeck
2010-01-04 20:29 ` Stephen Hemminger
@ 2010-01-04 20:44 ` Eric Dumazet
2010-01-04 20:51 ` David Miller
2010-01-04 20:53 ` Guenter Roeck
2010-01-04 20:46 ` David Miller
2 siblings, 2 replies; 8+ messages in thread
From: Eric Dumazet @ 2010-01-04 20:44 UTC (permalink / raw)
To: guenter.roeck; +Cc: netdev
On 04/01/2010 21:09, Guenter Roeck wrote:
> Please comment on this proposed patch. It is similar but more generic than
> a previously proposed change to dev_name_hash() which tried to address
> the same problem.
>
> The hash function currently used for full_name_hash() produces a large number
> of collisions if hashed names are similar. This can cause performance problems
> if a large number of similar names exist in the kernel (e.g., if there is
> a large number of virtual interfaces).
>
> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> and the standard deviation is ~136.
>
I would be very surprised, since we worked quite a lot on this subject some months ago...
Which tree are you using?
This has not been the case since commit 08e9897d512fe7a67e46209543b3815b57a36dc7
(netdev: fold name hash properly (v3))
Date: Tue Nov 10 07:20:34 2009 +0000
Here is the actual hash distribution for eth0 .. eth9999 (one count per bucket, 256 buckets):
37 37 36 49 43 36 36 44 35 27 36 45 38 42 52 49
51 40 52 43 33 29 41 42 40 47 51 51 47 47 46 41
29 34 41 43 41 46 51 52 46 43 48 36 30 34 46 39
43 49 53 51 42 51 41 33 28 44 42 38 44 54 51 45
46 48 39 29 36 35 33 33 44 43 40 37 45 37 28 23
30 36 30 44 42 43 44 37 40 35 26 28 34 35 34 37
44 43 44 37 41 33 21 27 36 33 33 34 35 39 27 34
37 28 25 32 36 31 40 43 43 45 39 40 33 24 26 34
35 39 38 42 45 41 36 39 36 31 33 43 42 44 49 49
56 42 46 43 36 28 40 44 39 43 50 54 46 43 48 43
30 32 42 42 42 47 53 51 46 44 48 35 30 36 44 37
47 49 49 53 41 49 42 35 29 40 42 43 43 51 52 49
43 47 43 28 32 35 37 34 38 44 44 34 42 43 27 21
29 39 31 40 43 47 40 36 43 34 25 27 36 34 34 36
46 44 42 39 41 32 22 27 35 35 32 35 35 39 27 33
35 31 24 31 38 31 36 41 47 46 35 41 37 23 23 36
This seems good enough.
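
To put a number on "good enough", a table of bucket depths like the one above
can be summarized with a small helper such as the following sketch. The
struct and function names are hypothetical, and it reports the population
variance (the standard deviation is its square root); whether the figures
quoted earlier in this thread used the population or the sample formula is
not stated.

```c
#include <stddef.h>

struct depth_stats {
	unsigned int min;
	unsigned int max;
	double mean;
	double variance;	/* population variance; std. dev. is its square root */
};

/* Summarize n bucket depths; the caller must pass n >= 1. */
static struct depth_stats summarize(const unsigned int *depth, size_t n)
{
	struct depth_stats s = { depth[0], depth[0], 0.0, 0.0 };
	double sum = 0.0, sq = 0.0;
	size_t i;

	/* First pass: extremes and mean. */
	for (i = 0; i < n; i++) {
		if (depth[i] < s.min)
			s.min = depth[i];
		if (depth[i] > s.max)
			s.max = depth[i];
		sum += depth[i];
	}
	s.mean = sum / n;

	/* Second pass: mean squared deviation from the mean. */
	for (i = 0; i < n; i++) {
		double d = depth[i] - s.mean;

		sq += d * d;
	}
	s.variance = sq / n;
	return s;
}
```

Passing the 256 counts from the table as the depth array yields the minimum,
maximum, mean and variance of the distribution.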
* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
2010-01-04 20:09 [RFC] [PATCH] Improve hash function used for full_name_hash() Guenter Roeck
2010-01-04 20:29 ` Stephen Hemminger
2010-01-04 20:44 ` Eric Dumazet
@ 2010-01-04 20:46 ` David Miller
2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2010-01-04 20:46 UTC (permalink / raw)
To: guenter.roeck; +Cc: netdev, linux-kernel
From: Guenter Roeck <guenter.roeck@ericsson.com>
Date: Mon, 04 Jan 2010 12:09:44 -0800
> Please comment on this proposed patch. It is similar but more generic than
> a previously proposed change to dev_name_hash() which tried to address
> the same problem.
Since this changes a filesystem subsystem header file and affects
parts of the kernel outside of networking, you really do need to
CC: linux-kernel at a minimum. Added...
> The hash function currently used for full_name_hash() produces a large number
> of collisions if hashed names are similar. This can cause performance problems
> if a large number of similar names exist in the kernel (e.g., if there is
> a large number of virtual interfaces).
>
> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> and the standard deviation is ~136.
>
> With this patch applied, the same test results in a minimum bucket depth
> of 37, a maximum bucket depth of 42, and a standard deviation of ~1.02.
>
> The hash factor of 41 was chosen for the following reasons:
> - The resulting standard deviation is significantly better than the standard
> deviation of the original hash function for all tested hash table sizes
> (2^x, x=4..16).
> - The hash function is simple.
> - The resulting code does not require a multiply instruction
> (tested: x86, mips, powerpc).
> - The resulting code is more efficient than the code generated for the
> original hash (x86, gcc -O2: 3 instead of 7 instructions).
> - The resulting code also works well with more random strings
> (tested with all file names in a given Linux system).
>
> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
> ---
> include/linux/dcache.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 30b93b2..772755d 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -53,7 +53,7 @@ extern struct dentry_stat_t dentry_stat;
> static inline unsigned long
> partial_name_hash(unsigned long c, unsigned long prevhash)
> {
> - return (prevhash + (c << 4) + (c >> 4)) * 11;
> + return (prevhash + c) * 41;
> }
>
> /*
>
* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
2010-01-04 20:29 ` Stephen Hemminger
@ 2010-01-04 20:47 ` David Miller
2010-01-04 20:51 ` Guenter Roeck
1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2010-01-04 20:47 UTC (permalink / raw)
To: shemminger; +Cc: guenter.roeck, netdev, linux-kernel
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 4 Jan 2010 12:29:12 -0800
> On Mon, 04 Jan 2010 12:09:44 -0800
> Guenter Roeck <guenter.roeck@ericsson.com> wrote:
>
>> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
>> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
>> and the standard deviation is ~136.
>
> The problem was a missing call to hash_32() on the result, and is
> already fixed in current kernel.
Added linux-kernel CC: to followup...
* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
2010-01-04 20:29 ` Stephen Hemminger
2010-01-04 20:47 ` David Miller
@ 2010-01-04 20:51 ` Guenter Roeck
1 sibling, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2010-01-04 20:51 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
Ok, thanks.
Guenter
On Mon, 2010-01-04 at 15:29 -0500, Stephen Hemminger wrote:
> On Mon, 04 Jan 2010 12:09:44 -0800
> Guenter Roeck <guenter.roeck@ericsson.com> wrote:
>
> > For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> > the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> > and the standard deviation is ~136.
>
> The problem was a missing call to hash_32() on the result, and is
> already fixed in current kernel.
>
* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
2010-01-04 20:44 ` Eric Dumazet
@ 2010-01-04 20:51 ` David Miller
2010-01-04 20:53 ` Guenter Roeck
1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2010-01-04 20:51 UTC (permalink / raw)
To: eric.dumazet; +Cc: guenter.roeck, netdev, linux-kernel
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 04 Jan 2010 21:44:10 +0100
> On 04/01/2010 21:09, Guenter Roeck wrote:
>> Please comment on this proposed patch. It is similar but more generic than
>> a previously proposed change to dev_name_hash() which tried to address
>> the same problem.
>>
>> The hash function currently used for full_name_hash() produces a large number
>> of collisions if hashed names are similar. This can cause performance problems
>> if a large number of similar names exist in the kernel (e.g., if there is
>> a large number of virtual interfaces).
>>
>> For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
>> the resulting minimum hash bucket depth is 0, the maximum depth is 563,
>> and the standard deviation is ~136.
>>
Adding linux-kernel to CC: for followups...
> I would be very surprised, since we worked quite a lot on this subject some months ago...
>
> Which tree are you using ?
>
> This is not true since commit 08e9897d512fe7a67e46209543b3815b57a36dc7
> (netdev: fold name hash properly (v3))
> Date: Tue Nov 10 07:20:34 2009 +0000
>
> Here is actual hash distribution for (eth0 -> eth9999)
>
> 37 37 36 49 43 36 36 44 35 27 36 45 38 42 52 49
> 51 40 52 43 33 29 41 42 40 47 51 51 47 47 46 41
> 29 34 41 43 41 46 51 52 46 43 48 36 30 34 46 39
> 43 49 53 51 42 51 41 33 28 44 42 38 44 54 51 45
> 46 48 39 29 36 35 33 33 44 43 40 37 45 37 28 23
> 30 36 30 44 42 43 44 37 40 35 26 28 34 35 34 37
> 44 43 44 37 41 33 21 27 36 33 33 34 35 39 27 34
> 37 28 25 32 36 31 40 43 43 45 39 40 33 24 26 34
> 35 39 38 42 45 41 36 39 36 31 33 43 42 44 49 49
> 56 42 46 43 36 28 40 44 39 43 50 54 46 43 48 43
> 30 32 42 42 42 47 53 51 46 44 48 35 30 36 44 37
> 47 49 49 53 41 49 42 35 29 40 42 43 43 51 52 49
> 43 47 43 28 32 35 37 34 38 44 44 34 42 43 27 21
> 29 39 31 40 43 47 40 36 43 34 25 27 36 34 34 36
> 46 44 42 39 41 32 22 27 35 35 32 35 35 39 27 33
> 35 31 24 31 38 31 36 41 47 46 35 41 37 23 23 36
>
> This seems good enough.
* Re: [RFC] [PATCH] Improve hash function used for full_name_hash()
2010-01-04 20:44 ` Eric Dumazet
2010-01-04 20:51 ` David Miller
@ 2010-01-04 20:53 ` Guenter Roeck
1 sibling, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2010-01-04 20:53 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev@vger.kernel.org
Never mind, my kernel was missing the commit below. I agree, this is now
good enough.
Thanks,
Guenter
On Mon, 2010-01-04 at 15:44 -0500, Eric Dumazet wrote:
> On 04/01/2010 21:09, Guenter Roeck wrote:
> > Please comment on this proposed patch. It is similar but more generic than
> > a previously proposed change to dev_name_hash() which tried to address
> > the same problem.
> >
> > The hash function currently used for full_name_hash() produces a large number
> > of collisions if hashed names are similar. This can cause performance problems
> > if a large number of similar names exist in the kernel (e.g., if there is
> > a large number of virtual interfaces).
> >
> > For example, when hashing "eth0" .. "eth9999" with a hash table size of 256,
> > the resulting minimum hash bucket depth is 0, the maximum depth is 563,
> > and the standard deviation is ~136.
> >
>
> I would be very surprised, since we worked quite a lot on this subject some months ago...
>
> Which tree are you using ?
>
> This is not true since commit 08e9897d512fe7a67e46209543b3815b57a36dc7
> (netdev: fold name hash properly (v3))
> Date: Tue Nov 10 07:20:34 2009 +0000
>
> Here is actual hash distribution for (eth0 -> eth9999)
>
> 37 37 36 49 43 36 36 44 35 27 36 45 38 42 52 49
> 51 40 52 43 33 29 41 42 40 47 51 51 47 47 46 41
> 29 34 41 43 41 46 51 52 46 43 48 36 30 34 46 39
> 43 49 53 51 42 51 41 33 28 44 42 38 44 54 51 45
> 46 48 39 29 36 35 33 33 44 43 40 37 45 37 28 23
> 30 36 30 44 42 43 44 37 40 35 26 28 34 35 34 37
> 44 43 44 37 41 33 21 27 36 33 33 34 35 39 27 34
> 37 28 25 32 36 31 40 43 43 45 39 40 33 24 26 34
> 35 39 38 42 45 41 36 39 36 31 33 43 42 44 49 49
> 56 42 46 43 36 28 40 44 39 43 50 54 46 43 48 43
> 30 32 42 42 42 47 53 51 46 44 48 35 30 36 44 37
> 47 49 49 53 41 49 42 35 29 40 42 43 43 51 52 49
> 43 47 43 28 32 35 37 34 38 44 44 34 42 43 27 21
> 29 39 31 40 43 47 40 36 43 34 25 27 36 34 34 36
> 46 44 42 39 41 32 22 27 35 35 32 35 35 39 27 33
> 35 31 24 31 38 31 36 41 47 46 35 41 37 23 23 36
>
> This seems good enough.