From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Miller Subject: Re: [RFC] [PATCH] Improve hash function used for full_name_hash() Date: Mon, 04 Jan 2010 12:46:12 -0800 (PST) Message-ID: <20100104.124612.158883690.davem@davemloft.net> References: <1262635784.8178.253.camel@groeck-laptop> Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org To: guenter.roeck@ericsson.com Return-path: Received: from 74-93-104-97-Washington.hfc.comcastbusiness.net ([74.93.104.97]:48337 "EHLO sunset.davemloft.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750770Ab0ADUqJ (ORCPT ); Mon, 4 Jan 2010 15:46:09 -0500 In-Reply-To: <1262635784.8178.253.camel@groeck-laptop> Sender: netdev-owner@vger.kernel.org List-ID: From: Guenter Roeck Date: Mon, 04 Jan 2010 12:09:44 -0800 > Please comment on this proposed patch. It is similar but more generic than > a previously proposed change to dev_name_hash() which tried to address > the same problem. Since this changes a filesystem subsystem header file and effects parts of the kernel outside of networking, you really do need to CC: linux-kernel at a minimum. Added... > The hash function currently used for full_name_hash() produces a large number > of collisions if hashed names are similar. This can cause performance problems > if a large number of similar names exist in the kernel (e.g., if there is > a large number of virtual interfaces). > > For example, when hashing "eth0" .. "eth9999" with a hash table size of 256, > the resulting minimum hash bucket depth is 0, the maximum depth is 563, > and the standard deviation is ~136. > > With this patch applied, the same test results in a minimum bucket depth > of 37, a maximum bucket depth of 42, and a standard deviation of ~1.02. > > The hash factor of 41 was chosen for the following reasons: > - The resulting standard deviation is significantly better than the standard > deviation of the original hash function for all tested hash table sizes > (2^x, x=4..16). > - The hash function is simple. > - The resulting code does not require a multiply instruction > (tested: x86, mips, powerpc). > - The resulting code is more efficient than the code generated for the > original hash (x86, gcc -O2: 3 instead of 7 instructions). > - The resulting code also works well with more random strings > (tested with all file names in a given Linux system). > > Signed-off-by: Guenter Roeck > --- > include/linux/dcache.h | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/include/linux/dcache.h b/include/linux/dcache.h > index 30b93b2..772755d 100644 > --- a/include/linux/dcache.h > +++ b/include/linux/dcache.h > @@ -53,7 +53,7 @@ extern struct dentry_stat_t dentry_stat; > static inline unsigned long > partial_name_hash(unsigned long c, unsigned long prevhash) > { > - return (prevhash + (c << 4) + (c >> 4)) * 11; > + return (prevhash + c) * 41; > } > > /* > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html