From: Linus Torvalds <torvalds@linux-foundation.org>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Stephen Hemminger <stephen.hemminger@vyatta.com>,
Andrew Morton <akpm@linux-foundation.org>,
Octavian Purdila <opurdila@ixiacom.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH] dcache: better name hash function
Date: Tue, 27 Oct 2009 10:32:44 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0910271017170.31845@localhost.localdomain>
In-Reply-To: <20091027100736.5303f1ab@nehalam>
On Tue, 27 Oct 2009, Stephen Hemminger wrote:
>
> Rather than wasting space, or doing expensive, modulus; just folding
> the higher bits back with XOR redistributes the bits better.
Please don't make up any new hash functions without having a better input
set than the one you seem to use.
The 'fnv' function I can believe in, because the whole "multiply by big
prime number" thing to spread out the bits is a very traditional model.
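
For reference, the multiply-by-prime idea can be sketched as below. This is a minimal user-space version of 32-bit FNV-1a using the published offset basis (2166136261) and prime (16777619). The name fnv1a_32 is made up for this illustration, and this is not necessarily the variant used in the patch under discussion.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: XOR in one byte, then multiply by the FNV prime. */
static uint32_t fnv1a_32(const unsigned char *data, size_t len)
{
	uint32_t hash = 2166136261u;	/* FNV offset basis */
	size_t i;

	for (i = 0; i < len; i++) {
		hash ^= data[i];
		hash *= 16777619u;	/* FNV prime; wraps modulo 2^32 */
	}
	return hash;
}
```

The multiply is what spreads each input byte into the higher bits, so the quality of the low bits still depends on how the result is reduced to a table index.
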
But making up a new hash function based on essentially consecutive names
is absolutely the wrong thing to do. You need a much better corpus of path
component names for testing.
> The following seems to give best results (combination of 16bit trick
> and string17).
.. and these kinds of games are likely to work badly on some
architectures. Don't use 16-bit values, and don't use 'get_unaligned()'.
Both tend to work fine on x86, but likely suck on some other
architectures.
Also remember that the critical hash function needs to check for '/' and
'\0' while at it, which is one reason why it does things byte-at-a-time.
If you try to be smart, you'd need to be smart about the end condition
too.
The loop to optimize is _not_ based on 'name+len', it is this code:
	this.name = name;
	c = *(const unsigned char *)name;

	hash = init_name_hash();
	do {
		name++;
		hash = partial_name_hash(c, hash);
		c = *(const unsigned char *)name;
	} while (c && (c != '/'));
	this.len = name - (const char *) this.name;
	this.hash = end_name_hash(hash);
(which depends on us having already removed all slashes at the head, and
knowing that the string is not zero-sized)
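
To experiment with that loop outside the kernel, a self-contained sketch might look like the following. The three helpers are stand-ins modeled on the definitions in include/linux/dcache.h of that era (an assumption, check your tree), and struct component and hash_component() are hypothetical names that only loosely mirror the kernel's struct qstr.

```c
#include <stddef.h>

/* Stand-ins modeled on the helpers in include/linux/dcache.h (assumed). */
static unsigned long init_name_hash(void)
{
	return 0;
}

static unsigned long partial_name_hash(unsigned long c, unsigned long prevhash)
{
	return (prevhash + (c << 4) + (c >> 4)) * 11;
}

static unsigned int end_name_hash(unsigned long hash)
{
	return (unsigned int)hash;
}

struct component {		/* hypothetical, loosely mirrors struct qstr */
	const char *name;
	size_t len;
	unsigned int hash;
};

/*
 * Hash one path component. The caller must already have skipped leading
 * slashes and must pass a non-empty string, as the do/while assumes.
 */
static struct component hash_component(const char *name)
{
	struct component this;
	unsigned long hash = init_name_hash();
	unsigned char c = *(const unsigned char *)name;

	this.name = name;
	do {
		name++;
		hash = partial_name_hash(c, hash);
		c = *(const unsigned char *)name;
	} while (c && (c != '/'));
	this.len = (size_t)(name - this.name);
	this.hash = end_name_hash(hash);
	return this;
}
```

Note that the length is a by-product of the loop: the hash and the end-of-component check share a single pass over the bytes.
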
So doing things multiple bytes at a time is certainly still possible, but
you would always have to find the slashes/NUL's in there first. Doing that
efficiently and portably is not trivial - especially since a lot of
critical path components are short.
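
As a rough illustration of what "finding the slashes/NULs first" could involve, here is a sketch of the classic zero-byte-in-a-word test applied to 8-byte chunks. The names has_zero_byte() and component_len() are hypothetical, memcpy() stands in for a portable unaligned load, and the sketch assumes at least 7 readable bytes follow the terminating NUL, which real path strings do not guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES	0x0101010101010101ULL
#define HIGHS	0x8080808080808080ULL

/* Nonzero if and only if at least one byte of v is zero. */
static uint64_t has_zero_byte(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/*
 * Length of the path component at the start of name, i.e. the distance to
 * the first '/' or NUL. Assumption: at least 7 readable bytes follow the
 * terminating NUL, because whole 8-byte words are loaded.
 */
static size_t component_len(const char *name)
{
	size_t len = 0;

	for (;;) {
		uint64_t w;
		size_t i;

		memcpy(&w, name + len, sizeof(w));	/* portable unaligned load */
		if (has_zero_byte(w) || has_zero_byte(w ^ (ONES * '/'))) {
			/* A terminator is in this word; find it bytewise. */
			for (i = 0; i < sizeof(w); i++) {
				char c = name[len + i];

				if (c == '\0' || c == '/')
					return len + i;
			}
		}
		len += sizeof(w);
	}
}
```

The bytewise fallback keeps the sketch independent of endianness. For a short component such as 'bin' the terminator is already in the first word, so nothing is saved over the plain byte loop, and the hash itself still has to be computed separately.
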
(Remember: there may be just a few 'bin' directory names, but if you do
performance analysis, 'bin' as a path component is probably hashed a lot
more than 'five_slutty_bimbos_and_a_donkey.jpg'. So the relative weighting
of importance of the filename should probably include the frequency it
shows up in pathname lookup)
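
One way to fold lookup frequency into a comparison is to weight each name's bucket occupancy by how often the name is looked up. The sketch below is only an illustration of that idea: struct sample, NBUCKETS, toy_hash() and weighted_chain_len() are all hypothetical, and the lookup counts would have to come from real measurements.

```c
#include <stddef.h>

#define NBUCKETS 8		/* hypothetical table size, power of two */

struct sample {			/* hypothetical corpus entry */
	const char *name;
	unsigned long lookups;	/* how often the name is looked up */
};

/* Toy hash for the demonstration only: sum of bytes. */
static unsigned int toy_hash(const char *s)
{
	unsigned int h = 0;

	while (*s)
		h += (unsigned char)*s++;
	return h;
}

/*
 * Lookup-weighted average chain length: each name contributes the
 * occupancy of its bucket, weighted by how often it is looked up.
 */
static double weighted_chain_len(const struct sample *v, size_t n,
				 unsigned int (*hash)(const char *))
{
	unsigned long count[NBUCKETS] = { 0 };
	double cost = 0.0, total = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		count[hash(v[i].name) & (NBUCKETS - 1)]++;
	for (i = 0; i < n; i++) {
		cost += (double)v[i].lookups *
			(double)count[hash(v[i].name) & (NBUCKETS - 1)];
		total += (double)v[i].lookups;
	}
	return total > 0.0 ? cost / total : 0.0;
}
```

With this metric a collision involving a frequently looked-up name costs far more than one between two rarely used names, which an unweighted collision count would not show.
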
Linus