From: Daniel Phillips <phillips@innominate.de>
To: Linus Torvalds <torvalds@transmeta.com>, linux-kernel@vger.kernel.org
Subject: Re: [rfc] Near-constant time directory index for Ext2
Date: Thu, 22 Feb 2001 23:30:51 +0100 [thread overview]
Message-ID: <3A95931B.A70DCB8A@innominate.de> (raw)
In-Reply-To: <3A947953.B88A23E2@innominate.de> <Pine.LNX.4.10.10102211929070.1129-100000@penguin.transmeta.com>
Linus Torvalds wrote:
>
> On Thu, 22 Feb 2001, Daniel Phillips wrote:
> >
> > In the first heat of hash races - creating 20,000 files in one directory
> > - dentry::hash lost out to my original hack::dx_hash, causing a high
> > percentage of leaf blocks to remain exactly half full and slowing down
> > the whole thing by about 5%. (This was under uml - I haven't tried it
> > native yet but I expect the results to be similar.)
> >
> > Contender Result
> > ========= ======
> > dentry::hash Average fullness = 2352 (57%)
> > hack::dx_hash Average fullness = 2758 (67%)
> >
> > This suggests that dentry::hash is producing distinctly non-dispersed
> > results and needs to be subjected to further scrutiny. I'll run the
> > next heat of hash races tomorrow, probably with R5, and CRC32 too if I
> > have time.
>
> I'd love to hear the results from R5, as that seems to be the reiserfs
> favourite, and I'm trying it out in 2.4.2 because it was so easy to plug
> in..
In this round there were two new contenders:
- ReiserFS's R5
- Bob Jenkins' hash
Eirik Fuller pointed me to the latter, the subject of a very interesting
article in Dr. Dobbs, available online here:
http://burtleburtle.net/bob/hash/doobs.html
As before, the runs are for 20,000 creates and I report only the
fullness, because I'm still running these under UML. Suffice to say
that the total running time is roughly related to the average fullness
with a variance of about 15% from the best to the worst. Eventually I
will rerun the entire series of tests natively and provide more detailed
statistics. Here are the results from the second heat of hash races:
Contender Result
========= ======
dentry::hash Average fullness = 2352 (57%)
daniel::hack_hash Average fullness = 2758 (67%)
bob::hash Average fullness = 2539
(61%)
reiserfs::r5 Average fullness = 2064 (50%)
Just looking at R5 I knew it wasn't going to do well in this application
because it's similar to a number of hash functions I tried with the same
idea in mind: to place similar names together in the same leaf block.
That turned out to be not very important compared to achieving a
relatively high fullness of leaf blocks. The problem with R5 when used
with my htree is, it doesn't give very uniform dispersal But according
to Chris Mason (see his post) it does work very well for ReiserFS. This
provides a little more evidence that my htree scheme is a quite
different from other approaches.
u32 r5_hash (const char *msg, int len)
{
u32 a=0;
while(*msg) {
a += *msg << 4;
a += *msg >> 4;
a *= 11;
msg++;
}
return a;
}
I expected more from bob::hash since it's very carefully well-thought
out in terms of dispersal and avoidance of 'funnelling' (the property
that determines the probabililty collision), but it still fell short of
hack_hash's performance. Oh well. Tomorrow I'll try CRC32\x13.
The bottom line: dx_hack_hash is still the reigning champion. OK, come
out and take a bow:
unsigned dx_hack_hash (const char *name, int len)
{
u32 hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
while (len--)
{
u32 hash = hash1 + (hash0 ^ (*name++ * 71523));
if (hash < 0) hash -= 0x7fffffff;
hash1 = hash0;
hash0 = hash;
}
return hash0;
}
--
Daniel
next prev parent reply other threads:[~2001-02-22 22:32 UTC|newest]
Thread overview: 80+ messages / expand[flat|nested] mbox.gz Atom feed top
2001-02-20 15:04 [rfc] Near-constant time directory index for Ext2 Daniel Phillips
2001-02-20 20:03 ` Linus Torvalds
2001-02-20 21:08 ` Jeremy Jackson
2001-02-20 21:20 ` Mike Dresser
2001-02-20 22:36 ` Jeremy Jackson
2001-02-20 23:08 ` Daniel Phillips
2001-02-21 1:04 ` Bernd Eckenfels
2001-02-21 16:38 ` Daniel Phillips
2001-02-20 22:58 ` Jonathan Morton
2001-02-20 21:41 ` Daniel Phillips
2001-02-21 0:22 ` Linus Torvalds
2001-02-21 0:30 ` Alan Cox
2001-02-21 2:35 ` Ed Tomlinson
2001-02-21 23:13 ` Linus Torvalds
2001-02-21 23:34 ` Davide Libenzi
2001-02-21 23:59 ` Linus Torvalds
2001-02-21 23:57 ` H. Peter Anvin
2001-02-22 0:35 ` Ed Tomlinson
2001-02-21 1:01 ` Andreas Dilger
2001-02-22 2:28 ` Daniel Phillips
2001-02-22 3:30 ` Linus Torvalds
2001-02-22 16:33 ` Chris Mason
2001-02-22 22:30 ` Daniel Phillips [this message]
2001-02-21 17:21 ` Davide Libenzi
2001-02-21 21:08 ` Martin Mares
2001-02-21 21:29 ` Davide Libenzi
2001-02-21 21:32 ` Martin Mares
2001-02-21 21:59 ` Davide Libenzi
2001-02-21 22:26 ` Martin Mares
2001-02-21 22:43 ` Davide Libenzi
2001-02-21 22:14 ` H. Peter Anvin
2001-02-21 22:32 ` Martin Mares
2001-02-21 22:38 ` H. Peter Anvin
2001-02-21 22:50 ` Martin Mares
2001-02-21 22:54 ` H. Peter Anvin
2001-02-21 23:07 ` Martin Mares
2001-02-21 23:15 ` H. Peter Anvin
2001-02-21 23:42 ` Daniel Phillips
2001-02-21 23:52 ` Davide Libenzi
[not found] ` <3A945081.E6EB78F4@innominate.de>
2001-02-21 23:48 ` H. Peter Anvin
2001-02-22 1:22 ` Daniel Phillips
2001-02-22 1:42 ` H. Peter Anvin
2001-02-22 2:03 ` Andreas Dilger
2001-02-22 2:41 ` H. Peter Anvin
2001-02-22 3:43 ` Daniel Phillips
2001-02-22 4:02 ` Linus Torvalds
2001-02-22 5:19 ` Linus Torvalds
2001-02-22 11:31 ` Ingo Oeser
2001-02-22 18:20 ` Linus Torvalds
2001-02-22 4:02 ` H. Peter Anvin
2001-02-22 7:03 ` Andreas Dilger
2001-02-22 4:03 ` H. Peter Anvin
2001-02-22 10:35 ` Alan Cox
2001-02-23 0:59 ` Felix von Leitner
2001-02-22 3:08 ` Daniel Phillips
2001-02-22 8:06 ` [rfc] [LONG] " Andreas Dilger
2001-02-22 7:20 ` [rfc] " Bill Wendling
2001-02-22 8:34 ` Rogier Wolff
2001-02-21 23:26 ` Jamie Lokier
2001-02-22 19:04 ` Kai Henningsen
2001-02-22 6:23 ` [Ext2-devel] " tytso
2001-02-22 7:24 ` Daniel Phillips
2001-02-22 13:20 ` tytso
2001-02-22 18:16 ` Andreas Dilger
2001-02-22 23:04 ` Daniel Phillips
2001-02-23 20:11 ` tytso
2001-02-24 0:32 ` Andreas Dilger
2001-02-22 23:40 ` tytso
2001-02-22 18:38 ` Kai Henningsen
[not found] <Pine.LNX.4.10.10102211740550.1933-100000@coffee.psychology.mcmaster.ca>
2001-02-21 22:47 ` H. Peter Anvin
-- strict thread matches above, loose matches on Subject: below --
2001-02-23 1:52 Andries.Brouwer
2001-02-23 21:43 ` Ralph Loader
2001-02-23 22:37 ` Guest section DW
2001-02-24 2:47 ` Ralph Loader
2001-02-24 5:34 ` Ralph Loader
2001-02-23 2:49 Andries.Brouwer
2001-02-23 3:42 ` Daniel Phillips
2001-02-23 12:20 ` Jonathan Morton
2001-02-23 18:57 ` Andreas Dilger
2001-02-23 12:38 Andries.Brouwer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3A95931B.A70DCB8A@innominate.de \
--to=phillips@innominate.de \
--cc=linux-kernel@vger.kernel.org \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox