From: Matthew Wilcox <willy@infradead.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: David Laight <David.Laight@aculab.com>,
'Herbert Xu' <herbert@gondor.apana.org.au>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Thomas Graf <tgraf@suug.ch>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"maple-tree@lists.infradead.org" <maple-tree@lists.infradead.org>,
"rcu@vger.kernel.org" <rcu@vger.kernel.org>
Subject: Re: [PATCH 0/1] Rosebush, a new hash table
Date: Sun, 25 Feb 2024 05:01:19 +0000 [thread overview]
Message-ID: <ZdrJn0lkFeYGuYIC@casper.infradead.org> (raw)
In-Reply-To: <2s73sed5n6kxg42xqceenjtcwxys4j2r5dc5x4fdtwkmhkw3go@7viy7qli43wd>
On Sat, Feb 24, 2024 at 10:18:31PM -0500, Kent Overstreet wrote:
> On Sat, Feb 24, 2024 at 10:10:27PM +0000, David Laight wrote:
> > I remember playing around with the elf symbol table for a browser
> > and all its shared libraries.
> > While the hash function is pretty trivial, it really didn't matter
> > whether you divided 2^n, 2^n-1 or 'the prime below 2^n' some hash
> > chains were always long.
>
> that's a pretty bad hash, even golden ratio hash would be better, but
> still bad; you really should be using at least jhash.
There's a "fun" effect; essentially the "biased observer" effect which
leads students to erroneously conclude that the majority of classes are
oversubscribed. As somebody observed in this thread, for some usecases
you only look up hashes which actually exist.
Task a trivial example where you have four entries unevenly distributed
between two buckets, three in one bucket and one in the other. Now 3/4
of your lookups hit in one bucket and 1/4 in the other bucket.
Obviously it's not as pronounced if you have 1000 buckets with 1000
entries randomly distributed between the buckets. But that distribution
is not nearly as even as you might expect:
$ ./distrib
0: 362
1: 371
2: 193
3: 57
4: 13
5: 4
That's using lrand48() to decide which bucket to use, so not even a
"quality of hash" problem, just a "your mathematical intuition may not
be right here".
To put this data another way, 371 entries are in a bucket with a single
entry, 384 are in a bucket with two entries, 171 are in a 3-entry
bucket, 52 are in a 4-entry bucket and 20 are in a 5-entry bucket.
$ cat distrib.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
int bucket[1000];
int freq[10];
int main(int argc, char **argv)
{
int i;
for (i = 0; i < 1000; i++)
bucket[lrand48() % 1000]++;
for (i = 0; i < 1000; i++)
freq[bucket[i]]++;
for (i = 0; i < 10; i++)
printf("%d: %d\n", i, freq[i]);
return 0;
}
(ok, quibble about "well, 1000 doesn't divide INT_MAX evenly so your
random number generation is biased", but i maintain that will not
materially affect these results due to it affecting only 0.00003% of
numbers generated)
next prev parent reply other threads:[~2024-02-25 5:01 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-22 20:37 [PATCH 0/1] Rosebush, a new hash table Matthew Wilcox (Oracle)
2024-02-22 20:37 ` [PATCH 1/1] rosebush: Add new data structure Matthew Wilcox (Oracle)
2024-02-25 6:38 ` Al Viro
2024-02-23 11:37 ` [PATCH 0/1] Rosebush, a new hash table Peng Zhang
2024-02-23 13:55 ` Jason A. Donenfeld
2024-02-23 18:40 ` Kent Overstreet
2024-02-24 0:20 ` Herbert Xu
2024-02-24 22:10 ` David Laight
2024-02-25 0:50 ` Herbert Xu
2024-02-25 3:20 ` Kent Overstreet
2024-02-25 3:18 ` Kent Overstreet
2024-02-25 5:01 ` Matthew Wilcox [this message]
2024-02-25 5:32 ` Herbert Xu
2024-02-25 5:51 ` Kent Overstreet
2024-02-25 5:53 ` Herbert Xu
2024-02-25 6:14 ` Kent Overstreet
2024-02-25 6:17 ` Herbert Xu
2024-02-25 14:47 ` David Laight
2024-02-25 21:48 ` Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZdrJn0lkFeYGuYIC@casper.infradead.org \
--to=willy@infradead.org \
--cc=David.Laight@aculab.com \
--cc=herbert@gondor.apana.org.au \
--cc=kent.overstreet@linux.dev \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maple-tree@lists.infradead.org \
--cc=netdev@vger.kernel.org \
--cc=rcu@vger.kernel.org \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).