From: Josef Bacik <jbacik@fb.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Andi Kleen <andi@firstfloor.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>, Chris Mason <clm@fb.com>
Subject: Re: Name hashing function causing a perf regression
Date: Fri, 12 Sep 2014 15:52:26 -0400
Message-ID: <54134EFA.2030101@fb.com>
In-Reply-To: <CA+55aFyFNEk7XkukAcPa3O75u69yE57bVTGbiawb8sBMu-NPUg@mail.gmail.com>
On 09/12/2014 03:21 PM, Linus Torvalds wrote:
> On Fri, Sep 12, 2014 at 12:11 PM, Andi Kleen <andi@firstfloor.org> wrote:
>> Josef Bacik <jbacik@fb.com> writes:
>>>
>>> So the question is what do we do here? I tested other random strings
>>> and every one of them ended up worse as far as collisions go with the
>>> new function vs the old one. I assume we want to keep the word at a
>>> time functionality, so should we switch to a different hashing scheme,
>>> like murmur3/fnv/xxhash/crc32c/whatever? Or should we just go back to
>>
>> Would be interesting to try murmur3.
>
> I seriously doubt it's the word-at-a-time part, since Josef reports
> that it's "suboptimal for < sizeof(unsigned long) string names", and
> for those, there is no data loss at all.
>
> The main difference is that the new hash doesn't try to finish the
> hash particularly well. Nobody complained up until now.
>
> The old hash kept mixing up the bits for each byte it encounters,
> while the new hash really only does that mixing at the end. And its
> mixing is particularly stupid and weak: see fold_hash() (and then
> d_hash() does something very similar).
>
> So the _first_ thing to test would be to try making "fold_hash()"
> smarter. Perhaps using "hash_long(hash, 32)" instead?
>
> Linus
>
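For readers following along, the two folding strategies contrasted above can be sketched in userspace C. This is only an illustration under stated assumptions: `fold_hash_add()` is a guess at the shape of the weak fold (add the high half into the low half and truncate), and `fold_hash_mul()` stands in for a `hash_long(hash, 32)`-style multiplicative fold. Both function names and the multiplier constant are hypothetical and are not copied from the kernel tree.

```c
#include <stdint.h>

/*
 * Assumed shape of the weak fold: add the upper 32 bits into the
 * lower 32 bits and truncate. Input bits barely move, so names that
 * differ in a few bits produce hashes that differ in a few bits.
 */
static inline uint32_t fold_hash_add(uint64_t hash)
{
	hash += hash >> 32;
	return (uint32_t)hash;
}

/*
 * Multiplicative fold in the spirit of hash_long(hash, 32): multiply
 * by a large odd constant and keep the top 32 bits, so every input
 * bit can influence the result. The constant is 2^64 divided by the
 * golden ratio, used here as a stand-in; the kernel's own constant
 * may differ.
 */
#define SKETCH_GOLDEN_RATIO_64 0x9E3779B97F4A7C15ULL

static inline uint32_t fold_hash_mul(uint64_t hash)
{
	return (uint32_t)((hash * SKETCH_GOLDEN_RATIO_64) >> 32);
}
```

With the additive fold, a short name whose bytes fit in the low word passes through almost unchanged, which is consistent with the poor distribution reported for names shorter than sizeof(unsigned long).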
OK, with the hash_long(hash, 32) change I get this:
[jbacik@devbig005 ~/local] ./hash
Old hash table had 1000000 entries, 0 dupes, 0 max dupes
New hash table had 331504 entries, 668496 dupes, 5 max dupes
We had 292735 buckets with a p50 of 3 dupes, p90 of 4 dupes, p99 of 5
dupes for the new hash
So that looks much better. It's not perfect, but walking 5 entries with
hlist_for_each isn't going to kill us. I'll build a kernel with this and
get back shortly with real numbers. Thanks,
Josef
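The test program behind the numbers above is not shown in this message. As a rough sketch of how "entries", "dupes" and "max dupes" could be computed from a list of 32-bit hash values, assuming "entries" means distinct hash values and "dupes" means inputs that landed on a value already seen, something like the following would do. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct dupe_stats {
	size_t entries;   /* distinct hash values */
	size_t dupes;     /* inputs that hit an already-seen value */
	size_t max_dupes; /* most extra inputs sharing any one value */
};

/* qsort comparator for uint32_t, written to avoid overflow. */
static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x > y) - (x < y);
}

/* Sorts hashes[] in place, then counts runs of equal values. */
static struct dupe_stats count_dupes(uint32_t *hashes, size_t n)
{
	struct dupe_stats s = { 0, 0, 0 };
	size_t i, run = 0;

	qsort(hashes, n, sizeof(*hashes), cmp_u32);
	for (i = 0; i < n; i++) {
		if (i > 0 && hashes[i] == hashes[i - 1]) {
			/* Same value as the previous element: one more dupe. */
			run++;
			s.dupes++;
			if (run > s.max_dupes)
				s.max_dupes = run;
		} else {
			/* First occurrence of a new value. */
			run = 0;
			s.entries++;
		}
	}
	return s;
}
```

Under this reading, entries plus dupes equals the number of inputs, which matches the reported 331504 + 668496 = 1000000.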