All of lore.kernel.org
 help / color / mirror / Atom feed
From: Josef Bacik <jbacik@fb.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>, Chris Mason <clm@fb.com>
Subject: Re: Name hashing function causing a perf regression
Date: Fri, 12 Sep 2014 18:08:47 -0400	[thread overview]
Message-ID: <54136EEF.4050209@fb.com> (raw)
In-Reply-To: <CA+55aFy7ZHLQ6pz5n5Moa8FHubwTcApMk5VEse8BjO6B80xzkQ@mail.gmail.com>

On 09/12/2014 06:01 PM, Linus Torvalds wrote:
> On Fri, Sep 12, 2014 at 2:25 PM, Josef Bacik <jbacik@fb.com> wrote:
>>
>> Ok I have a good direction to take this in, so far the best results have
>> been to change fold_hash() to hash_64(hash, 32) and change d_hash to do this
>>
>> static int *d_hash(int *table, unsigned long parent, unsigned int hash)
>> {
>>          hash += (unsigned long) parent / L1_CACHE_BYTES;
>>          hash += hash_32(hash, d_hash_shift);
>>          return table + (hash & d_hash_mask);
>> }
>>
>
> It really should be sufficient to just do
>
>    static int *d_hash(int *table, unsigned long parent, unsigned int hash)
>    {
>          hash += (unsigned long) parent / L1_CACHE_BYTES;
>          return table + hash_32(hash, d_hash_shift);
>    }
>
> because the hash_32() function should already reduce the hash to its
> second argument (d_hash_shift) and mix around the bits sufficiently.
>
> I'm a *bit* nervous about the cost of this all, especially on CPU's
> where integer multiplies are expensive, but obviously we need to
> improve on the final hashing.
>
> Just out of interest, how good/bad does the hash look if the *only*
> change you do is the above d_hash() thing (ie just leave the
> fold_hash() thing alone?)
>
>


[jbacik@devbig005 ~/local] ./hash
Old hash table had 1000000 entries, 0 dupes, 0 max dupes
New hash table had 62200 entries, 937800 dupes, 90 max dupes
Fnv hash table had 988406 entries, 11594 dupes, 3 max dupes
We had 49800 buckets with a p50 of 8 dupes, p90 of 45 dupes, p99 of 80 
dupes for the new hash

"New hash" is the number you want.  Not as bad as originally, not as 
good as replacing fold_hash with hash_64 and not changing d_hash.  Thanks,

Josef

  reply	other threads:[~2014-09-12 22:09 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-09 19:30 Name hashing function causing a perf regression Josef Bacik
2014-09-12 19:11 ` Andi Kleen
2014-09-12 19:21   ` Linus Torvalds
2014-09-12 19:52     ` Josef Bacik
2014-09-12 20:39       ` Linus Torvalds
2014-09-12 21:25         ` Josef Bacik
2014-09-12 22:01           ` Linus Torvalds
2014-09-12 22:08             ` Josef Bacik [this message]
2014-09-12 22:25               ` Linus Torvalds
2014-09-13 18:58                 ` Linus Torvalds
2014-09-15  1:32                   ` Linus Torvalds
2014-09-15  2:49                     ` Tetsuo Handa
2014-09-15  3:37                       ` Linus Torvalds
2014-09-15  4:58                         ` Tetsuo Handa
2014-09-15 14:17                           ` Linus Torvalds
2014-09-15 15:55                     ` Josef Bacik
2014-09-15 16:22                       ` Linus Torvalds
2014-09-15 16:25                         ` Al Viro
2014-09-15 16:33                           ` Linus Torvalds
2014-09-15 16:35                         ` Greg KH
2014-09-15 16:45                           ` Linus Torvalds
2014-09-15 16:53                             ` Jiri Slaby
2014-09-15 17:31                             ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54136EEF.4050209@fb.com \
    --to=jbacik@fb.com \
    --cc=andi@firstfloor.org \
    --cc=clm@fb.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.