From mboxrd@z Thu Jan  1 00:00:00 1970
From: Josef Bacik <jbacik@fb.com>
Subject: Re: Name hashing function causing a perf regression
Date: Fri, 12 Sep 2014 17:25:50 -0400
Message-ID: <541364DE.5030104@fb.com>
References: <540F5562.1050505@fb.com>	<8761gs64dx.fsf@tassilo.jf.intel.com>	<CA+55aFyFNEk7XkukAcPa3O75u69yE57bVTGbiawb8sBMu-NPUg@mail.gmail.com>	<54134EFA.2030101@fb.com> <CA+55aFw52k0wKkVP85G+x_OoQwyfvP5yFEguHn8E4qd4gLOUiw@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Andi Kleen <andi@firstfloor.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>, Chris Mason <clm@fb.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:38993 "EHLO
	mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752106AbaILV0a (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Fri, 12 Sep 2014 17:26:30 -0400
In-Reply-To: <CA+55aFw52k0wKkVP85G+x_OoQwyfvP5yFEguHn8E4qd4gLOUiw@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On 09/12/2014 04:39 PM, Linus Torvalds wrote:
> On Fri, Sep 12, 2014 at 12:52 PM, Josef Bacik <jbacik@fb.com> wrote:
>>
>> So that looks much better, not perfect but hlist_for_each through 5 entries
>> isn't going to kill us, I'll build a kernel with this and get back shortly
>> with real numbers.
>
> So one thing to look out for is that the fold_hash() is only effective
> for 64-bit architectures, and I'm pretty sure that we end up having a
> very similar issue on 32-bit architectures that have
>
>   - two 32-bit words with the fairly simplisting mixing (basically
> "first word * 9 + second word")
>
>     This is probably ok'ish, in that you end up still having a 32-bit
> number with *most* of the bits being relevant, but we should think
> about this too.
>
>   - the real problem is likely the "d_hash()" function that narrows the
> 32-bit hash down to d_hash_mask bits, and drops a lot of bits as it
> does so (ie d_hash_mask is usually on the order of 12 bits, so it
> really only uses 24 bits of the 32-bit hash value, and 12 bits is
> particularly bad anyway, since it's a multiple of three which is what
> the '9' in the first simplistic mixing used too.
>
> So it *might* be better to fix this all at d_hash() instead, and make
> that use "hash_u32(hash, d_hash_bits)" instead. And maybe keep the
> fold_hash() stage fairly stupid.
>
> Anyway, what I'm trying to say is that your numbers are certainly
> *much* better than the hash bucket list used to be, but we should
> probably tweak it a bit more. I'm not sure you really need to go to a
> real hash function like murmur, but it would also be a good idea to at
> least look at making some initial value be randomized so that people
> also can't quite as easily perhaps try to hit worst-case situations
> where they control the hash and can make really long buckets (the fact
> that we mix in the parent pointer and many filesystems have some
> directory size limit might make that harder anyway, but..)
>

Ok I have a good direction to take this in, so far the best results have 
been to change fold_hash() to hash_64(hash, 32) and change d_hash to do this

static int *d_hash(int *table, unsigned long parent, unsigned int hash)
{
         hash += (unsigned long) parent / L1_CACHE_BYTES;
         hash += hash_32(hash, d_hash_shift);
         return table + (hash & d_hash_mask);
}

which has gotten us 890k buckets.  I'll futz with it some more on Monday 
and send out a patch with whatever gives me the best numbers.  Thanks 
for the help, hashing is not my forte,

Josef