From: Andi Kleen <ak@suse.de>
To: torvalds@transmeta.com (Linus Torvalds)
Cc: linux-kernel@vger.kernel.org, lse-tech@projects.sourceforge.net
Subject: Re: [PATCH] New dcache / inode hash tuning patch
Date: 28 Feb 2003 09:34:24 +0100 [thread overview]
Message-ID: <p73n0kg7qi7.fsf@amdsimf.suse.de> (raw)
In-Reply-To: torvalds@transmeta.com's message of "28 Feb 2003 00:24:06 +0100"
torvalds@transmeta.com (Linus Torvalds) writes:
[please read the whole post before answering. thanks]
> But quite frankly, if the hash list heads are actually noticeable
> memory users, the hash is likely to be _way_ too big. The list heads
> are _much_ smaller than the entries they point to, the hash list just
> shouldn't be big enough that it really matters.
On big machines it is currently 1+MB. IMHO this is way too big.
>
> - hash list cache footprint
>
> Again: the hash head array itself is at least dense in the cache, and
> each entry is _much_ smaller than the actual data structures it
> points to. So even if you improve the hash heads to be better from a
> cache standpoint, you're only getting a very small percentage of the
> real cache costs.
It would be possible to cache line optimize the layout of struct dentry
in addition. May be an interesting add-on project for someone...
But for lookup walking even one cache line - the one containing d_hash -
should be needed. Unless d_hash is unlucky enough to cross a cache
line for its two members ... but I doubt that.
>
> So let's say that the cache costs of the dcache is 4% (according to
> the oprofile run), then saving a few procent of that is not actually
> going to be noticeable at a user level.
>
> And the downsides of the hash list is that addition/removal is costlier
> due to the conditionals, and a non-conditional version (a common
But the list walking is faster. Testing for NULL generates much better
code on i386 than having to dedicate a register for storing the head
to test against. List walking happens more often than insertion/deletion.
I believe the conditionals are completely left in the noise compared
to the cache misses the two pointer head version causes. You can execute
a lot of conditionals in the time needed to serve one cache miss!
Please take a look at the x86 assembly generated by list_for_each
vs hlist_for_each. hlist_for_each looks much nicer, especially when you
can use the register normally wasted on the head for something else
in the loop body.
In case of dcache rcu it also made things simpler/faster because it didn't
require the complicated is_bucket race breaker check.
> In other words: it may be that our current dentry hashes are too big,
> and that is certainly worth fixing if so. But the "hlist" approach very
> fundamentally cannot really help the _real_ problem very much, and it
> will (slightly) hurt the case where the hashes are actually cached.
I admit it is a kind of micro optimization, but I believe it is an useful
one. Frankly wasting two pointers for a hash bucket in a potentially
big hash table is just s*d.
>
> So I really think that the only longterm fix is to make the lookup data
> structures be "local" to the base of the lookup, in order to get away
> from the inherently non-local nature of the current hash lookups.
Yes, that may be a good idea. I don't have time to work on this though.
Still even with local lookups single pointer buckets will likely help ;)
Also isn't it a bit late in the 2.5 cycle to think about such radical
changes like local lookup? It sounds more like a nice 2.7 project. I
believe my patch with a bit more tweaking (my current 64K hash table
seems to be too small) is suitable even for an soon to be stable
kernel.
Also my patch had some other changes that I believe should be included
anyways because they're independent and improvement. It replaces the
max_dentries race break hack with a better algorithm to detect cycles on walk.
Also it does more prefetches while list walking which I believe to be
useful.
-Andi
next parent reply other threads:[~2003-02-28 8:24 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20030226164904.GA21342@wotan.suse.de.suse.lists.linux.kernel>
[not found] ` <b3m5sd$1ad$1@penguin.transmeta.com.suse.lists.linux.kernel>
2003-02-28 8:34 ` Andi Kleen [this message]
2003-02-28 10:27 ` [Lse-tech] Re: [PATCH] New dcache / inode hash tuning patch Paul Menage
2003-02-28 10:40 ` Andi Kleen
2003-02-28 18:47 ` Linus Torvalds
2003-02-28 18:59 ` Andi Kleen
2003-03-01 0:49 ` Jan Harkes
2003-03-01 1:17 ` Andi Kleen
2003-03-01 4:31 ` Ed Tomlinson
2003-03-01 9:08 ` Andi Kleen
2003-03-01 12:34 ` Ed Tomlinson
2003-03-01 18:34 ` Linus Torvalds
2003-02-26 16:49 Andi Kleen
2003-02-27 23:10 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=p73n0kg7qi7.fsf@amdsimf.suse.de \
--to=ak@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=lse-tech@projects.sourceforge.net \
--cc=torvalds@transmeta.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox