public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Dcache hash distrubition patches
@ 2003-03-10 17:45 Martin J. Bligh
  2003-03-10 17:52 ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Martin J. Bligh @ 2003-03-10 17:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andi Kleen

Thanks to Andi for the distribution count patches. I did some quick
numbers with this, just by running "find / | xargs ls -l > /dev/null"
to populate the cache, then dumped the numbers out.

larry:~# grep dentry /proc/slabinfo
dentry_cache      880531 884880    160 36870 36870    1 :  248  124

  count length
 444569 0
 381295 1
 163442 2
  46937 3
  10042 4
   1747 5
    253 6
     30 7
      5 8

Conclusion: the hash distribution (for this simple test) looks fine
to me.

M.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Dcache hash distrubition patches
  2003-03-10 17:45 Dcache hash distrubition patches Martin J. Bligh
@ 2003-03-10 17:52 ` Andi Kleen
  2003-03-11  7:41   ` Martin J. Bligh
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2003-03-10 17:52 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, Andi Kleen


> Conclusion: the hash distribution (for this simple test) looks fine
> to me.

Yes, because of the overkill size of the hash table. With a 100K + entry
table you can make near every hash function look good ;)

Try to reduce it to a smaller number of buckets and see what happens.

-Andi




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Dcache hash distrubition patches
  2003-03-10 17:52 ` Andi Kleen
@ 2003-03-11  7:41   ` Martin J. Bligh
  2003-03-11  7:47     ` William Lee Irwin III
  2003-03-11 15:23     ` Andi Kleen
  0 siblings, 2 replies; 8+ messages in thread
From: Martin J. Bligh @ 2003-03-11  7:41 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

>> Conclusion: the hash distribution (for this simple test) looks fine
>> to me.
> 
> Yes, because of the overkill size of the hash table. With a 100K + entry
> table you can make near every hash function look good ;)
> 
> Try to reduce it to a smaller number of buckets and see what happens.

OK, after I've stopped being an idiot, and misreading the code, I have 
some numbers. They still look pretty good to me. I shrunk us from
1,048,576 buckets to 65536, and loaded 1,150,000 entries in there.



      5 3
      9 4
     44 5
    113 6
    243 7
    519 8
   1059 9
   1613 10
   2458 11
   3506 12
   4515 13
   5349 14
   6071 15
   6328 16
   6369 17
   5862 18
   5228 19
   4305 20
   3546 21
   2613 22
   1981 23
   1382 24
    928 25
    602 26
    368 27
    230 28
    115 29
     75 30
     45 31
     16 32
     14 33
      3 34
      4 35
      2 36

It's not perfect, but not bad either. Some mathematician can go calculate
just how imperfect it is over random distribution, but it looks OK to me ;-)

M


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Dcache hash distrubition patches
  2003-03-11  7:41   ` Martin J. Bligh
@ 2003-03-11  7:47     ` William Lee Irwin III
  2003-03-11 15:23     ` Andi Kleen
  1 sibling, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2003-03-11  7:47 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andi Kleen, linux-kernel

On Mon, Mar 10, 2003 at 11:41:07PM -0800, Martin J. Bligh wrote:
> It's not perfect, but not bad either. Some mathematician can go calculate
> just how imperfect it is over random distribution, but it looks OK to me ;-)

Okay, the standards are a bit more stringent for hash functions. You
basically want the total number of collisions to be minimized.

That said, standard statistical things are useful for first-pass
discrimination.

This looks basically okay, but if it's really just even with respect
to cache pressure the need for a different data structure in the
eventual future is clear. I'd need a baseline (pre-hlist) dcache
snapshot to compare against to get a better idea.


-- wli

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Dcache hash distrubition patches
  2003-03-11  7:41   ` Martin J. Bligh
  2003-03-11  7:47     ` William Lee Irwin III
@ 2003-03-11 15:23     ` Andi Kleen
  2003-03-11 15:31       ` Martin J. Bligh
  1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2003-03-11 15:23 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andi Kleen, linux-kernel

On Tue, Mar 11, 2003 at 08:41:07AM +0100, Martin J. Bligh wrote:
> some numbers. They still look pretty good to me. I shrunk us from
> 1,048,576 buckets to 65536, and loaded 1,150,000 entries in there.

Interesting would be to find the sweet spot with the smallest hash table 
size that still performs well. Not sure if find / is a good workload
for that though.

Also same for inode hash (but I don't have statistics for that right now)

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Dcache hash distrubition patches
  2003-03-11 15:23     ` Andi Kleen
@ 2003-03-11 15:31       ` Martin J. Bligh
  2003-03-11 16:27         ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Martin J. Bligh @ 2003-03-11 15:31 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

>> some numbers. They still look pretty good to me. I shrunk us from
>> 1,048,576 buckets to 65536, and loaded 1,150,000 entries in there.
> 
> Interesting would be to find the sweet spot with the smallest hash table 
> size that still performs well. Not sure if find / is a good workload
> for that though.

Difficult, as it depends how many files are in the working set of the 
machine, really. Right now it eats 4Mb of lowmem ... 1M entries seems
to be about 150Mb of slab for dentries, which is probably as much as
anyone wants ... but it's nice to keep those hash chains short ;-)

I can try 1Mb or something I suppose ... what's the purpose here,
to keep the cachelines of the bucket heads warm? Not sure it's worth
the tradeoff, as we have to touch another line for each element we
walk?

I take it you're happy enough with the current hash function distribution?
 
> Also same for inode hash (but I don't have statistics for that right now)

I could hack something up ... but 1 machine ain't going to cut it. I
suspect I'd have a much smaller inode hash, as I tend to have masses
of kernel trees, mostly hardlinked to each other.

M.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Dcache hash distrubition patches
  2003-03-11 15:31       ` Martin J. Bligh
@ 2003-03-11 16:27         ` Andi Kleen
  2003-03-11 17:29           ` Anton Blanchard
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2003-03-11 16:27 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andi Kleen, linux-kernel

On Tue, Mar 11, 2003 at 04:31:23PM +0100, Martin J. Bligh wrote:
> I can try 1Mb or something I suppose ... what's the purpose here,
> to keep the cachelines of the bucket heads warm? Not sure it's worth
> the tradeoff, as we have to touch another line for each element we
> walk?

Use less cache for the hash table overall.

Use less lowmem.

Ideally there would be no tradeoff if you can still get reasonable 
hash chain length with smaller tables (= overall win)

> 
> I take it you're happy enough with the current hash function distribution?

At least I don't know how to improve it.



> > Also same for inode hash (but I don't have statistics for that right now)
> 
> I could hack something up ... but 1 machine ain't going to cut it. I
> suspect I'd have a much smaller inode hash, as I tend to have masses
> of kernel trees, mostly hardlinked to each other.

I doubt inode cache is very critical, except perhaps in NFS server loads
(but even the nfs server has an own frontend cache)
Normally the dcache should bear most of the load.

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Dcache hash distrubition patches
  2003-03-11 16:27         ` Andi Kleen
@ 2003-03-11 17:29           ` Anton Blanchard
  0 siblings, 0 replies; 8+ messages in thread
From: Anton Blanchard @ 2003-03-11 17:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Martin J. Bligh, linux-kernel


> I doubt inode cache is very critical, except perhaps in NFS server loads
> (but even the nfs server has an own frontend cache)
> Normally the dcache should bear most of the load.

I did some analysis of the hashes ages ago. From memory the regular
grouping of inodes caused this unbalanced distribution:

http://samba.org/~anton/linux/hashes/1/icache.png

Anton

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2003-03-11 17:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-03-10 17:45 Dcache hash distrubition patches Martin J. Bligh
2003-03-10 17:52 ` Andi Kleen
2003-03-11  7:41   ` Martin J. Bligh
2003-03-11  7:47     ` William Lee Irwin III
2003-03-11 15:23     ` Andi Kleen
2003-03-11 15:31       ` Martin J. Bligh
2003-03-11 16:27         ` Andi Kleen
2003-03-11 17:29           ` Anton Blanchard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox