* [PATCH] dentry and inode cache hash algorithm performance changes.
@ 2004-04-30 19:55 Jose R. Santos
2004-05-01 12:08 ` Olaf Dietsche
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 19:55 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, anton, dheger, slpratt
err... Seems I've sent this to the wrong list.
Sending to linux-kernel this time :)
-----------------------------------------------------------------------
Hi Andrew,
Could you consider this patch for inclusion into the mainline kernel? It
alleviates some issues seen with Linux when accessing millions of files
on machines with large amounts of RAM (32GB+). Both algorithms are based
on studies that Dominique Heger was doing on hash table efficiency
in Linux. The dentry hash function has been tested on small systems with
one internal IDE hard disk as well as on large SMP systems with many
Fibre Channel disks. Dominique reports that in all the testing done, they
did not see a single case where this hash function performed worse, and
in most tests they saw better performance.
The inode hash function was done by me, based on Dominique's original work,
and has only been stress tested with SpecSFS. It provided a 3%
improvement over the default algorithm in the SpecSFS results and speedups
in the response time of almost all filesystem operations the benchmark
stresses. With the better distribution it was also possible to reduce the
number of inode buckets from 32 million to 16 million and still get
slightly better results.
Anton was nice enough to provide some graphs that show the distribution
before and after the patch at http://samba.org/~anton/linux/sfs/1/
Thanks
-JRS
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1582 -> 1.1583
# fs/dcache.c 1.70 -> 1.71
# fs/inode.c 1.113 -> 1.114
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 04/04/30 jsantos@rx8.austin.ibm.com 1.1583
# Hash function changes that show improvements on SpecSFS when a large number of files is used (20+ million).
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Fri Apr 30 12:14:23 2004
+++ b/fs/dcache.c Fri Apr 30 12:14:23 2004
@@ -28,6 +28,7 @@
#include <asm/uaccess.h>
#include <linux/security.h>
#include <linux/seqlock.h>
+#include <linux/hash.h>
#define DCACHE_PARANOIA 1
/* #define DCACHE_DEBUG 1 */
@@ -799,8 +800,8 @@
static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
{
- hash += (unsigned long) parent / L1_CACHE_BYTES;
- hash = hash ^ (hash >> D_HASHBITS);
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
return dentry_hashtable + (hash & D_HASHMASK);
}
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Fri Apr 30 12:14:23 2004
+++ b/fs/inode.c Fri Apr 30 12:14:23 2004
@@ -671,8 +671,9 @@
static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
- tmp = tmp + (tmp >> I_HASHBITS);
+ unsigned long tmp = (hashval + ((unsigned long) sb) ^ (GOLDEN_RATIO_PRIME + hashval)
+ / L1_CACHE_BYTES);
+ tmp = tmp + ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
return tmp & I_HASHMASK;
}
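For readers without a kernel tree handy, the old and new d_hash mixing above can be modeled in userspace. This is only a sketch: D_HASHBITS, D_HASHMASK and L1_CACHE_BYTES are assumed values (the kernel sizes the table at boot), and GOLDEN_RATIO_PRIME here is the 32-bit constant from <linux/hash.h>.

```c
/* Userspace model of the old and new dentry hash functions.  The table
 * constants are illustrative assumptions; only the mixing steps mirror
 * the patch above. */
#define GOLDEN_RATIO_PRIME 0x9e370001UL /* 32-bit value from <linux/hash.h> */
#define D_HASHBITS 14
#define D_HASHMASK ((1UL << D_HASHBITS) - 1)
#define L1_CACHE_BYTES 32

unsigned long d_hash_old(unsigned long parent, unsigned long hash)
{
	hash += parent / L1_CACHE_BYTES;
	hash = hash ^ (hash >> D_HASHBITS);
	return hash & D_HASHMASK;
}

unsigned long d_hash_new(unsigned long parent, unsigned long hash)
{
	/* XOR with a large prime perturbs the bits of the pointer
	 * before the shift folds the upper bits down. */
	hash += (parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
	hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
	return hash & D_HASHMASK;
}
```

Feeding both versions a run of slab-like, cache-line-aligned parent addresses with identical qstr hash values makes it easy to compare the bucket indices each produces.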
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
[not found] ` <20040430131832.45be6956.akpm@osdl.org>
@ 2004-04-30 20:57 ` Jose R. Santos
2004-04-30 21:33 ` Jose R. Santos
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 20:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jose R. Santos, linux-kernel, anton, dheger
On 04/30/04 15:18:32, Andrew Morton wrote:
> "Jose R. Santos" <jrsantos@austin.ibm.com> wrote:
> >
> > diff -Nru a/fs/inode.c b/fs/inode.c
> > --- a/fs/inode.c Fri Apr 30 12:14:23 2004
> > +++ b/fs/inode.c Fri Apr 30 12:14:23 2004
> > @@ -671,8 +671,9 @@
> >
> > static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
> > {
> > - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
> > - tmp = tmp + (tmp >> I_HASHBITS);
> > + unsigned long tmp = (hashval + ((unsigned long) sb) ^ (GOLDEN_RATIO_PRIME + hashval)
> > + / L1_CACHE_BYTES);
> > + tmp = tmp + ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
> > return tmp & I_HASHMASK;
> > }
> >
>
> Are you sure about this? It's doing:
>
> tmp = hashval + sb ^ ((GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES)
>
> should it not be:
>
> tmp = hashval + (sb ^ (GOLDEN_RATIO_PRIME + hashval)) / L1_CACHE_BYTES
>
> ?
err... Wrote the patch too fast. It should read
tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
I screwed up... I'll send a fixed patch in a while.
-JRS
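Andrew's reading follows from C operator precedence: '/' binds tighter than '+', and both bind tighter than '^'. A standalone check, using arbitrary small values rather than the kernel constants:

```c
/* Shows how the unparenthesized expression from the patch groups.
 * The values are arbitrary; only the grouping is of interest. */
int precedence_matches_andrews_reading(void)
{
	unsigned long hashval = 100, sb = 12, prime = 40, cacheline = 8;

	/* The patch's right-hand side, redundant outer parens dropped: */
	unsigned long as_written = hashval + sb ^ (prime + hashval) / cacheline;

	/* The grouping Andrew described: */
	unsigned long andrews = (hashval + sb) ^ ((prime + hashval) / cacheline);

	/* The grouping Andrew suggested may have been intended: */
	unsigned long intended = hashval + (sb ^ (prime + hashval)) / cacheline;

	return as_written == andrews && as_written != intended;
}
```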
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 20:57 ` [PATCH] dentry and inode cache hash algorithm performance changes Jose R. Santos
@ 2004-04-30 21:33 ` Jose R. Santos
2004-04-30 22:02 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 21:33 UTC (permalink / raw)
To: Jose R. Santos; +Cc: Andrew Morton, Jose R. Santos, linux-kernel, anton, dheger
On 04/30/04 15:57:01, Jose R. Santos wrote:
> err... Wrote the patch too fast. It should read
>
> tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
>
> I screwed up... I'll send a fixed patch in a while.
Just noticed I've made another error in the inode hash code.
Fixed patch (I hope), with beautification.
-JRS
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Fri Apr 30 16:27:41 2004
+++ b/fs/dcache.c Fri Apr 30 16:27:41 2004
@@ -28,6 +28,7 @@
#include <asm/uaccess.h>
#include <linux/security.h>
#include <linux/seqlock.h>
+#include <linux/hash.h>
#define DCACHE_PARANOIA 1
/* #define DCACHE_DEBUG 1 */
@@ -799,8 +800,8 @@
static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
{
- hash += (unsigned long) parent / L1_CACHE_BYTES;
- hash = hash ^ (hash >> D_HASHBITS);
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
return dentry_hashtable + (hash & D_HASHMASK);
}
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Fri Apr 30 16:27:41 2004
+++ b/fs/inode.c Fri Apr 30 16:27:41 2004
@@ -671,12 +671,13 @@
static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
- tmp = tmp + (tmp >> I_HASHBITS);
+ unsigned long tmp;
+
+ tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
+ L1_CACHE_BYTES;
+ tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
return tmp & I_HASHMASK;
}
-
-/* Yeah, I know about quadratic hash. Maybe, later. */
/**
* iunique - get a unique inode number
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 21:33 ` Jose R. Santos
@ 2004-04-30 22:02 ` Andrew Morton
2004-04-30 23:42 ` Jose R. Santos
2004-05-04 13:12 ` Jose R. Santos
0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2004-04-30 22:02 UTC (permalink / raw)
To: Jose R. Santos; +Cc: jrsantos, linux-kernel, anton, dheger
"Jose R. Santos" <jrsantos@austin.ibm.com> wrote:
>
> On 04/30/04 15:57:01, Jose R. Santos wrote:
> > err... Wrote the patch too fast. It should read
> >
> > tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
> >
> > I screwed up... I'll send a fixed patch in a while.
>
> Just noticed I've made another error in the inode hash code.
>
> Fixed patch (I hope) with beautification.
Does this mean you need to redo the instrumentation and benchmarking? If
so, please do that and send the numbers along? There's no particular
urgency on that, but we should do it.
Also, I'd be interested in understanding what the input to the hashing
functions looked like in this testing. It could be that the new hash just
happens to work well with one particular test's dataset. Please convince
us otherwise ;)
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 22:02 ` Andrew Morton
@ 2004-04-30 23:42 ` Jose R. Santos
2004-05-04 13:12 ` Jose R. Santos
1 sibling, 0 replies; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 23:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jose R. Santos, jrsantos, linux-kernel, anton, dheger
On 04/30/04 17:02:56, Andrew Morton wrote:
> Does this mean you need to redo the instrumentation and benchmarking? If
> so, please do that and send the numbers along? There's no particular
> urgency on that, but we should do it.
No, the screw-up happened when I created the patch on a clean source
tree. This patch reflects the actual data gathered for those graphs.
> Also, I'd be interested in understanding what the input to the hashing
> functions looked like in this testing. It could be that the new hash just
> happens to work well with one particular test's dataset. Please convince
> us otherwise ;)
hehehe... I knew it couldn't be that easy. :)
For the dentry hash function, some of my coworkers have put this
hash function through various tests and have concluded that it
is equal to or better than the default hash function.
These runs were done with a (hopefully to be open source soon)
benchmark called FFSB, which can simulate various I/O patterns across
many filesystems and variable file sizes.
The SpecSFS fileset is basically a lot of small files, the number of
which varies depending on the size of the run. For a not-so-big SMP
system, the number of files is in the 20+ million range. Of those
20 million files, only 10% are accessed randomly by the client. The
purpose of this is that the benchmark tries to stress not only
the NFS layer but the VM and filesystem layers as well. The filesets
are also hundreds of gigabytes in size in order to promote disk
head movement by guaranteeing cache misses in memory. In SFS, 27% of
the workload is lookups, and __d_lookup has been showing up high in my
profiles.
For the inode hash, the problem I see is that when running
a benchmark with this huge fileset, we end up trying to free a
lot of inode entries during the run while putting new
entries in the cache. We end up calling ifind_fast(), which calls
find_inode_fast() with inode_lock held. In order to limit the time
spent holding inode_lock, we need to avoid long chains
in that hash table.
When I took a look at the original hash function, I found it
to be a bit too simple for any workload. My solution (which
takes advantage of Dominique's work) was to create a hash
function that could generate completely different hashes
depending on the hashval and the superblock, in order to have
the hash scale as we add more filesystems to the machine.
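The design goal described here, different hash sequences per superblock, can be sanity-checked with a userspace model. The constants below are illustrative assumptions (the kernel derives I_HASHBITS and I_HASHMASK from the boot-time table size); only the mixing steps mirror the fixed patch.

```c
/* Userspace model of the corrected inode hash.  I_HASHBITS, I_HASHMASK,
 * L1_CACHE_BYTES and the 32-bit GOLDEN_RATIO_PRIME are illustrative
 * assumptions. */
#define GOLDEN_RATIO_PRIME 0x9e370001UL
#define I_HASHBITS 14
#define I_HASHMASK ((1UL << I_HASHBITS) - 1)
#define L1_CACHE_BYTES 32

unsigned long inode_hash(unsigned long sb, unsigned long hashval)
{
	unsigned long tmp = (hashval * sb) ^
			    ((GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES);
	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
	return tmp & I_HASHMASK;
}

/* Count how many of the first n hashvals land in different buckets for
 * two superblock addresses - a rough measure of per-sb independence. */
unsigned long buckets_differing(unsigned long sb1, unsigned long sb2,
				unsigned long n)
{
	unsigned long i, diff = 0;

	for (i = 0; i < n; i++)
		if (inode_hash(sb1, i) != inode_hash(sb2, i))
			diff++;
	return diff;
}
```

Because hashval multiplies the superblock pointer, the same sequence of hashvals maps to a different bucket pattern under each superblock, which is the scaling property described above.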
Both of these problems can be somewhat tuned away by increasing
the number of buckets of both the dentry and inode caches, but it got to a
point where I had 256MB of inode and 128MB of dentry hash buckets
on a not-so-large SMP. With the hash changes I have been able
to reduce the buckets to 128MB for the inode cache and to
32MB for the dentry cache and still get better performance.
If it helps my case... I haven't been running this benchmark
for long, so I haven't been able to find a way to cheat. I
need to come up with generic solutions until I can find a
cheat for the benchmark. :)
-JRS
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 19:55 Jose R. Santos
@ 2004-05-01 12:08 ` Olaf Dietsche
2004-05-01 15:08 ` Jose R. Santos
0 siblings, 1 reply; 12+ messages in thread
From: Olaf Dietsche @ 2004-05-01 12:08 UTC (permalink / raw)
To: Jose R. Santos; +Cc: akpm, linux-kernel, anton, dheger, slpratt
"Jose R. Santos" <jrsantos@austin.ibm.com> writes:
> Anton was nice enough to provide some graphs that show the distribution
> before and after the patch at http://samba.org/~anton/linux/sfs/1/
Judging from the graphs (!), I don't see a real difference for
the dcache. There's just one outlier (depth 11) for the old hash function,
which seems to be translated into multiple depth-9 entries. The
histograms seem to confirm that there's at most a minimal difference,
if they were equally scaled.
Maybe this is due to the qstr hashes, which were used by both approaches
as input?
The inode hash seems to be distributed more evenly with the new hash
function, although the inode histogram suggests that most buckets are
in the 0-2 depth range, with the old hash function leading slightly in
the zero-depth range.
Regards, Olaf.
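The depth plots Olaf is reading can be reproduced from any per-bucket count array; a histogram of chain depths is all such graphs show. A minimal sketch (array sizes are arbitrary):

```c
/* Given per-bucket object counts, fill hist[d] with the number of
 * buckets whose chain is exactly d deep (depths >= max_depth clamp to
 * max_depth - 1).  This is the shape of data behind depth plots like
 * the ones linked above. */
void depth_histogram(const unsigned long *bucket_counts,
		     unsigned long nbuckets,
		     unsigned long *hist, unsigned long max_depth)
{
	unsigned long i;

	for (i = 0; i < max_depth; i++)
		hist[i] = 0;
	for (i = 0; i < nbuckets; i++) {
		unsigned long d = bucket_counts[i];

		if (d >= max_depth)
			d = max_depth - 1;
		hist[d]++;
	}
}
```

A "more even" distribution then shows up as mass concentrated at depths 0-2 with few or no buckets in the long tail.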
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-01 12:08 ` Olaf Dietsche
@ 2004-05-01 15:08 ` Jose R. Santos
2004-05-20 13:34 ` Raghavan
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-05-01 15:08 UTC (permalink / raw)
To: Olaf Dietsche; +Cc: akpm, linux-kernel, anton, dheger, slpratt
* Olaf Dietsche <olaf+list.linux-kernel@olafdietsche.de> [2004-05-01 14:08:26 +0200]:
> Judging from the graphs (!), I don't see a real difference for
> the dcache. There's just one outlier (depth 11) for the old hash function,
> which seems to be translated into multiple depth-9 entries. The
> histograms seem to confirm that there's at most a minimal difference,
> if they were equally scaled.
>
> Maybe this is due to the qstr hashes, which were used by both approaches
> as input?
SpecSFS is not really the best benchmark to show the efficiency of the
dentry hash function. I need to come up with a better defense for
these hash functions. While I did not do the study for this hash
function, mathematically speaking this hash algorithm should always
produce an equal or better hash. SFS just shows equal (well, slightly
better), so I'll work on getting some more data to back up the "better"
claim.
> The inode hash seems to be distributed more evenly with the new hash
> function, although the inode histogram suggests that most buckets are
> in the 0-2 depth range, with the old hash function leading slightly in
> the zero-depth range.
That's a good thing. With the new hash function, we get 25% more bucket
usage, and most of the buckets are only one object deep. These buckets take
up memory, so we had better use them. The old hash function was not very
efficient at spreading the hashes across all the buckets; with the new
hash function we have 4.5 times more buckets that are only one object deep,
so it scales better as we increase the number of buckets as well.
It also provides a 3% improvement in the overall SFS number with half
the number of buckets used, which I believe is a great improvement from
just a hash algorithm change.
-JRS
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 22:02 ` Andrew Morton
2004-04-30 23:42 ` Jose R. Santos
@ 2004-05-04 13:12 ` Jose R. Santos
2004-05-04 18:55 ` Andrew Morton
1 sibling, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-05-04 13:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, anton, dheger
* Andrew Morton <akpm@osdl.org> [2004-04-30 15:02:56 -0700]:
> Also, I'd be interested in understanding what the input to the hashing
> functions looked like in this testing. It could be that the new hash just
> happens to work well with one particular test's dataset. Please convince
> us otherwise ;)
Andrew - Is there any workload you want me to run to show that this hash
function is going to be equal to or better than the one already provided
in Linux?
Remember that my claim is not that this hash function will be better for
every I/O workload. I claim it should not perform worse than
the default hash function, and on some workloads it should perform
better. The workloads that should show improvements are those that
use gigabytes of memory to store inode and dentry cache data. I have run
some tests on my old BP6 machine, and other than a small improvement while
running find, I did not see any improvements, but no regressions either.
Again, if you have a particular workload in mind, I'll be happy to run it
on some of my systems.
Thanks,
-JRS
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-04 13:12 ` Jose R. Santos
@ 2004-05-04 18:55 ` Andrew Morton
2004-05-07 13:04 ` Jose R. Santos
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2004-05-04 18:55 UTC (permalink / raw)
To: Jose R. Santos; +Cc: linux-kernel, anton, dheger
"Jose R. Santos" <jrsantos@austin.ibm.com> wrote:
>
> * Andrew Morton <akpm@osdl.org> [2004-04-30 15:02:56 -0700]:
> > Also, I'd be interested in understanding what the input to the hashing
> > functions looked like in this testing. It could be that the new hash just
> > happens to work well with one particular test's dataset. Please convince
> > us otherwise ;)
>
> Andrew - Is there any workload you want me to run to show that this hash
> function is going to be equal to or better than the one already provided
> in Linux?
Not really - it sounds like you've covered it pretty well. Did you try SDET?
It could be that reducing the hash table size will turn pretty much any
workload into a test of the hash quality.
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-04 18:55 ` Andrew Morton
@ 2004-05-07 13:04 ` Jose R. Santos
2004-05-08 1:03 ` Dave Hansen
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-05-07 13:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jose R. Santos, linux-kernel, anton, dheger, slpratt
On 05/04/04 13:55:10, Andrew Morton wrote:
> > Andrew - Is there any workload you want me to run to show that this hash
> > function is going to be equal to or better than the one already provided
> > in Linux?
>
> Not really - it sounds like you've covered it pretty well. Did you try SDET?
>
> It could be that reducing the hash table size will turn pretty much any
> workload into a test of the hash quality.
Sorry for the late reply...
Steve Pratt seems to have an SDET setup already, and he did me the favor of
running SDET with a reduced dentry hash table size. I believe
his table suggests that less than a 3% change is acceptable variability, but
overall he got a 5% better number using the new hash algorithm.
-JRS
=========================================================================
A) x4408way1.sdet.2.6.5100000-8p.04-05-05_12.08.44 vs
B) x4408way1.sdet.2.6.5+hash-100000-8p.04-05-05_11.48.02
<6>Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
<4>Inode-cache hash table entries: 1048576 (order: 10, 4194304 bytes)
Results:Throughput
tolerance = 0.00 + 3.00% of A
A B
Threads Ops/sec Ops/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 4341.9300 4401.9500 1.38 60.02 130.26
2 8242.2000 8165.1200 -0.94 -77.08 247.27
4 15274.4900 15257.1000 -0.11 -17.39 458.23
8 21326.9200 21320.7000 -0.03 -6.22 639.81
16 23056.2100 24282.8000 5.32 1226.59 691.69 *
32 23397.2500 24684.6100 5.50 1287.36 701.92 *
64 23372.7600 23632.6500 1.11 259.89 701.18
128 17009.3900 16651.9600 -2.10 -357.43 510.28
=========================================================================
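The tolerance column in the table is simply 3% of the baseline throughput, and the '*' rows are those whose diff exceeds it. A quick check against the 1-thread and 16-thread rows, with values copied from the table above:

```c
#include <math.h>

/* Returns 1 when run B differs from baseline A by more than the stated
 * tolerance of 0.00 + 3.00% of A - the condition marked '*' in the
 * table above. */
int exceeds_tolerance(double a, double b)
{
	return fabs(b - a) > 0.03 * a;
}
```

By this rule only the 16- and 32-thread improvements are outside normal variability, which is consistent with the roughly 5% gain reported at those thread counts.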
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-07 13:04 ` Jose R. Santos
@ 2004-05-08 1:03 ` Dave Hansen
0 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2004-05-08 1:03 UTC (permalink / raw)
To: Jose R. Santos
Cc: Andrew Morton, Linux Kernel Mailing List, Anton Blanchard, dheger,
slpratt
On Fri, 2004-05-07 at 06:04, Jose R. Santos wrote:
> On 05/04/04 13:55:10, Andrew Morton wrote:
> > > Andrew - Is there any workload you want me to run to show that this hash
> > > function is going to be equal to or better than the one already provided
> > > in Linux?
> >
> > Not really - it sounds like you've covered it pretty well. Did you try SDET?
> >
> > It could be that reducing the hash table size will turn pretty much any
> > workload into a test of the hash quality.
>
> Sorry for the late reply...
>
> Steve Pratt seems to have an SDET setup already, and he did me the favor of
> running SDET with a reduced dentry hash table size. I believe
> his table suggests that less than a 3% change is acceptable variability, but
> overall he got a 5% better number using the new hash algorithm.
It's usually best to keep increasing the number of SDET iterations that
you average against, at least until the averages start to become a bit
less bouncy. Also, mounting ramfs on /tmp can _really_ help lower its
variability, probably because of gcc.
You might be lucky enough to get some consistently good numbers that
way.
-- Dave
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-01 15:08 ` Jose R. Santos
@ 2004-05-20 13:34 ` Raghavan
0 siblings, 0 replies; 12+ messages in thread
From: Raghavan @ 2004-05-20 13:34 UTC (permalink / raw)
To: Jose R. Santos; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2802 bytes --]
Hi Jose,
Did you try the new hashing algorithm with dcachebench?
dcachebench is focused entirely on dcache performance.
I measured the performance of dcachebench on the
2.6.6 kernel with and without the new hashing algorithm
and noticed a significant decrease in performance with the
new hash algorithm. Enclosed is the mail from LKML that discusses
this.
Also, I wrote an instrumentation patch that measures the
number of clock cycles spent in the hash and lookup functions.
The hash time saw an increase (obviously), but I also saw an
increase in the time spent in the lookup function.
Attached is the patch I used for the instrumentation.
Thanks and Regards
Raghav
On Sat, May 01, 2004 at 10:08:57AM -0500, Jose R. Santos wrote:
> * Olaf Dietsche <olaf+list.linux-kernel@olafdietsche.de> [2004-05-01 14:08:26 +0200]:
> > Judging from the graphs (!), I don't see a real difference for
> > the dcache. There's just one outlier (depth 11) for the old hash function,
> > which seems to be translated into multiple depth-9 entries. The
> > histograms seem to confirm that there's at most a minimal difference,
> > if they were equally scaled.
> >
> > Maybe this is due to the qstr hashes, which were used by both approaches
> > as input?
>
> SpecSFS is not really the best benchmark to show the efficiency of the
> dentry hash function. I need to come up with a better defense for
> these hash functions. While I did not do the study for this hash
> function, mathematically speaking this hash algorithm should always
> produce an equal or better hash. SFS just shows equal (well, slightly
> better), so I'll work on getting some more data to back up the "better"
> claim.
>
> > The inode hash seems to be distributed more evenly with the new hash
> > function, although the inode histogram suggests that most buckets are
> > in the 0-2 depth range, with the old hash function leading slightly in
> > the zero-depth range.
>
> That's a good thing. With the new hash function, we get 25% more bucket
> usage, and most of the buckets are only one object deep. These buckets take
> up memory, so we had better use them. The old hash function was not very
> efficient at spreading the hashes across all the buckets; with the new
> hash function we have 4.5 times more buckets that are only one object deep,
> so it scales better as we increase the number of buckets as well.
>
> It also provides a 3% improvement in the overall SFS number with half
> the number of buckets used, which I believe is a great improvement from
> just a hash algorithm change.
>
> -JRS
[-- Attachment #2: original_mail --]
[-- Type: text/plain, Size: 1856 bytes --]
On Fri, 14 May 2004, Dipankar Sarma wrote:
> >
> > 2.6.6 10280 110
> >
> > 2.6.6-bk 10862 30
>
> > To find out if the huge performance dip between the 2.6.6
> > and 2.6.6-bk is because of the hash changes, I removed the hash patch
> > from 2.6.6-bk and applied it to 2.6.6.
> >
> > 2.6.6-bk with old hash 10685 34
> >
> > 2.6.6 with new hash 10496 125
> >
> > Looks like the new hashing function has brought down the performance.
> > Also some code outside dcache.c and inode.c seems to have pushed down
> > the performance in 2.6.6-bk.
>
> OK, I am confused. These numbers show that the new hash function
> is better.
No, look again.
old hash new hash
2.6.6: 10280 10496
2.6.6-bk: 10685 10862
in both cases, the new hash makes each iteration about ~200us (or whatever
the metric is) slower.
There's something _else_ going on too, since plain 2.6.6 is so clearly
better than the others. I don't know why - the only thing -bk does is the
hash change and some changes to GFP_NOFS behaviour (different return value
from shrink_dentries or whatever). Which shouldn't even trigger, I'd have
assumed this is all cached.
NOTE! Just "simple" things like variations in I$ layout of the kernel code
can make a pretty big difference if you're unlucky. And the new dentry
code doesn't align the things on a D$ cache line boundary, so there could
easily be "consistent" differences from that - just from the order of
dentries allocated etc.
But it's interesting to note that the hash does make a difference. The old
hash was very much optimized for simplicity (those hash-calculation
routines are some of the hottest in the kernel). But I don't see that a
few extra instructions should make that big of a difference.
Linus
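The roughly 200-unit penalty Linus refers to falls directly out of the quoted table (numbers copied from above; the units are whatever dcachebench reports):

```c
/* Deltas between old-hash and new-hash runs from the table above. */
int new_hash_delta_266(void)   { return 10496 - 10280; } /* plain 2.6.6 */
int new_hash_delta_266bk(void) { return 10862 - 10685; } /* 2.6.6-bk  */
```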
[-- Attachment #3: patch --]
[-- Type: text/plain, Size: 11688 bytes --]
--- linux-2.6.6.orig/fs/dcache.c 2004-05-10 08:02:01.000000000 +0530
+++ linux-2.6.6/fs/dcache.c 2004-05-20 18:07:37.000000000 +0530
@@ -21,6 +21,7 @@
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/smp_lock.h>
+#include <linux/hash.h>
#include <linux/cache.h>
#include <linux/module.h>
#include <linux/mount.h>
@@ -40,6 +41,19 @@ EXPORT_SYMBOL(dcache_lock);
static kmem_cache_t *dentry_cache;
+//#define OLD_HASH
+#define DB
+
+#ifdef DB
+#include <asm/processor.h>
+#include <asm/msr.h>
+#include <asm/e820.h>
+unsigned long d_hash_count, d_lookup_count;
+unsigned long d_hash_misses, d_lookup_misses;
+unsigned long d_hash_overflow, d_lookup_overflow;
+unsigned long hash_array[250], lookup_array[500];
+#endif
+
/*
* This is the single most critical data structure when it comes
* to the dcache: the hashtable for lookups. Somebody should try
@@ -643,24 +657,23 @@ void shrink_dcache_anon(struct hlist_hea
}
/*
- * This is called from kswapd when we think we need some more memory.
+ * Scan `nr' dentries and return the number which remain.
+ *
+ * We need to avoid reentering the filesystem if the caller is performing a
+ * GFP_NOFS allocation attempt. One example deadlock is:
+ *
+ * ext2_new_block->getblk->GFP->shrink_dcache_memory->prune_dcache->
+ * prune_one_dentry->dput->dentry_iput->iput->inode->i_sb->s_op->put_inode->
+ * ext2_discard_prealloc->ext2_free_blocks->lock_super->DEADLOCK.
+ *
+ * In this case we return -1 to tell the caller that we baled.
*/
static int shrink_dcache_memory(int nr, unsigned int gfp_mask)
{
if (nr) {
- /*
- * Nasty deadlock avoidance.
- *
- * ext2_new_block->getblk->GFP->shrink_dcache_memory->
- * prune_dcache->prune_one_dentry->dput->dentry_iput->iput->
- * inode->i_sb->s_op->put_inode->ext2_discard_prealloc->
- * ext2_free_blocks->lock_super->DEADLOCK.
- *
- * We should make sure we don't hold the superblock lock over
- * block allocations, but for now:
- */
- if (gfp_mask & __GFP_FS)
- prune_dcache(nr);
+ if (!(gfp_mask & __GFP_FS))
+ return -1;
+ prune_dcache(nr);
}
return dentry_stat.nr_unused;
}
@@ -795,11 +808,45 @@ struct dentry * d_alloc_root(struct inod
return res;
}
-static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
+static inline struct hlist_head *d_hash(struct dentry *parent,
+ unsigned long hash)
{
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
+#ifdef OLD_HASH
hash += (unsigned long) parent / L1_CACHE_BYTES;
hash = hash ^ (hash >> D_HASHBITS);
+#else
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
+#endif
+
+#ifndef DB
return dentry_hashtable + (hash & D_HASHMASK);
+#else
+ rdtscll(t2);
+ preempt_enable();
+
+ if( d_hash_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 5000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ hash_array[c]++;
+ }
+ else
+ d_hash_misses++;
+ d_hash_count++;
+ }
+ else d_hash_overflow++;
+
+ return dentry_hashtable + (hash & D_HASHMASK);
+#endif
+
}
/**
@@ -970,6 +1017,12 @@ struct dentry * __d_lookup(struct dentry
struct dentry *found = NULL;
struct hlist_node *node;
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
rcu_read_lock();
hlist_for_each (node, head) {
@@ -1023,6 +1076,25 @@ struct dentry * __d_lookup(struct dentry
break;
}
rcu_read_unlock();
+#ifdef DB
+ rdtscll(t2);
+ preempt_enable();
+
+ if( d_lookup_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 10000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ lookup_array[c]++;
+ }
+ else
+ d_lookup_misses++;
+ d_lookup_count++;
+ }
+ else d_lookup_overflow++;
+
+#endif
+
return found;
}
--- linux-2.6.6.orig/fs/inode.c 2004-05-10 08:03:21.000000000 +0530
+++ linux-2.6.6/fs/inode.c 2004-05-20 18:07:24.000000000 +0530
@@ -32,6 +32,20 @@
*/
#include <linux/buffer_head.h>
+#define OLD_HASH
+
+#define DB
+
+#ifdef DB
+#include <asm/processor.h>
+#include <asm/msr.h>
+#include <asm/e820.h>
+unsigned long i_hash_count, i_lookup_count;
+unsigned long i_hash_misses, i_lookup_misses;
+unsigned long i_hash_overflow, i_lookup_overflow;
+unsigned long i_hash_array[250], i_lookup_array[500];
+#endif
+
/*
* New inode.c implementation.
*
@@ -490,6 +504,12 @@ static struct inode * find_inode(struct
struct hlist_node *node;
struct inode * inode = NULL;
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
repeat:
hlist_for_each (node, head) {
inode = hlist_entry(node, struct inode, i_hash);
@@ -503,6 +523,24 @@ repeat:
}
break;
}
+#ifdef DB
+ rdtscll(t2);
+ preempt_enable();
+
+ if( i_lookup_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 10000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ i_lookup_array[c]++;
+ }
+ else
+ i_lookup_misses++;
+ i_lookup_count++;
+ }
+ else i_lookup_overflow++;
+#endif
+
return node ? inode : NULL;
}
@@ -515,6 +553,12 @@ static struct inode * find_inode_fast(st
struct hlist_node *node;
struct inode * inode = NULL;
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
repeat:
hlist_for_each (node, head) {
inode = hlist_entry(node, struct inode, i_hash);
@@ -528,6 +572,23 @@ repeat:
}
break;
}
+#ifdef DB
+ rdtscll(t2);
+ preempt_enable();
+
+ if( i_lookup_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 10000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ i_lookup_array[c]++;
+ }
+ else
+ i_lookup_misses++;
+ i_lookup_count++;
+ }
+ else i_lookup_overflow++;
+#endif
return node ? inode : NULL;
}
@@ -671,12 +732,46 @@ static struct inode * get_new_inode_fast
static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
+ unsigned long tmp;
+
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
+#ifdef OLD_HASH
+ tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
tmp = tmp + (tmp >> I_HASHBITS);
+#else
+ tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
+ L1_CACHE_BYTES;
+ tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
+#endif
+
+#ifndef DB
+ return tmp & I_HASHMASK;
+#else
+ rdtscll(t2);
+ preempt_enable();
+
+ if( i_hash_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 5000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ i_hash_array[c]++;
+ }
+ else
+ i_hash_misses++;
+ i_hash_count++;
+ }
+ else i_hash_overflow++;
+
return tmp & I_HASHMASK;
-}
-/* Yeah, I know about quadratic hash. Maybe, later. */
+#endif
+}
/**
* iunique - get a unique inode number
--- linux-2.6.6.orig/fs/proc/proc_misc.c 2004-05-10 08:02:01.000000000 +0530
+++ linux-2.6.6/fs/proc/proc_misc.c 2004-05-20 18:07:51.000000000 +0530
@@ -51,6 +51,18 @@
#include <asm/tlb.h>
#include <asm/div64.h>
+#define DB
+#ifdef DB
+extern unsigned long d_hash_average, d_hash_count, d_lookup_average, d_lookup_count;
+extern unsigned long d_hash_misses, d_lookup_misses;
+extern unsigned long d_hash_overflow, d_lookup_overflow;
+extern unsigned long hash_array[250], lookup_array[500];
+extern unsigned long i_hash_average, i_hash_count, i_lookup_average, i_lookup_count;
+extern unsigned long i_hash_misses, i_lookup_misses;
+extern unsigned long i_hash_overflow, i_lookup_overflow;
+extern unsigned long i_hash_array[250], i_lookup_array[500];
+#endif
+
#define LOAD_INT(x) ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
/*
@@ -134,6 +146,85 @@ static struct vmalloc_info get_vmalloc_i
return vmi;
}
+#ifdef DB
+static int proc_read_dcache_hash(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+
+
+ len = sprintf(page + len, "\ndcache Hash statistics: \n");
+
+ for(i=0; i < 250; i++)
+ len += sprintf(page + len, "%lu ", hash_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",d_hash_misses, d_hash_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+
+}
+
+static int proc_read_dcache_lookup(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+ len += sprintf(page + len, "\ndcache lookup statistics: \n");
+
+ for(i=0; i < 500; i++)
+ len += sprintf(page + len, "%lu ", lookup_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",d_lookup_misses, d_lookup_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+
+static int proc_read_inode_hash(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+ len = sprintf(page + len, "\ninode Hash statistics: \n");
+
+ for(i=0; i < 250; i++)
+ len += sprintf(page + len, "%lu ", i_hash_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",i_hash_misses, i_hash_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+static int proc_read_inode_lookup(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+ len = sprintf(page + len, "\ninode lookup statistics: \n");
+
+ for(i=0; i < 500; i++)
+ len += sprintf(page + len, "%lu ", i_lookup_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",i_lookup_misses, i_lookup_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+#endif
+
+
static int uptime_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
@@ -150,6 +241,7 @@ static int uptime_read_proc(char *page,
(unsigned long) idle.tv_sec,
(idle.tv_nsec / (NSEC_PER_SEC / 100)));
+
return proc_calc_metrics(page, start, off, count, eof, len);
}
@@ -651,6 +743,7 @@ static void create_seq_entry(char *name,
entry->proc_fops = f;
}
+
void __init proc_misc_init(void)
{
struct proc_dir_entry *entry;
@@ -675,6 +768,12 @@ void __init proc_misc_init(void)
{"rtc", ds1286_read_proc},
#endif
{"locks", locks_read_proc},
+#ifdef DB
+ {"dcache_hash", proc_read_dcache_hash},
+ {"dcache_lookup", proc_read_dcache_lookup},
+ {"inode_hash", proc_read_inode_hash},
+ {"inode_lookup", proc_read_inode_lookup},
+#endif
{"execdomains", execdomains_read_proc},
{NULL,}
};
Thread overview: 12+ messages
[not found] <20040430191539.GC14271@rx8.ibm.com>
[not found] ` <20040430131832.45be6956.akpm@osdl.org>
2004-04-30 20:57 ` [PATCH] dentry and inode cache hash algorithm performance changes Jose R. Santos
2004-04-30 21:33 ` Jose R. Santos
2004-04-30 22:02 ` Andrew Morton
2004-04-30 23:42 ` Jose R. Santos
2004-05-04 13:12 ` Jose R. Santos
2004-05-04 18:55 ` Andrew Morton
2004-05-07 13:04 ` Jose R. Santos
2004-05-08 1:03 ` Dave Hansen
2004-04-30 19:55 Jose R. Santos
2004-05-01 12:08 ` Olaf Dietsche
2004-05-01 15:08 ` Jose R. Santos
2004-05-20 13:34 ` Raghavan