* [PATCH] dentry and inode cache hash algorithm performance changes.
@ 2004-04-30 19:55 Jose R. Santos
2004-05-01 12:08 ` Olaf Dietsche
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 19:55 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, anton, dheger, slpratt
err... Seems I've sent this to the wrong list.
Sending to linux-kernel this time :)
-----------------------------------------------------------------------
Hi Andrew,
Could you consider this patch for inclusion into the mainline kernel? It
alleviates some issues seen with Linux when accessing millions of files
on machines with large amounts of RAM (32GB+). Both algorithms are based
on studies that Dominique Heger was doing on hash table efficiency
in Linux. The dentry hash function has been tested on small systems with
one internal IDE hard disk as well as on large SMP systems with many
Fibre Channel disks. Dominique reports that in all the testing done, they
did not see a single case where this hash function performed worse, and
in most tests they saw better performance.
The inode hash function was done by me, based on Dominique's original work,
and has only been stress tested with SpecSFS. It provided a 3%
improvement over the default algorithm in the SpecSFS results and speedups
in the response time of almost all filesystem operations the benchmark
stresses. With the better distribution it was also possible to reduce the
number of inode buckets from 32 million to 16 million and still get
slightly better results.
Anton was nice enough to provide some graphs that show the distribution
before and after the patch at http://samba.org/~anton/linux/sfs/1/
Thanks
-JRS
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1582 -> 1.1583
# fs/dcache.c 1.70 -> 1.71
# fs/inode.c 1.113 -> 1.114
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 04/04/30 jsantos@rx8.austin.ibm.com 1.1583
# Hash function changes that show improvements on SpecSFS when a large number of files is used (20+ million).
# --------------------------------------------
#
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Fri Apr 30 12:14:23 2004
+++ b/fs/dcache.c Fri Apr 30 12:14:23 2004
@@ -28,6 +28,7 @@
#include <asm/uaccess.h>
#include <linux/security.h>
#include <linux/seqlock.h>
+#include <linux/hash.h>
#define DCACHE_PARANOIA 1
/* #define DCACHE_DEBUG 1 */
@@ -799,8 +800,8 @@
static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
{
- hash += (unsigned long) parent / L1_CACHE_BYTES;
- hash = hash ^ (hash >> D_HASHBITS);
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
return dentry_hashtable + (hash & D_HASHMASK);
}
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Fri Apr 30 12:14:23 2004
+++ b/fs/inode.c Fri Apr 30 12:14:23 2004
@@ -671,8 +671,9 @@
static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
- tmp = tmp + (tmp >> I_HASHBITS);
+ unsigned long tmp = (hashval + ((unsigned long) sb) ^ (GOLDEN_RATIO_PRIME + hashval)
+ / L1_CACHE_BYTES);
+ tmp = tmp + ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
return tmp & I_HASHMASK;
}
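For readers without a kernel tree handy, the old and new d_hash mixing above can be modeled in userspace. This is only a sketch: D_HASHBITS, D_HASHMASK and L1_CACHE_BYTES are assumed values (the kernel sizes the table at boot), and GOLDEN_RATIO_PRIME here is the 32-bit constant from <linux/hash.h>.

```c
/* Userspace model of the old and new dentry hash functions.  The table
 * constants are illustrative assumptions; only the mixing steps mirror
 * the patch above. */
#define GOLDEN_RATIO_PRIME 0x9e370001UL /* 32-bit value from <linux/hash.h> */
#define D_HASHBITS 14
#define D_HASHMASK ((1UL << D_HASHBITS) - 1)
#define L1_CACHE_BYTES 32

unsigned long d_hash_old(unsigned long parent, unsigned long hash)
{
	hash += parent / L1_CACHE_BYTES;
	hash = hash ^ (hash >> D_HASHBITS);
	return hash & D_HASHMASK;
}

unsigned long d_hash_new(unsigned long parent, unsigned long hash)
{
	/* XOR with a large prime perturbs the bits of the pointer
	 * before the shift folds the upper bits down. */
	hash += (parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
	hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
	return hash & D_HASHMASK;
}
```

Feeding both versions a run of slab-like, cache-line-aligned parent addresses with identical qstr hash values makes it easy to compare the bucket indices each produces.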
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
[not found] ` <20040430131832.45be6956.akpm@osdl.org>
@ 2004-04-30 20:57 ` Jose R. Santos
2004-04-30 21:33 ` Jose R. Santos
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 20:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jose R. Santos, linux-kernel, anton, dheger
On 04/30/04 15:18:32, Andrew Morton wrote:
> "Jose R. Santos" <jrsantos@austin.ibm.com> wrote:
> >
> > diff -Nru a/fs/inode.c b/fs/inode.c
> > --- a/fs/inode.c Fri Apr 30 12:14:23 2004
> > +++ b/fs/inode.c Fri Apr 30 12:14:23 2004
> > @@ -671,8 +671,9 @@
> >
> > static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
> > {
> > - unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
> > - tmp = tmp + (tmp >> I_HASHBITS);
> > + unsigned long tmp = (hashval + ((unsigned long) sb) ^ (GOLDEN_RATIO_PRIME + hashval)
> > + / L1_CACHE_BYTES);
> > + tmp = tmp + ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
> > return tmp & I_HASHMASK;
> > }
> >
>
> Are you sure about this? It's doing:
>
> tmp = hashval + sb ^ ((GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES)
>
> should it not be:
>
> tmp = hashval + (sb ^ (GOLDEN_RATIO_PRIME + hashval)) / L1_CACHE_BYTES
>
> ?
err... Wrote the patch too fast. It should read
tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
I screwed up... I'll send a fixed patch in a while.
-JRS
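Andrew's reading follows from C operator precedence: '/' binds tighter than '+', and both bind tighter than '^'. A standalone check, using arbitrary small values rather than the kernel constants:

```c
/* Shows how the unparenthesized expression from the patch groups.
 * The values are arbitrary; only the grouping is of interest. */
int precedence_matches_andrews_reading(void)
{
	unsigned long hashval = 100, sb = 12, prime = 40, cacheline = 8;

	/* The patch's right-hand side, redundant outer parens dropped: */
	unsigned long as_written = hashval + sb ^ (prime + hashval) / cacheline;

	/* The grouping Andrew described: */
	unsigned long andrews = (hashval + sb) ^ ((prime + hashval) / cacheline);

	/* The grouping Andrew suggested may have been intended: */
	unsigned long intended = hashval + (sb ^ (prime + hashval)) / cacheline;

	return as_written == andrews && as_written != intended;
}
```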
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 20:57 ` [PATCH] dentry and inode cache hash algorithm performance changes Jose R. Santos
@ 2004-04-30 21:33 ` Jose R. Santos
2004-04-30 22:02 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 21:33 UTC (permalink / raw)
To: Jose R. Santos; +Cc: Andrew Morton, Jose R. Santos, linux-kernel, anton, dheger
On 04/30/04 15:57:01, Jose R. Santos wrote:
> err... Wrote the patch too fast. It should read
>
> tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
>
> I screwed up... I'll send a fixed patch in a while.
Just noticed I've made another error in the inode hash code.
Fixed patch (I hope), with beautification.
-JRS
diff -Nru a/fs/dcache.c b/fs/dcache.c
--- a/fs/dcache.c Fri Apr 30 16:27:41 2004
+++ b/fs/dcache.c Fri Apr 30 16:27:41 2004
@@ -28,6 +28,7 @@
#include <asm/uaccess.h>
#include <linux/security.h>
#include <linux/seqlock.h>
+#include <linux/hash.h>
#define DCACHE_PARANOIA 1
/* #define DCACHE_DEBUG 1 */
@@ -799,8 +800,8 @@
static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
{
- hash += (unsigned long) parent / L1_CACHE_BYTES;
- hash = hash ^ (hash >> D_HASHBITS);
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
return dentry_hashtable + (hash & D_HASHMASK);
}
diff -Nru a/fs/inode.c b/fs/inode.c
--- a/fs/inode.c Fri Apr 30 16:27:41 2004
+++ b/fs/inode.c Fri Apr 30 16:27:41 2004
@@ -671,12 +671,13 @@
static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
- tmp = tmp + (tmp >> I_HASHBITS);
+ unsigned long tmp;
+
+ tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
+ L1_CACHE_BYTES;
+ tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
return tmp & I_HASHMASK;
}
-
-/* Yeah, I know about quadratic hash. Maybe, later. */
/**
* iunique - get a unique inode number
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 21:33 ` Jose R. Santos
@ 2004-04-30 22:02 ` Andrew Morton
2004-04-30 23:42 ` Jose R. Santos
2004-05-04 13:12 ` Jose R. Santos
0 siblings, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2004-04-30 22:02 UTC (permalink / raw)
To: Jose R. Santos; +Cc: jrsantos, linux-kernel, anton, dheger
"Jose R. Santos" <jrsantos@austin.ibm.com> wrote:
>
> On 04/30/04 15:57:01, Jose R. Santos wrote:
> > err... Wrote the patch too fast. It should read
> >
> > tmp = (hashval * sb) ^ (GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES
> >
> > I screwed up... I'll send a fixed patch in a while.
>
> Just noticed I've made another error in the inode hash code.
>
> Fixed patch (I hope) with beautification.
Does this mean you need to redo the instrumentation and benchmarking? If
so, please do that and send the numbers along? There's no particular
urgency on that, but we should do it.
Also, I'd be interested in understanding what the input to the hashing
functions looked like in this testing. It could be that the new hash just
happens to work well with one particular test's dataset. Please convince
us otherwise ;)
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 22:02 ` Andrew Morton
@ 2004-04-30 23:42 ` Jose R. Santos
2004-05-04 13:12 ` Jose R. Santos
1 sibling, 0 replies; 12+ messages in thread
From: Jose R. Santos @ 2004-04-30 23:42 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jose R. Santos, jrsantos, linux-kernel, anton, dheger
On 04/30/04 17:02:56, Andrew Morton wrote:
> Does this mean you need to redo the instrumentation and benchmarking? If
> so, please do that and send the numbers along? There's no particular
> urgency on that, but we should do it.
No, the screw-up happened when I created the patch on a clean source
tree. This patch reflects the actual data gathered for those graphs.
> Also, I'd be interested in understanding what the input to the hashing
> functions looked like in this testing. It could be that the new hash just
> happens to work well with one particular test's dataset. Please convince
> us otherwise ;)
hehehe... I knew it couldn't be that easy. :)
For the dentry hash function, some of my coworkers have put this
hash function through various tests and have concluded that it
is equal to or better than the default hash function.
These runs were done with a (hopefully to be open source soon)
benchmark called FFSB, which can simulate various I/O patterns across
many filesystems and variable file sizes.
The SpecSFS fileset is basically a lot of small files, the number of
which varies depending on the size of the run. For a not-so-big SMP
system, the number of files is in the 20+ million range. Of those
20 million files, only 10% are accessed randomly by the client. The
purpose of this is that the benchmark tries to stress not only
the NFS layer but the VM and filesystem layers as well. The filesets
are also hundreds of gigabytes in size in order to promote disk
head movement by guaranteeing cache misses in memory. In SFS, 27% of
the workload is lookups, and __d_lookup has been showing up high in my
profiles.
For the inode hash, the problem I see is that when running
a benchmark with this huge fileset, we end up trying to free a
lot of inode entries during the run while putting new
entries in the cache. We end up calling ifind_fast(), which calls
find_inode_fast() with inode_lock held. In order to limit the time
spent holding inode_lock, we need to avoid long chains
in that hash table.
When I took a look at the original hash function, I found it
to be a bit too simple for any workload. My solution (which
takes advantage of Dominique's work) was to create a hash
function that could generate completely different hashes
depending on the hashval and the superblock, in order to have
the hash scale as we add more filesystems to the machine.
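The design goal described here, different hash sequences per superblock, can be sanity-checked with a userspace model. The constants below are illustrative assumptions (the kernel derives I_HASHBITS and I_HASHMASK from the boot-time table size); only the mixing steps mirror the fixed patch.

```c
/* Userspace model of the corrected inode hash.  I_HASHBITS, I_HASHMASK,
 * L1_CACHE_BYTES and the 32-bit GOLDEN_RATIO_PRIME are illustrative
 * assumptions. */
#define GOLDEN_RATIO_PRIME 0x9e370001UL
#define I_HASHBITS 14
#define I_HASHMASK ((1UL << I_HASHBITS) - 1)
#define L1_CACHE_BYTES 32

unsigned long inode_hash(unsigned long sb, unsigned long hashval)
{
	unsigned long tmp = (hashval * sb) ^
			    ((GOLDEN_RATIO_PRIME + hashval) / L1_CACHE_BYTES);
	tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
	return tmp & I_HASHMASK;
}

/* Count how many of the first n hashvals land in different buckets for
 * two superblock addresses - a rough measure of per-sb independence. */
unsigned long buckets_differing(unsigned long sb1, unsigned long sb2,
				unsigned long n)
{
	unsigned long i, diff = 0;

	for (i = 0; i < n; i++)
		if (inode_hash(sb1, i) != inode_hash(sb2, i))
			diff++;
	return diff;
}
```

Because hashval multiplies the superblock pointer, the same sequence of hashvals maps to a different bucket pattern under each superblock, which is the scaling property described above.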
Both of these problems can be somewhat tuned away by increasing
the number of buckets of both the dentry and inode caches, but it got to a
point where I had 256MB of inode and 128MB of dentry hash buckets
on a not-so-large SMP. With the hash changes I have been able
to reduce the buckets to 128MB for the inode cache and to
32MB for the dentry cache and still get better performance.
If it helps my case... I haven't been running this benchmark
for long, so I haven't been able to find a way to cheat. I
need to come up with generic solutions until I can find a
cheat for the benchmark. :)
-JRS
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 19:55 Jose R. Santos
@ 2004-05-01 12:08 ` Olaf Dietsche
2004-05-01 15:08 ` Jose R. Santos
0 siblings, 1 reply; 12+ messages in thread
From: Olaf Dietsche @ 2004-05-01 12:08 UTC (permalink / raw)
To: Jose R. Santos; +Cc: akpm, linux-kernel, anton, dheger, slpratt
"Jose R. Santos" <jrsantos@austin.ibm.com> writes:
> Anton was nice enough to provide some graphs that show the distribution
> before and after the patch at http://samba.org/~anton/linux/sfs/1/
Judging from the graphs (!), I don't see a real difference for
the dcache. There's just one outlier (depth 11) for the old hash function,
which seems to be translated into multiple depth-9 entries. The
histograms seem to confirm that there's at most a minimal difference,
if they were equally scaled.
Maybe this is due to the qstr hashes, which were used by both approaches
as input?
The inode hash seems to be distributed more evenly with the new hash
function, although the inode histogram suggests that most buckets are
in the 0-2 depth range, with the old hash function leading slightly in
the zero-depth range.
Regards, Olaf.
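The depth plots Olaf is reading can be reproduced from any per-bucket count array; a histogram of chain depths is all such graphs show. A minimal sketch (array sizes are arbitrary):

```c
/* Given per-bucket object counts, fill hist[d] with the number of
 * buckets whose chain is exactly d deep (depths >= max_depth clamp to
 * max_depth - 1).  This is the shape of data behind depth plots like
 * the ones linked above. */
void depth_histogram(const unsigned long *bucket_counts,
		     unsigned long nbuckets,
		     unsigned long *hist, unsigned long max_depth)
{
	unsigned long i;

	for (i = 0; i < max_depth; i++)
		hist[i] = 0;
	for (i = 0; i < nbuckets; i++) {
		unsigned long d = bucket_counts[i];

		if (d >= max_depth)
			d = max_depth - 1;
		hist[d]++;
	}
}
```

A "more even" distribution then shows up as mass concentrated at depths 0-2 with few or no buckets in the long tail.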
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-01 12:08 ` Olaf Dietsche
@ 2004-05-01 15:08 ` Jose R. Santos
2004-05-20 13:34 ` Raghavan
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-05-01 15:08 UTC (permalink / raw)
To: Olaf Dietsche; +Cc: akpm, linux-kernel, anton, dheger, slpratt
* Olaf Dietsche <olaf+list.linux-kernel@olafdietsche.de> [2004-05-01 14:08:26 +0200]:
> Judging from the graphs (!), I don't see a real difference for
> the dcache. There's just one outlier (depth 11) for the old hash function,
> which seems to be translated into multiple depth-9 entries. The
> histograms seem to confirm that there's at most a minimal difference,
> if they were equally scaled.
>
> Maybe this is due to the qstr hashes, which were used by both approaches
> as input?
SpecSFS is not really the best benchmark to show the efficiency of the
dentry hash function. I need to come up with a better defense for
these hash functions. While I did not do the study for this hash
function, mathematically speaking this hash algorithm should always
produce an equal or better hash. SFS just shows equal (well, slightly
better), so I'll work on getting some more data to back up the "better"
claim.
> The inode hash seems to be distributed more evenly with the new hash
> function, although the inode histogram suggests that most buckets are
> in the 0-2 depth range, with the old hash function leading slightly in
> the zero-depth range.
That's a good thing. With the new hash function, we get 25% more bucket
usage, and most of the buckets are only one object deep. These buckets take
up memory, so we had better use them. The old hash function was not very
efficient at spreading the hashes across all the buckets; with the new
hash function we have 4.5 times more buckets that are only one object deep,
so it scales better as we increase the number of buckets as well.
It also provides a 3% improvement in the overall SFS number with half
the number of buckets used, which I believe is a great improvement from
just a hash algorithm change.
-JRS
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-04-30 22:02 ` Andrew Morton
2004-04-30 23:42 ` Jose R. Santos
@ 2004-05-04 13:12 ` Jose R. Santos
2004-05-04 18:55 ` Andrew Morton
1 sibling, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-05-04 13:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, anton, dheger
* Andrew Morton <akpm@osdl.org> [2004-04-30 15:02:56 -0700]:
> Also, I'd be interested in understanding what the input to the hashing
> functions looked like in this testing. It could be that the new hash just
> happens to work well with one particular test's dataset. Please convince
> us otherwise ;)
Andrew - Is there any workload you want me to run to show that this hash
function is going to be equal to or better than the one already provided
in Linux?
Remember that my claim is not that this hash function will be better for
every I/O workload. I claim it should not perform worse than
the default hash function, and on some workloads it should perform
better. The workloads that should show improvements are those that
use gigabytes of memory to store inode and dentry cache data. I have run
some tests on my old BP6 machine, and other than a small improvement while
running find, I did not see any improvements, but no regressions either.
Again, if you have a particular workload in mind, I'll be happy to run it
on some of my systems.
Thanks,
-JRS
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-04 13:12 ` Jose R. Santos
@ 2004-05-04 18:55 ` Andrew Morton
2004-05-07 13:04 ` Jose R. Santos
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2004-05-04 18:55 UTC (permalink / raw)
To: Jose R. Santos; +Cc: linux-kernel, anton, dheger
"Jose R. Santos" <jrsantos@austin.ibm.com> wrote:
>
> * Andrew Morton <akpm@osdl.org> [2004-04-30 15:02:56 -0700]:
> > Also, I'd be interested in understanding what the input to the hashing
> > functions looked like in this testing. It could be that the new hash just
> > happens to work well with one particular test's dataset. Please convince
> > us otherwise ;)
>
> Andrew - Is there any workload you want me to run to show that this hash
> function is going to be equal to or better than the one already provided
> in Linux?
Not really - it sounds like you've covered it pretty well. Did you try SDET?
It could be that reducing the hash table size will turn pretty much any
workload into a test of the hash quality.
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-04 18:55 ` Andrew Morton
@ 2004-05-07 13:04 ` Jose R. Santos
2004-05-08 1:03 ` Dave Hansen
0 siblings, 1 reply; 12+ messages in thread
From: Jose R. Santos @ 2004-05-07 13:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: Jose R. Santos, linux-kernel, anton, dheger, slpratt
On 05/04/04 13:55:10, Andrew Morton wrote:
> > Andrew - Is there any workload you want me to run to show that this hash
> > function is going to be equal to or better than the one already provided
> > in Linux?
>
> Not really - it sounds like you've covered it pretty well. Did you try SDET?
>
> It could be that reducing the hash table size will turn pretty much any
> workload into a test of the hash quality.
Sorry for the late reply...
Steve Pratt seems to have an SDET setup already, and he did me the favor of
running SDET with a reduced dentry hash table size. I believe
his table suggests that less than a 3% change is acceptable variability, but
overall he got a 5% better number using the new hash algorithm.
-JRS
=========================================================================
A) x4408way1.sdet.2.6.5100000-8p.04-05-05_12.08.44 vs
B) x4408way1.sdet.2.6.5+hash-100000-8p.04-05-05_11.48.02
<6>Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
<4>Inode-cache hash table entries: 1048576 (order: 10, 4194304 bytes)
Results:Throughput
tolerance = 0.00 + 3.00% of A
A B
Threads Ops/sec Ops/sec %diff diff tolerance
---------- ------------ ------------ -------- ------------ ------------
1 4341.9300 4401.9500 1.38 60.02 130.26
2 8242.2000 8165.1200 -0.94 -77.08 247.27
4 15274.4900 15257.1000 -0.11 -17.39 458.23
8 21326.9200 21320.7000 -0.03 -6.22 639.81
16 23056.2100 24282.8000 5.32 1226.59 691.69 *
32 23397.2500 24684.6100 5.50 1287.36 701.92 *
64 23372.7600 23632.6500 1.11 259.89 701.18
128 17009.3900 16651.9600 -2.10 -357.43 510.28
=========================================================================
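The tolerance column in the table is simply 3% of the baseline throughput, and the '*' rows are those whose diff exceeds it. A quick check against the 1-thread and 16-thread rows, with values copied from the table above:

```c
#include <math.h>

/* Returns 1 when run B differs from baseline A by more than the stated
 * tolerance of 0.00 + 3.00% of A - the condition marked '*' in the
 * table above. */
int exceeds_tolerance(double a, double b)
{
	return fabs(b - a) > 0.03 * a;
}
```

By this rule only the 16- and 32-thread improvements are outside normal variability, which is consistent with the roughly 5% gain reported at those thread counts.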
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-07 13:04 ` Jose R. Santos
@ 2004-05-08 1:03 ` Dave Hansen
0 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2004-05-08 1:03 UTC (permalink / raw)
To: Jose R. Santos
Cc: Andrew Morton, Linux Kernel Mailing List, Anton Blanchard, dheger,
slpratt
On Fri, 2004-05-07 at 06:04, Jose R. Santos wrote:
> On 05/04/04 13:55:10, Andrew Morton wrote:
> > > Andrew - Is there any workload you want me to run to show that this hash
> > > function is going to be equal to or better than the one already provided
> > > in Linux?
> >
> > Not really - it sounds like you've covered it pretty well. Did you try SDET?
> >
> > It could be that reducing the hash table size will turn pretty much any
> > workload into a test of the hash quality.
>
> Sorry for the late reply...
>
> Steve Pratt seems to have an SDET setup already, and he did me the favor of
> running SDET with a reduced dentry hash table size. I believe
> his table suggests that less than a 3% change is acceptable variability, but
> overall he got a 5% better number using the new hash algorithm.
It's usually best to keep increasing the number of SDET iterations that
you average against, at least until the averages start to become a bit
less bouncy. Also, mounting ramfs on /tmp can _really_ help lower its
variability, probably because of gcc.
You might be lucky enough to get some consistently good numbers that
way.
-- Dave
* Re: [PATCH] dentry and inode cache hash algorithm performance changes.
2004-05-01 15:08 ` Jose R. Santos
@ 2004-05-20 13:34 ` Raghavan
0 siblings, 0 replies; 12+ messages in thread
From: Raghavan @ 2004-05-20 13:34 UTC (permalink / raw)
To: Jose R. Santos; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2802 bytes --]
Hi Jose,
Did you try the new hashing algorithm with dcachebench?
dcachebench is focused entirely on dcache performance.
I measured the performance of dcachebench on the
2.6.6 kernel with and without the new hashing algorithm
and noticed a significant decrease in performance with the
new hash algorithm. Enclosed is the mail from LKML that discusses
this.
Also, I wrote an instrumentation patch that measures the
number of clock cycles spent in the hash and lookup functions.
The hash time saw an increase (obviously), but I also saw an
increase in the time spent in the lookup function.
Attached is the patch I used for the instrumentation.
Thanks and Regards
Raghav
On Sat, May 01, 2004 at 10:08:57AM -0500, Jose R. Santos wrote:
> * Olaf Dietsche <olaf+list.linux-kernel@olafdietsche.de> [2004-05-01 14:08:26 +0200]:
> > Judging from the graphs (!), I don't see a real difference for
> > the dcache. There's just one outlier (depth 11) for the old hash function,
> > which seems to be translated into multiple depth-9 entries. The
> > histograms seem to confirm that there's at most a minimal difference,
> > if they were equally scaled.
> >
> > Maybe this is due to the qstr hashes, which were used by both approaches
> > as input?
>
> SpecSFS is not really the best benchmark to show the efficiency of the
> dentry hash function. I need to come up with a better defense for
> these hash functions. While I did not do the study for this hash
> function, mathematically speaking this hash algorithm should always
> produce an equal or better hash. SFS just shows equal (well, slightly
> better), so I'll work on getting some more data to back up the "better"
> claim.
>
> > The inode hash seems to be distributed more evenly with the new hash
> > function, although the inode histogram suggests that most buckets are
> > in the 0-2 depth range, with the old hash function leading slightly in
> > the zero-depth range.
>
> That's a good thing. With the new hash function, we get 25% more bucket
> usage, and most of the buckets are only one object deep. These buckets take
> up memory, so we had better use them. The old hash function was not very
> efficient at spreading the hashes across all the buckets; with the new
> hash function we have 4.5 times more buckets that are only one object deep,
> so it scales better as we increase the number of buckets as well.
>
> It also provides a 3% improvement in the overall SFS number with half
> the number of buckets used, which I believe is a great improvement from
> just a hash algorithm change.
>
> -JRS
[-- Attachment #2: original_mail --]
[-- Type: text/plain, Size: 1856 bytes --]
On Fri, 14 May 2004, Dipankar Sarma wrote:
> >
> > 2.6.6 10280 110
> >
> > 2.6.6-bk 10862 30
>
> > To find out if the huge performance dip between the 2.6.6
> > and 2.6.6-bk is because of the hash changes, I removed the hash patch
> > from 2.6.6-bk and applied it to 2.6.6.
> >
> > 2.6.6-bk with old hash 10685 34
> >
> > 2.6.6 with new hash 10496 125
> >
> > Looks like the new hashing function has brought down the performance.
> > Also some code outside dcache.c and inode.c seems to have pushed down
> > the performance in 2.6.6-bk.
>
> OK, I am confused. These numbers show that the new hash function
> is better.
No, look again.
old hash new hash
2.6.6: 10280 10496
2.6.6-bk: 10685 10862
in both cases, the new hash makes each iteration about ~200us (or whatever
the metric is) slower.
There's something _else_ going on too, since plain 2.6.6 is so clearly
better than the others. I don't know why - the only thing -bk does is the
hash change and some changes to GFP_NOFS behaviour (different return value
from shrink_dentries or whatever). Which shouldn't even trigger, I'd have
assumed this is all cached.
NOTE! Just "simple" things like variations in I$ layout of the kernel code
can make a pretty big difference if you're unlucky. And the new dentry
code doesn't align the things on a D$ cache line boundary, so there could
easily be "consistent" differences from that - just from the order of
dentries allocated etc.
But it's interesting to note that the hash does make a difference. The old
hash was very much optimized for simplicity (those hash-calculation
routines are some of the hottest in the kernel). But I don't see that a
few extra instructions should make that big of a difference.
Linus
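The roughly 200-unit penalty Linus refers to falls directly out of the quoted table (numbers copied from above; the units are whatever dcachebench reports):

```c
/* Deltas between old-hash and new-hash runs from the table above. */
int new_hash_delta_266(void)   { return 10496 - 10280; } /* plain 2.6.6 */
int new_hash_delta_266bk(void) { return 10862 - 10685; } /* 2.6.6-bk  */
```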
[-- Attachment #3: patch --]
[-- Type: text/plain, Size: 11688 bytes --]
--- linux-2.6.6.orig/fs/dcache.c 2004-05-10 08:02:01.000000000 +0530
+++ linux-2.6.6/fs/dcache.c 2004-05-20 18:07:37.000000000 +0530
@@ -21,6 +21,7 @@
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/smp_lock.h>
+#include <linux/hash.h>
#include <linux/cache.h>
#include <linux/module.h>
#include <linux/mount.h>
@@ -40,6 +41,19 @@ EXPORT_SYMBOL(dcache_lock);
static kmem_cache_t *dentry_cache;
+//#define OLD_HASH
+#define DB
+
+#ifdef DB
+#include <asm/processor.h>
+#include <asm/msr.h>
+#include <asm/e820.h>
+unsigned long d_hash_count, d_lookup_count;
+unsigned long d_hash_misses, d_lookup_misses;
+unsigned long d_hash_overflow, d_lookup_overflow;
+unsigned long hash_array[250], lookup_array[500];
+#endif
+
/*
* This is the single most critical data structure when it comes
* to the dcache: the hashtable for lookups. Somebody should try
@@ -643,24 +657,23 @@ void shrink_dcache_anon(struct hlist_hea
}
/*
- * This is called from kswapd when we think we need some more memory.
+ * Scan `nr' dentries and return the number which remain.
+ *
+ * We need to avoid reentering the filesystem if the caller is performing a
+ * GFP_NOFS allocation attempt. One example deadlock is:
+ *
+ * ext2_new_block->getblk->GFP->shrink_dcache_memory->prune_dcache->
+ * prune_one_dentry->dput->dentry_iput->iput->inode->i_sb->s_op->put_inode->
+ * ext2_discard_prealloc->ext2_free_blocks->lock_super->DEADLOCK.
+ *
+ * In this case we return -1 to tell the caller that we baled.
*/
static int shrink_dcache_memory(int nr, unsigned int gfp_mask)
{
if (nr) {
- /*
- * Nasty deadlock avoidance.
- *
- * ext2_new_block->getblk->GFP->shrink_dcache_memory->
- * prune_dcache->prune_one_dentry->dput->dentry_iput->iput->
- * inode->i_sb->s_op->put_inode->ext2_discard_prealloc->
- * ext2_free_blocks->lock_super->DEADLOCK.
- *
- * We should make sure we don't hold the superblock lock over
- * block allocations, but for now:
- */
- if (gfp_mask & __GFP_FS)
- prune_dcache(nr);
+ if (!(gfp_mask & __GFP_FS))
+ return -1;
+ prune_dcache(nr);
}
return dentry_stat.nr_unused;
}
@@ -795,11 +808,45 @@ struct dentry * d_alloc_root(struct inod
return res;
}
-static inline struct hlist_head * d_hash(struct dentry * parent, unsigned long hash)
+static inline struct hlist_head *d_hash(struct dentry *parent,
+ unsigned long hash)
{
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
+#ifdef OLD_HASH
hash += (unsigned long) parent / L1_CACHE_BYTES;
hash = hash ^ (hash >> D_HASHBITS);
+#else
+ hash += ((unsigned long) parent ^ GOLDEN_RATIO_PRIME) / L1_CACHE_BYTES;
+ hash = hash ^ ((hash ^ GOLDEN_RATIO_PRIME) >> D_HASHBITS);
+#endif
+
+#ifndef DB
return dentry_hashtable + (hash & D_HASHMASK);
+#else
+ rdtscll(t2);
+ preempt_enable();
+
+ if( d_hash_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 5000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ hash_array[c]++;
+ }
+ else
+ d_hash_misses++;
+ d_hash_count++;
+ }
+ else d_hash_overflow++;
+
+ return dentry_hashtable + (hash & D_HASHMASK);
+#endif
+
}
/**
@@ -970,6 +1017,12 @@ struct dentry * __d_lookup(struct dentry
struct dentry *found = NULL;
struct hlist_node *node;
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
rcu_read_lock();
hlist_for_each (node, head) {
@@ -1023,6 +1076,25 @@ struct dentry * __d_lookup(struct dentry
break;
}
rcu_read_unlock();
+#ifdef DB
+ rdtscll(t2);
+ preempt_enable();
+
+ if( d_lookup_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 10000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ lookup_array[c]++;
+ }
+ else
+ d_lookup_misses++;
+ d_lookup_count++;
+ }
+ else d_lookup_overflow++;
+
+#endif
+
return found;
}
--- linux-2.6.6.orig/fs/inode.c 2004-05-10 08:03:21.000000000 +0530
+++ linux-2.6.6/fs/inode.c 2004-05-20 18:07:24.000000000 +0530
@@ -32,6 +32,20 @@
*/
#include <linux/buffer_head.h>
+#define OLD_HASH
+
+#define DB
+
+#ifdef DB
+#include <asm/processor.h>
+#include <asm/msr.h>
+#include <asm/e820.h>
+unsigned long i_hash_count, i_lookup_count;
+unsigned long i_hash_misses, i_lookup_misses;
+unsigned long i_hash_overflow, i_lookup_overflow;
+unsigned long i_hash_array[250], i_lookup_array[500];
+#endif
+
/*
* New inode.c implementation.
*
@@ -490,6 +504,12 @@ static struct inode * find_inode(struct
struct hlist_node *node;
struct inode * inode = NULL;
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
repeat:
hlist_for_each (node, head) {
inode = hlist_entry(node, struct inode, i_hash);
@@ -503,6 +523,24 @@ repeat:
}
break;
}
+#ifdef DB
+ rdtscll(t2);
+ preempt_enable();
+
+ if( i_lookup_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 10000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ i_lookup_array[c]++;
+ }
+ else
+ i_lookup_misses++;
+ i_lookup_count++;
+ }
+ else i_lookup_overflow++;
+#endif
+
return node ? inode : NULL;
}
@@ -515,6 +553,12 @@ static struct inode * find_inode_fast(st
struct hlist_node *node;
struct inode * inode = NULL;
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
repeat:
hlist_for_each (node, head) {
inode = hlist_entry(node, struct inode, i_hash);
@@ -528,6 +572,23 @@ repeat:
}
break;
}
+#ifdef DB
+ rdtscll(t2);
+ preempt_enable();
+
+ if( i_lookup_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 10000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ i_lookup_array[c]++;
+ }
+ else
+ i_lookup_misses++;
+ i_lookup_count++;
+ }
+ else i_lookup_overflow++;
+#endif
return node ? inode : NULL;
}
@@ -671,12 +732,46 @@ static struct inode * get_new_inode_fast
static inline unsigned long hash(struct super_block *sb, unsigned long hashval)
{
- unsigned long tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
+ unsigned long tmp;
+
+#ifdef DB
+ unsigned long long t1=0,t2=0;
+ preempt_disable();
+ rdtscll(t1);
+#endif
+
+#ifdef OLD_HASH
+ tmp = hashval + ((unsigned long) sb / L1_CACHE_BYTES);
tmp = tmp + (tmp >> I_HASHBITS);
+#else
+ tmp = (hashval * (unsigned long)sb) ^ (GOLDEN_RATIO_PRIME + hashval) /
+ L1_CACHE_BYTES;
+ tmp = tmp ^ ((tmp ^ GOLDEN_RATIO_PRIME) >> I_HASHBITS);
+#endif
+
+#ifndef DB
+ return tmp & I_HASHMASK;
+#else
+ rdtscll(t2);
+ preempt_enable();
+
+ if( i_hash_count < 0xffffffff )
+ {
+ if( (t2 > t1) && ( (t2 - t1) >= 0 ) && ((t2 - t1) < 5000))
+ {
+ unsigned int c = ((unsigned int)(t2 - t1)) /20;
+ i_hash_array[c]++;
+ }
+ else
+ i_hash_misses++;
+ i_hash_count++;
+ }
+ else i_hash_overflow++;
+
return tmp & I_HASHMASK;
-}
-/* Yeah, I know about quadratic hash. Maybe, later. */
+#endif
+}
/**
* iunique - get a unique inode number
--- linux-2.6.6.orig/fs/proc/proc_misc.c 2004-05-10 08:02:01.000000000 +0530
+++ linux-2.6.6/fs/proc/proc_misc.c 2004-05-20 18:07:51.000000000 +0530
@@ -51,6 +51,18 @@
#include <asm/tlb.h>
#include <asm/div64.h>
+#define DB
+#ifdef DB
+extern unsigned long d_hash_average, d_hash_count, d_lookup_average, d_lookup_count;
+extern unsigned long d_hash_misses, d_lookup_misses;
+extern unsigned long d_hash_overflow, d_lookup_overflow;
+extern unsigned long hash_array[250], lookup_array[500];
+extern unsigned long i_hash_average, i_hash_count, i_lookup_average, i_lookup_count;
+extern unsigned long i_hash_misses, i_lookup_misses;
+extern unsigned long i_hash_overflow, i_lookup_overflow;
+extern unsigned long i_hash_array[250], i_lookup_array[500];
+#endif
+
#define LOAD_INT(x) ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1-1)) * 100)
/*
@@ -134,6 +146,85 @@ static struct vmalloc_info get_vmalloc_i
return vmi;
}
+#ifdef DB
+static int proc_read_dcache_hash(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+
+
+ len = sprintf(page + len, "\ndcache Hash statistics: \n");
+
+ for(i=0; i < 250; i++)
+ len += sprintf(page + len, "%lu ", hash_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",d_hash_misses, d_hash_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+
+}
+
+static int proc_read_dcache_lookup(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+ len += sprintf(page + len, "\ndcache lookup statistics: \n");
+
+ for(i=0; i < 500; i++)
+ len += sprintf(page + len, "%lu ", lookup_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",d_lookup_misses, d_lookup_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+
+static int proc_read_inode_hash(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+ len = sprintf(page + len, "\ninode Hash statistics: \n");
+
+ for(i=0; i < 250; i++)
+ len += sprintf(page + len, "%lu ", i_hash_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",i_hash_misses, i_hash_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+
+static int proc_read_inode_lookup(char *page, char **start, off_t off,
+ int count, int *eof, void *data)
+{
+ unsigned int i=0;
+ int len=0;
+
+ len = sprintf(page + len, "\ninode lookup statistics: \n");
+
+ for(i=0; i < 500; i++)
+ len += sprintf(page + len, "%lu ", i_lookup_array[i]);
+
+ len += sprintf(page + len, " \n");
+
+ len += sprintf(page + len, "misses:%lu overflow:%lu \n \n",i_lookup_misses, i_lookup_overflow);
+
+ return proc_calc_metrics(page, start, off, count, eof, len);
+}
+#endif
+
+
static int uptime_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
@@ -150,6 +241,7 @@ static int uptime_read_proc(char *page,
(unsigned long) idle.tv_sec,
(idle.tv_nsec / (NSEC_PER_SEC / 100)));
+
return proc_calc_metrics(page, start, off, count, eof, len);
}
@@ -651,6 +743,7 @@ static void create_seq_entry(char *name,
entry->proc_fops = f;
}
+
void __init proc_misc_init(void)
{
struct proc_dir_entry *entry;
@@ -675,6 +768,12 @@ void __init proc_misc_init(void)
{"rtc", ds1286_read_proc},
#endif
{"locks", locks_read_proc},
+#ifdef DB
+ {"dcache_hash", proc_read_dcache_hash},
+ {"dcache_lookup", proc_read_dcache_lookup},
+ {"inode_hash", proc_read_inode_hash},
+ {"inode_lookup", proc_read_inode_lookup},
+#endif
{"execdomains", execdomains_read_proc},
{NULL,}
};
Thread overview: 12+ messages
[not found] <20040430191539.GC14271@rx8.ibm.com>
[not found] ` <20040430131832.45be6956.akpm@osdl.org>
2004-04-30 20:57 ` [PATCH] dentry and inode cache hash algorithm performance changes Jose R. Santos
2004-04-30 21:33 ` Jose R. Santos
2004-04-30 22:02 ` Andrew Morton
2004-04-30 23:42 ` Jose R. Santos
2004-05-04 13:12 ` Jose R. Santos
2004-05-04 18:55 ` Andrew Morton
2004-05-07 13:04 ` Jose R. Santos
2004-05-08 1:03 ` Dave Hansen
2004-04-30 19:55 Jose R. Santos
2004-05-01 12:08 ` Olaf Dietsche
2004-05-01 15:08 ` Jose R. Santos
2004-05-20 13:34 ` Raghavan