From: Rogier Wolff <R.E.Wolff@BitWizard.nl>
To: Ted Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: fsck performance.
Date: Wed, 23 Feb 2011 05:44:27 +0100 [thread overview]
Message-ID: <20110223044427.GM21917@bitwizard.nl> (raw)
In-Reply-To: <20110222221304.GH2924@thunk.org>
On Tue, Feb 22, 2011 at 05:13:04PM -0500, Ted Ts'o wrote:
> On Tue, Feb 22, 2011 at 09:32:28AM -0700, Andreas Dilger wrote:
> >
> > Any idea what the hash size does to memory usage? I wonder if we
> > can scale this based on the directory count, or if the memory usage
> > is minimal (only needed in case of tdb) then just make it the
> > default. It definitely appears to have been a major performance
> > boost.
>
> Yeah, that was my question. Your patch adds a magic number which
> probably works well on your machine (and I'm not really worried if
> someone has less than 1G --- here's a quarter kid, buy your self a
> real computer :-). But I wonder if we should be using a hash size
> which is sized automatically depending on available memory or file
> system size.
I fully agree that having "magic numbers" in the code is a bad thing.
A warning sign.
I don't agree with your argument that 1G RAM should be considered
minimal. There are storage boxes (single-disk-nas systems) out there
that run on a 300MHz ARM chip and have little RAM. Some of them use
ext[234].
For example: http://iomega.nas-central.org/wiki/Main_Page
64Mb RAM. I'm not sure wether that CPU is capable of virtual memory.
I just mentioned this one because a friend brought one into my office
last week. I don't think it happens to run Linux. On the other hand,
some of the "competition" do run Linux.
As to the "disadvantage" of using a large hash value:
As far as I can see, the library just seeks to that position in the
tdb file. With 32-bit file offsets (which is hardcoded into tdb), that
means the penalty is 4*hash_size of extra disk space. So with my
currently suggested value that comes to 4Mb.
As my current tdb database amounts to 1.5Gb I think the cost is
acceptable.
With the number of keys up to 1 million, we can expect a speedup of
1M/131 = about 7500. Above that we won't gain much anymore.
This is assuming that we have a next-to-perfect hash function. In fact
we don't because I see about a 30% hash bucket usage. And I surely
think my fsck has used over 1M of the keys....
I just tested the hash function: I hashed the first 10 million numbers
and got 91962 unique results (out of a possible 99931). That's only
about 10%. That's a lot worse than what e2fsck is seeing. And this is
the simplest case to get right.
Here is my test program.
#include <stdio.h>
#include <stdlib.h>
typedef unsigned int u32;
/* This is based on the hash algorithm from gdbm */
static unsigned int default_tdb_hash(unsigned char *key)
{
u32 value; /* Used to compute the hash value. */
u32 i; /* Used to cycle through random values. */
/* Set the initial value from the key size. */
for (value = 0x238F13AF * 4, i=0; i < 4; i++)
value = (value + (key[i] << (i*5 % 24)));
return (1103515243 * value + 12345);
}
int main (int argc, char **argv)
{
int i;
int max = 1000000;
if (argc > 1) max = atoi (argv[1]);
for (i=0;i < max;i++) {
printf ("%u %u\n", i, default_tdb_hash ((unsigned char *)&i) % 99931);
}
exit (0);
}
and here is the commandline I used to watch the results.
./a.out 10000000 | awk '{print $2}' | sort | uniq -c |sort -n | less
It seems my "prime generator" program is wrong too. I had thought to
choose a prime with 99931, but apparently it's not prime. (13*7687).
Which, for hashing should not be too bad, but I'll go look for a
prime and check again. Ok. Hash bucket usage shot up: 16%.
I just "designed" a new hash function, based on the "hash" page on
wikipedia.
static unsigned int my_tdb_hash(unsigned char *key)
{
u32 value; /* Used to compute the hash value. */
u32 i; /* Used to cycle through random values. */
/* Set the initial value from the key size. */
for (value = 0, i=0; i < 4; i++)
value = value * 256 + key[i] + (value >> 24) * 241;
return value;
}
It behaves MUCH better than the "default_tdb_hash" in that it has 100%
bucket usage (not almost, but exaclty 100%). It's not that hard to get
right.
The "hash" at the end (times BIGPRIME + RANDOMVALUE) in the original
is redundant. It only serves to make the results less obvious to
humans, but there is no computer-science relevant reason.
I'll shoot off an Email to the TDB guys as well.
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
next prev parent reply other threads:[~2011-02-23 4:44 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-20 9:06 fsck performance Rogier Wolff
2011-02-20 17:09 ` Ted Ts'o
2011-02-20 19:34 ` Ted Ts'o
2011-02-20 21:55 ` Rogier Wolff
2011-02-20 22:20 ` Ted Ts'o
2011-02-20 23:15 ` Rogier Wolff
2011-02-20 23:41 ` Ted Ts'o
2011-02-21 10:31 ` Amir Goldstein
2011-02-21 16:04 ` Paweł Brodacki
2011-02-21 18:00 ` Andreas Dilger
2011-02-22 10:20 ` Rogier Wolff
2011-02-22 13:36 ` Rogier Wolff
2011-02-22 13:54 ` Rogier Wolff
2011-02-22 16:32 ` Andreas Dilger
2011-02-22 22:13 ` Ted Ts'o
2011-02-23 4:44 ` Rogier Wolff [this message]
2011-02-23 11:32 ` Theodore Tso
2011-02-23 20:53 ` Rogier Wolff
2011-02-23 22:24 ` Andreas Dilger
2011-02-23 23:17 ` Ted Ts'o
2011-02-24 0:41 ` Andreas Dilger
2011-02-24 8:59 ` Rogier Wolff
2011-02-24 7:29 ` Rogier Wolff
2011-02-24 8:59 ` Amir Goldstein
2011-02-24 9:02 ` Rogier Wolff
2011-02-24 9:33 ` Amir Goldstein
2011-02-24 23:53 ` Rogier Wolff
2011-02-25 0:26 ` Daniel Taylor
2011-02-23 2:54 ` Rogier Wolff
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110223044427.GM21917@bitwizard.nl \
--to=r.e.wolff@bitwizard.nl \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).