Re: fsck performance. - Rogier Wolff

linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Rogier Wolff <R.E.Wolff@BitWizard.nl>
To: Ted Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: fsck performance.
Date: Wed, 23 Feb 2011 05:44:27 +0100	[thread overview]
Message-ID: <20110223044427.GM21917@bitwizard.nl> (raw)
In-Reply-To: <20110222221304.GH2924@thunk.org>

On Tue, Feb 22, 2011 at 05:13:04PM -0500, Ted Ts'o wrote:
> On Tue, Feb 22, 2011 at 09:32:28AM -0700, Andreas Dilger wrote:
> > 
> > Any idea what the hash size does to memory usage?  I wonder if we
> > can scale this based on the directory count, or if the memory usage
> > is minimal (only needed in case of tdb) then just make it the
> > default. It definitely appears to have been a major performance
> > boost.
> 
> Yeah, that was my question.  Your patch adds a magic number which
> probably works well on your machine (and I'm not really worried if
> someone has less than 1G --- here's a quarter kid, buy your self a
> real computer :-).  But I wonder if we should be using a hash size
> which is sized automatically depending on available memory or file
> system size.

I fully agree that having "magic numbers" in the code is a bad thing.
A warning sign. 

I don't agree with your argument that 1G RAM should be considered
minimal. There are storage boxes (single-disk-nas systems) out there
that run on a 300MHz ARM chip and have little RAM. Some of them use
ext[234].

For example: http://iomega.nas-central.org/wiki/Main_Page

64Mb RAM. I'm not sure wether that CPU is capable of virtual memory.

I just mentioned this one because a friend brought one into my office
last week. I don't think it happens to run Linux. On the other hand,
some of the "competition" do run Linux.

As to the "disadvantage" of using a large hash value: 

As far as I can see, the library just seeks to that position in the
tdb file. With 32-bit file offsets (which is hardcoded into tdb), that
means the penalty is 4*hash_size of extra disk space. So with my
currently suggested value that comes to 4Mb.

As my current tdb database amounts to 1.5Gb I think the cost is
acceptable.

With the number of keys up to 1 million, we can expect a speedup of
1M/131 = about 7500. Above that we won't gain much anymore. 

This is assuming that we have a next-to-perfect hash function. In fact
we don't because I see about a 30% hash bucket usage. And I surely
think my fsck has used over 1M of the keys....

I just tested the hash function: I hashed the first 10 million numbers
and got 91962 unique results (out of a possible 99931). That's only
about 10%. That's a lot worse than what e2fsck is seeing. And this is
the simplest case to get right.

Here is my test program. 

#include <stdio.h>
#include <stdlib.h>
typedef unsigned int u32;

/* This is based on the hash algorithm from gdbm */
static unsigned int default_tdb_hash(unsigned char *key)
{
        u32 value;      /* Used to compute the hash value.  */
        u32   i;        /* Used to cycle through random values. */

        /* Set the initial value from the key size. */
        for (value = 0x238F13AF * 4, i=0; i < 4; i++)
                value = (value + (key[i] << (i*5 % 24)));

        return (1103515243 * value + 12345);
}

int main (int argc, char **argv)
{
  int i; 
  int max = 1000000; 

  if (argc > 1) max = atoi (argv[1]);
  for (i=0;i < max;i++) {
    printf ("%u %u\n", i, default_tdb_hash ((unsigned char *)&i) % 99931);
  }
  exit (0);
}

and here is the commandline I used to watch the results.

./a.out 10000000 | awk '{print $2}' | sort | uniq -c |sort -n  | less

It seems my "prime generator" program is wrong too. I had thought to
choose a prime with 99931, but apparently it's not prime. (13*7687).
Which, for hashing should not be too bad, but I'll go look for a
prime and check again. Ok. Hash bucket usage shot up: 16%. 

I just "designed" a new hash function, based on the "hash" page on
wikipedia.

static unsigned int my_tdb_hash(unsigned char *key)
{
        u32 value;      /* Used to compute the hash value.  */
        u32   i;        /* Used to cycle through random values. */

        /* Set the initial value from the key size. */
        for (value = 0, i=0; i < 4; i++)
                value = value * 256 + key[i] + (value >> 24) * 241;

        return value;
}

It behaves MUCH better than the "default_tdb_hash" in that it has 100%
bucket usage (not almost, but exaclty 100%). It's not that hard to get
right.

The "hash" at the end (times BIGPRIME + RANDOMVALUE) in the original
is redundant. It only serves to make the results less obvious to
humans, but there is no computer-science relevant reason.

I'll shoot off an Email to the TDB guys as well. 

	Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

next prev parent reply	other threads:[~2011-02-23  4:44 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-20  9:06 fsck performance Rogier Wolff
2011-02-20 17:09 ` Ted Ts'o
2011-02-20 19:34   ` Ted Ts'o
2011-02-20 21:55     ` Rogier Wolff
2011-02-20 22:20       ` Ted Ts'o
2011-02-20 23:15         ` Rogier Wolff
2011-02-20 23:41           ` Ted Ts'o
2011-02-21 10:31             ` Amir Goldstein
2011-02-21 16:04               ` Paweł Brodacki
2011-02-21 18:00                 ` Andreas Dilger
2011-02-22 10:20                   ` Rogier Wolff
2011-02-22 13:36                     ` Rogier Wolff
2011-02-22 13:54                       ` Rogier Wolff
2011-02-22 16:32                         ` Andreas Dilger
2011-02-22 22:13                           ` Ted Ts'o
2011-02-23  4:44                             ` Rogier Wolff [this message]
2011-02-23 11:32                               ` Theodore Tso
2011-02-23 20:53                                 ` Rogier Wolff
2011-02-23 22:24                                   ` Andreas Dilger
2011-02-23 23:17                                     ` Ted Ts'o
2011-02-24  0:41                                       ` Andreas Dilger
2011-02-24  8:59                                         ` Rogier Wolff
2011-02-24  7:29                                     ` Rogier Wolff
2011-02-24  8:59                                       ` Amir Goldstein
2011-02-24  9:02                                         ` Rogier Wolff
2011-02-24  9:33                                           ` Amir Goldstein
2011-02-24 23:53                                         ` Rogier Wolff
2011-02-25  0:26                                       ` Daniel Taylor
2011-02-23  2:54                           ` Rogier Wolff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110223044427.GM21917@bitwizard.nl \
    --to=r.e.wolff@bitwizard.nl \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).