Re: fsck performance. - Rogier Wolff

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Rogier Wolff <R.E.Wolff@BitWizard.nl>
To: Ted Ts'o <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: fsck performance.
Date: Wed, 23 Feb 2011 05:44:27 +0100	[thread overview]
Message-ID: <20110223044427.GM21917@bitwizard.nl> (raw)
In-Reply-To: <20110222221304.GH2924@thunk.org>

On Tue, Feb 22, 2011 at 05:13:04PM -0500, Ted Ts'o wrote:
> On Tue, Feb 22, 2011 at 09:32:28AM -0700, Andreas Dilger wrote:
> > 
> > Any idea what the hash size does to memory usage?  I wonder if we
> > can scale this based on the directory count, or if the memory usage
> > is minimal (only needed in case of tdb) then just make it the
> > default. It definitely appears to have been a major performance
> > boost.
> 
> Yeah, that was my question.  Your patch adds a magic number which
> probably works well on your machine (and I'm not really worried if
> someone has less than 1G --- here's a quarter kid, buy your self a
> real computer :-).  But I wonder if we should be using a hash size
> which is sized automatically depending on available memory or file
> system size.

I fully agree that having "magic numbers" in the code is a bad thing.
A warning sign. 

I don't agree with your argument that 1G RAM should be considered
minimal. There are storage boxes (single-disk-nas systems) out there
that run on a 300MHz ARM chip and have little RAM. Some of them use
ext[234].

For example: http://iomega.nas-central.org/wiki/Main_Page

64Mb RAM. I'm not sure wether that CPU is capable of virtual memory.

I just mentioned this one because a friend brought one into my office
last week. I don't think it happens to run Linux. On the other hand,
some of the "competition" do run Linux.

As to the "disadvantage" of using a large hash value: 

As far as I can see, the library just seeks to that position in the
tdb file. With 32-bit file offsets (which is hardcoded into tdb), that
means the penalty is 4*hash_size of extra disk space. So with my
currently suggested value that comes to 4Mb.

As my current tdb database amounts to 1.5Gb I think the cost is
acceptable.

With the number of keys up to 1 million, we can expect a speedup of
1M/131 = about 7500. Above that we won't gain much anymore. 

This is assuming that we have a next-to-perfect hash function. In fact
we don't because I see about a 30% hash bucket usage. And I surely
think my fsck has used over 1M of the keys....

I just tested the hash function: I hashed the first 10 million numbers
and got 91962 unique results (out of a possible 99931). That's only
about 10%. That's a lot worse than what e2fsck is seeing. And this is
the simplest case to get right.

Here is my test program. 

#include <stdio.h>
#include <stdlib.h>
typedef unsigned int u32;

/* This is based on the hash algorithm from gdbm */
static unsigned int default_tdb_hash(unsigned char *key)
{
        u32 value;      /* Used to compute the hash value.  */
        u32   i;        /* Used to cycle through random values. */

        /* Set the initial value from the key size. */
        for (value = 0x238F13AF * 4, i=0; i < 4; i++)
                value = (value + (key[i] << (i*5 % 24)));

        return (1103515243 * value + 12345);
}

int main (int argc, char **argv)
{
  int i; 
  int max = 1000000; 

  if (argc > 1) max = atoi (argv[1]);
  for (i=0;i < max;i++) {
    printf ("%u %u\n", i, default_tdb_hash ((unsigned char *)&i) % 99931);
  }
  exit (0);
}

and here is the commandline I used to watch the results.

./a.out 10000000 | awk '{print $2}' | sort | uniq -c |sort -n  | less

It seems my "prime generator" program is wrong too. I had thought to
choose a prime with 99931, but apparently it's not prime. (13*7687).
Which, for hashing should not be too bad, but I'll go look for a
prime and check again. Ok. Hash bucket usage shot up: 16%. 

I just "designed" a new hash function, based on the "hash" page on
wikipedia.

static unsigned int my_tdb_hash(unsigned char *key)
{
        u32 value;      /* Used to compute the hash value.  */
        u32   i;        /* Used to cycle through random values. */

        /* Set the initial value from the key size. */
        for (value = 0, i=0; i < 4; i++)
                value = value * 256 + key[i] + (value >> 24) * 241;

        return value;
}

It behaves MUCH better than the "default_tdb_hash" in that it has 100%
bucket usage (not almost, but exaclty 100%). It's not that hard to get
right.

The "hash" at the end (times BIGPRIME + RANDOMVALUE) in the original
is redundant. It only serves to make the results less obvious to
humans, but there is no computer-science relevant reason.

I'll shoot off an Email to the TDB guys as well. 

	Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

next prev parent reply	other threads:[~2011-02-23  4:44 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-20  9:06 fsck performance Rogier Wolff
2011-02-20 17:09 ` Ted Ts'o
2011-02-20 19:34   ` Ted Ts'o
2011-02-20 21:55     ` Rogier Wolff
2011-02-20 22:20       ` Ted Ts'o
2011-02-20 23:15         ` Rogier Wolff
2011-02-20 23:41           ` Ted Ts'o
2011-02-21 10:31             ` Amir Goldstein
2011-02-21 16:04               ` Paweł Brodacki
2011-02-21 18:00                 ` Andreas Dilger
2011-02-22 10:20                   ` Rogier Wolff
2011-02-22 13:36                     ` Rogier Wolff
2011-02-22 13:54                       ` Rogier Wolff
2011-02-22 16:32                         ` Andreas Dilger
2011-02-22 22:13                           ` Ted Ts'o
2011-02-23  4:44                             ` Rogier Wolff [this message]
2011-02-23 11:32                               ` Theodore Tso
2011-02-23 20:53                                 ` Rogier Wolff
2011-02-23 22:24                                   ` Andreas Dilger
2011-02-23 23:17                                     ` Ted Ts'o
2011-02-24  0:41                                       ` Andreas Dilger
2011-02-24  8:59                                         ` Rogier Wolff
2011-02-24  7:29                                     ` Rogier Wolff
2011-02-24  8:59                                       ` Amir Goldstein
2011-02-24  9:02                                         ` Rogier Wolff
2011-02-24  9:33                                           ` Amir Goldstein
2011-02-24 23:53                                         ` Rogier Wolff
2011-02-25  0:26                                       ` Daniel Taylor
2011-02-23  2:54                           ` Rogier Wolff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110223044427.GM21917@bitwizard.nl \
    --to=r.e.wolff@bitwizard.nl \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.