From: Rogier Wolff <R.E.Wolff@BitWizard.nl>
To: Theodore Tso <tytso@MIT.EDU>
Cc: Rogier Wolff <R.E.Wolff@bitwizard.nl>, linux-ext4@vger.kernel.org
Subject: Re: fsck performance.
Date: Wed, 23 Feb 2011 21:53:09 +0100
Message-ID: <20110223205309.GA16661@bitwizard.nl>
In-Reply-To: <DC8EC4D5-BB64-4908-985B-C6D3EDA955E3@mit.edu>

On Wed, Feb 23, 2011 at 06:32:17AM -0500, Theodore Tso wrote:
> 
> On Feb 22, 2011, at 11:44 PM, Rogier Wolff wrote:
> 
> > 
> > I'll shoot off an Email to the TDB guys as well. 
 
> I'm pretty sure this won't come as a surprise to them.  I'm using
> the last version of TDB which was licensed under the GPLv2, and they
> relicensed to GPLv3 quite a while ago.  I remember hearing they had
> added a new hash algorithm to TDB since the relicensing, but those
> newer versions aren't available to e2fsprogs....

Well then.... 

You're free to use my "new" hash function, provided it is kept under
GPLv2 and not under GPLv3.

My implementation is a "cleanroom" implementation: I have only looked
at the specification and implemented it from there. That said, no
external attestation is available that I have been completely shielded
from the newer GPLv3 version...

On a slightly different note: 

A pretty good estimate of the number of inodes in use is available in
the superblock (total inodes - free inodes). A good hash size would
be: "a rough estimate of the number of inodes." Being off by a factor
of two or three doesn't matter much. CPU is cheap. I'm not sure what
the estimate for the "dircount" tdb should be.
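
A minimal sketch of that estimate, in Python for brevity. The function
name and the lower bound of 1024 are my own assumptions for
illustration; the two inputs correspond to the superblock's inode count
and free inode count.

```python
def estimate_icount_hash_size(total_inodes: int, free_inodes: int,
                              minimum: int = 1024) -> int:
    """Return a hash size roughly equal to the number of inodes in use.

    total_inodes and free_inodes are the two counters kept in the
    superblock. The `minimum` floor is an assumption, added so that a
    nearly empty filesystem does not end up with a tiny hash table.
    """
    if total_inodes < 0 or free_inodes < 0 or free_inodes > total_inodes:
        raise ValueError("inconsistent inode counts")
    used_inodes = total_inodes - free_inodes
    # Being off by a factor of two or three is fine, so no rounding
    # to primes or powers of two is attempted here.
    return max(used_inodes, minimum)
```

Since fsck may be looking at a damaged superblock, the counts could be
wrong; the sanity check above is only a placeholder for whatever
fallback would be appropriate there.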

The amount of disk space that the tdb will use is at least: 
  overhead + hash_size * 4 + numrecords * (keysize + datasize +
                                                 perrecordoverhead)

There must also be some overhead to store the size of the keys and
data as both can be variable length. By implementing the "database"
ourselves we could optimize that out. I don't think it's worth the
trouble. 

With keysize equal to 4, datasize also 4, and hash_size equal to
numinodes (that is, numrecords), we would get

 overhead + numinodes * (12 + perrecordoverhead). 

In fact, my icount database grew to about 750MB, with only 23M inodes.
That is roughly 32 bytes per inode, of which 12 are accounted for
above, so apparently the perrecordoverhead is about 20 bytes.
This is the price you pay for using a much more versatile database
than what you really need. Disk is cheap (except when checking a root
filesystem!)
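
The arithmetic can be checked with a few lines of Python. This is only
a sketch of the formula above: the function names are made up, the
fixed overhead is assumed to be negligible, and 750MB is taken as
750 * 10^6 bytes.

```python
def tdb_min_size(num_records: int, hash_size: int, key_size: int = 4,
                 data_size: int = 4, per_record_overhead: int = 0,
                 overhead: int = 0) -> int:
    """Lower bound on the tdb file size, following the formula above."""
    return (overhead + hash_size * 4
            + num_records * (key_size + data_size + per_record_overhead))


def implied_per_record_overhead(db_bytes: int, num_records: int,
                                key_size: int = 4,
                                data_size: int = 4) -> float:
    """Solve the formula for perrecordoverhead.

    Assumes hash_size == num_records and ignores the fixed overhead.
    """
    bytes_per_record = db_bytes / num_records
    return bytes_per_record - (4 + key_size + data_size)


if __name__ == "__main__":
    # Observed: about 750MB for about 23M inodes.
    print(implied_per_record_overhead(750_000_000, 23_000_000))
    # Predicted size if the per-record overhead is exactly 20 bytes.
    print(tdb_min_size(23_000_000, 23_000_000, per_record_overhead=20))
```

With those inputs the implied overhead comes out a little above 20
bytes per record, which matches the estimate.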

So... 

-- I suggest that for the icount tdb we move to using the superblock
info as the hash size.

-- I suggest that we use our own hash function. tdb allows us to
specify our own hash function. Instead of modifying tdb itself, we'll
just keep it intact, and pass in a better (local) hash function.
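
To make the second suggestion concrete, here is a rough Python sketch
of what a local hash for 4-byte inode keys could look like, plus a
helper to measure the longest bucket chain. This is NOT the hash
function I offered above; it is a generic multiplicative hash used as a
stand-in, and all names are hypothetical. It also assumes that the
bucket is chosen as the hash value modulo the hash size.

```python
import struct
from collections import Counter

# Odd 32-bit constant close to 2**32 divided by the golden ratio,
# a common choice for multiplicative hashing (an assumption here).
MULTIPLIER_32 = 0x9E3779B1


def inode_key(ino: int) -> bytes:
    """Pack an inode number as a 4-byte little-endian key."""
    return struct.pack("<I", ino)


def local_hash(key: bytes) -> int:
    """Stand-in 32-bit hash for a 4-byte key (illustrative only)."""
    (value,) = struct.unpack("<I", key)
    value = (value * MULTIPLIER_32) & 0xFFFFFFFF
    # Fold the high bits into the low bits so the modulo sees them.
    value ^= value >> 16
    return value


def longest_chain(inodes, hash_size: int) -> int:
    """Length of the fullest bucket when hashing the given inodes."""
    buckets = Counter(local_hash(inode_key(i)) % hash_size for i in inodes)
    return max(buckets.values())


if __name__ == "__main__":
    # Hash size chosen equal to the number of records, as suggested.
    print(longest_chain(range(1, 100_001), 100_000))
```

The point of the helper is that any candidate hash can be judged the
same way: feed it a realistic set of inode numbers and see how long the
worst chain gets for the chosen hash size.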


Does anybody know what the "dircount" tdb database holds, and what is
an estimate for the number of elements eventually in the database?  (I
could find out myself: I have the source. But I'm lazy. I'm a
programmer you know...).


On a separate note, my filesystem finished the fsck (33 hours (*)),
and I started the backups again... :-)

	Roger. 

*) that might include an estimated 1-5 hours of "Fix <y>?" waiting.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
**    Delftechpark 26 2628 XH  Delft, The Netherlands. KVK: 27239233    **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ
