public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@clusterfs.com>
To: Daniel Phillips <phillips@bonn-fries.net>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
	Andrew Morton <akpm@zip.com.au>,
	Christopher Li <chrisl@gnuchina.org>,
	Linux-kernel <linux-kernel@vger.kernel.org>,
	ext2-devel@lists.sourceforge.net
Subject: Re: [Ext2-devel] Re: Shrinking ext3 directories
Date: Fri, 21 Jun 2002 23:53:18 -0600	[thread overview]
Message-ID: <20020622055318.GA22411@clusterfs.com> (raw)
In-Reply-To: <E17LF65-0001K4-00@starship>

On Jun 21, 2002  05:28 +0200, Daniel Phillips wrote:
> I ran a bakeoff between your new half-md4 and dx_hack_hash on Ext2.  As 
> predicted, half-md4 does produce very even bucket distributions.  For 200,000 
> creates:
> 
>    half-md4:        2872 avg bytes filled per 4k block (70%)
>    dx_hack_hash:    2853 avg bytes filled per 4k block (69%)
> 
> but guess which was faster overall?
> 
>    half-md4:        user 0.43 system 6.88 real 0:07.33 CPU 99%
>    dx_hack_hash:    user 0.43 system 6.40 real 0:06.82 CPU 100%
> 
> This is quite reproducible: dx_hack_hash is always faster by about 6%.  This 
> must be due entirely to the difference in hashing cost, since half-md4 
> produces measurably better distributions.  Now what do we do?

While I normally advocate the "cheapest" way of implementing a given
solution (and dx_hack_hash is definitely the lowest-cost hash function
we could reasonably have), I would still be inclined to go with half-MD4
for this.  A few reasons for that:
1) CPUs are getting faster all the time
2) it is a well-understood algorithm that has very good behaviour
3) it is much harder to spoof MD4 than dx_hack_hash
4) it is probably better to have the most uniform hash function we can
   find than to do lots more block split/coalesce operations, so the
   extra cost of half-MD4 may be a benefit overall

It would be interesting to re-run this test to create a few million
entries, but with periodic deletes.

Hmm, now that I think about it, split/coalesce operations are only
important on create and delete, while the hash cost is paid for each
lookup as well.  It would be interesting to see the comparison with
a test something like this (sorry, don't have a system which has both
hash functions working right now):

#!/bin/sh
DEV=/dev/hda7
TESTDIR=/mnt/tmp
date
for d in `seq -f "directory_name_%05g" 1 10000` ; do
	mkdir $TESTDIR/$d
	for f in `seq -f "file_name_%05g" 1 10000` ; do
		touch $TESTDIR/$d/$d_$f
	done
done
date
umount $TESTDIR
mount -o noatime $DEV $TESTDIR
date
for d in `seq -f "directory_name_%05g" 10000 -1 1` ; do
	for f in `seq -f "file_name_%05g" 10000 -1 1` ; do
		stat $TESTDIR/$d/$d_$f > /dev/null
	done
done
date

Having the longer filenames will put more load on the hash function,
so we will see if we are really paying a big price for the overhead,
and the stat test will remove all of the creation time and disk dirtying
and just leave us with the pure lookup costs hopefully.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/


  parent reply	other threads:[~2002-06-22  5:55 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-06-18 16:08 Shrinking ext3 directories DervishD
2002-06-18 16:10 ` Austin Gonyou
2002-06-18 16:39   ` Andreas Dilger
2002-06-18 19:39     ` DervishD
2002-06-18 19:34   ` DervishD
2002-06-18 16:21 ` Padraig Brady
2002-06-18 16:54   ` David Lang
2002-06-18 19:35   ` DervishD
2002-06-18 21:50 ` Stephen C. Tweedie
2002-06-18 22:18   ` Alexander Viro
2002-06-19  9:38     ` DervishD
2002-06-19 10:37     ` Stephen C. Tweedie
2002-06-19 17:03       ` [Ext2-devel] " Christopher Li
2002-06-19 20:10         ` Stephen C. Tweedie
2002-06-19 20:34           ` Stephen C. Tweedie
2002-06-19 20:13         ` Andrew Morton
2002-06-19 22:43           ` Stephen C. Tweedie
2002-06-19 23:54             ` Stephen C. Tweedie
2002-06-21  3:28               ` Daniel Phillips
2002-06-21  7:03                 ` Helge Hafting
2002-06-21 14:02                   ` Daniel Phillips
2002-06-24  7:12                     ` Helge Hafting
2002-06-21 16:23                   ` Daniel Phillips
2002-06-21 15:06                 ` Stephen C. Tweedie
2002-07-04  4:48                   ` Daniel Phillips
2002-07-04 14:15                     ` jlnance
2002-07-05  2:11                       ` Daniel Phillips
2002-06-22  5:53                 ` Andreas Dilger [this message]
2002-06-22 20:59                   ` Daniel Phillips
2002-06-23  0:01                     ` Daniel Phillips
2002-06-23  7:57                     ` Daniel Phillips
2002-06-19 22:49         ` Daniel Phillips
2002-06-20  0:24           ` Andreas Dilger
2002-06-20  9:34           ` Stephen C. Tweedie
2002-06-20 10:18             ` Andreas Dilger
2002-06-20 13:45               ` Daniel Phillips
2002-06-21 14:54                 ` Ville Herva
2002-06-21 15:08                   ` Stephen C. Tweedie
2002-06-21 15:38                     ` Ville Herva
2002-06-21 16:15                       ` Stephen C. Tweedie
2002-06-21 18:44                         ` Ville Herva
2002-06-20 16:26               ` Bill Davidsen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20020622055318.GA22411@clusterfs.com \
    --to=adilger@clusterfs.com \
    --cc=akpm@zip.com.au \
    --cc=chrisl@gnuchina.org \
    --cc=ext2-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=phillips@bonn-fries.net \
    --cc=sct@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox