From: Andreas Dilger <adilger@sun.com>
To: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Eric Sandeen <sandeen@redhat.com>
Subject: Re: [PATCH, RFC] ext4: New inode/block allocation algorithms for flex_bg filesystems
Date: Thu, 26 Feb 2009 17:15:19 -0700
Message-ID: <20090227001519.GO3199@webber.adilger.int>
In-Reply-To: <20090226182156.GL7227@mit.edu>

On Feb 26, 2009  13:21 -0500, Theodore Ts'o wrote:
> So any improvements in mkdirs_mark would require special-case hacks
> such as treating a zero-length directory inode as a synthetic empty
> inode, and not actually trying to allocate the directory block until
> the first time a file is created in the directory.  But that would be
> a file system format change that would probably only really be useful
> for better benchmark results --- how common are file systems with
> hundreds of thousands of empty directories, after all?

Actually, I often refer to some online statistics for HPC storage:
http://www.pdsi-scidac.org/cgi-bin/fsstats-list.cgi

While one would expect HPC filesystems to have lots of huge
directories, the stats below show that the vast majority of DIRECTORIES
contain only a few entries.  That said, a large percentage of
the FILES are in larger directories, but that doesn't change the fact
that there are a large number of directories with very few entries.

Stats from one such filesystem (incorrectly marked ext3, but really Lustre):
http://www.pdsi-scidac.org/fsstats/approved/PNNL-Oct102007-233TB-ext3-EvanFelix_nwfs.out

directory size:
count=888082 average=14.936094
min=0 max=57114
entries:   dirs  dir pct cum. pct   entries ents pct cum. ents
[ 0- 1]: 127934 (14.41%) (14.41%)     86753 ( 0.65%) ( 0.65%)
[ 2- 3]: 126204 (14.21%) (28.62%)    305501 ( 2.30%) ( 2.96%)
[ 4- 7]: 268058 (30.18%) (58.80%)   1314419 ( 9.91%) (12.87%)
[ 8-15]: 228065 (25.68%) (84.48%)   2449552 (18.47%) (31.33%)
[16-31]:  88365 ( 9.95%) (94.43%)   1965719 (14.82%) (46.15%)
[32-63]:  30436 ( 3.43%) (97.86%)   1355962 (10.22%) (56.38%)

filename length:
count=13264476 average=21.981972
min=1 max=232
chars:     files file pct cum. pct      bytes byte pct cum. bytes
[ 0- 7]: 1557016 (11.74%) (11.74%)    7772274 ( 2.67%) ( 2.67%)
[ 8-15]: 4826194 (36.38%) (48.12%)   53282606 (18.27%) (20.94%)
[16-23]: 2598854 (19.59%) (67.72%)   50042818 (17.16%) (38.10%)
[24-31]: 1346382 (10.15%) (77.87%)   36152231 (12.40%) (50.50%)
[32-39]:  572299 ( 4.31%) (82.18%)   20691279 ( 7.10%) (57.60%)
[40-47]:  873408 ( 6.58%) (88.76%)   37941162 (13.01%) (70.61%)
[48-55]:  814905 ( 6.14%) (94.91%)   41733619 (14.31%) (84.92%)

This shows that we could quite easily store most (57%) of the
average-named files (24 chars or less) in average-sized directories
(15 files or less) in 480-byte directories: 15 entries * (24 chars +
8 bytes of dirent overhead per name) = 480 bytes.
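
The arithmetic behind those two figures can be sketched as follows. This
is a reconstruction, not something spelled out in the message: it assumes
the 57% comes from multiplying the cumulative share of directories with at
most 15 entries (84.48%) by the cumulative share of filenames in the
[16-23] bucket or shorter (67.72%), treating directory size and name length
as independent. All names in the snippet are illustrative.

```python
# Cumulative percentages copied from the fsstats tables above.
CUM_DIRS_UP_TO_15_ENTRIES = 0.8448  # "[ 8-15]" row, cumulative dir pct
CUM_NAMES_UP_TO_23_CHARS = 0.6772   # "[16-23]" row, cumulative file pct

# Sizes quoted in the message.
MAX_ENTRIES = 15      # "15 files or less"
MAX_NAME_LEN = 24     # "24 chars or less"
DIRENT_OVERHEAD = 8   # bytes of dirent overhead per name


def directory_bytes(entries: int, name_len: int, overhead: int) -> int:
    """Upper bound on directory data: every entry has a maximum-length name."""
    return entries * (name_len + overhead)


def combined_share(dir_share: float, name_share: float) -> float:
    """Assumption: directory size and filename length are independent,
    so the joint share is simply the product of the two shares."""
    return dir_share * name_share


size = directory_bytes(MAX_ENTRIES, MAX_NAME_LEN, DIRENT_OVERHEAD)
share = combined_share(CUM_DIRS_UP_TO_15_ENTRIES, CUM_NAMES_UP_TO_23_CHARS)

print(f"{size} bytes, {share:.1%}")  # prints "480 bytes, 57.2%"
```

If the two distributions are correlated (for example, if small
directories tend to hold longer names), the real share would differ from
this product.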

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

