From: Andreas Dilger <adilger@sun.com>
To: Theodore Tso <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Eric Sandeen <sandeen@redhat.com>
Subject: Re: [PATCH, RFC] ext4: New inode/block allocation algorithms for flex_bg filesystems
Date: Thu, 26 Feb 2009 17:15:19 -0700
Message-ID: <20090227001519.GO3199@webber.adilger.int>
In-Reply-To: <20090226182156.GL7227@mit.edu>
On Feb 26, 2009 13:21 -0500, Theodore Ts'o wrote:
> So any improvements in mkdirs_mark would require special-case hacks
> such as treating a zero-length directory inode as a synthetic empty
> inode, and not actually trying to allocate the directory block until
> the first time a file is created in the directory. But that would be
> a file system format change that would probably only really be useful
> for better benchmark results --- how common are file systems with
> hundreds of thousands of empty directories, after all?
Actually, I often reference some online statistics for HPC storage:
http://www.pdsi-scidac.org/cgi-bin/fsstats-list.cgi
and while one would think that HPC filesystems have lots of huge
directories, the stats below show that the large majority of DIRECTORIES
contain only a few entries. That said, a large percentage of
the FILES are in larger directories, but that doesn't change the fact
that there are a large number of directories with very few entries.
Stats from the filesystem (incorrectly marked ext3, but really Lustre):
http://www.pdsi-scidac.org/fsstats/approved/PNNL-Oct102007-233TB-ext3-EvanFelix_nwfs.out
directory size:
  count=888082 average=14.936094
  min=0 max=57114

  entries      dirs   dir pct  cumulative   entries  ents pct  cum. ents
  [ 0- 1]:   127934  (14.41%)   (14.41%)      86753  ( 0.65%)   ( 0.65%)
  [ 2- 3]:   126204  (14.21%)   (28.62%)     305501  ( 2.30%)   ( 2.96%)
  [ 4- 7]:   268058  (30.18%)   (58.80%)    1314419  ( 9.91%)   (12.87%)
  [ 8-15]:   228065  (25.68%)   (84.48%)    2449552  (18.47%)   (31.33%)
  [16-31]:    88365  ( 9.95%)   (94.43%)    1965719  (14.82%)   (46.15%)
  [32-63]:    30436  ( 3.43%)   (97.86%)    1355962  (10.22%)   (56.38%)
filename length:
  count=13264476 average=21.981972
  min=1 max=232

  chars       files  file pct  cumulative     bytes  byte pct  cum. bytes
  [ 0- 7]:  1557016  (11.74%)   (11.74%)    7772274  ( 2.67%)   ( 2.67%)
  [ 8-15]:  4826194  (36.38%)   (48.12%)   53282606  (18.27%)   (20.94%)
  [16-23]:  2598854  (19.59%)   (67.72%)   50042818  (17.16%)   (38.10%)
  [24-31]:  1346382  (10.15%)   (77.87%)   36152231  (12.40%)   (50.50%)
  [32-39]:   572299  ( 4.31%)   (82.18%)   20691279  ( 7.10%)   (57.60%)
  [40-47]:   873408  ( 6.58%)   (88.76%)   37941162  (13.01%)   (70.61%)
  [48-55]:   814905  ( 6.14%)   (94.91%)   41733619  (14.31%)   (84.92%)
This shows that we could quite easily store most (57%) of the
average-named files (24 chars or fewer) in average-sized directories
(15 files or fewer) in 480-byte directories (including 8 bytes of
dirent overhead per name).
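
The arithmetic behind those two figures can be checked with a short sketch. The 480-byte size follows directly from the numbers in the message (15 names, 24 chars each, 8 bytes of overhead per name). How the 57% was obtained is not stated; one reading that reproduces it, and this is an assumption, is to multiply the fraction of directories with at most 15 entries by the fraction of names in the buckets up to [16-23], treating the two as independent. All variable and function names below are illustrative.

```python
# Counts copied from the fsstats tables quoted in the message.
TOTAL_DIRS = 888082
DIRS_UP_TO_15_ENTRIES = 127934 + 126204 + 268058 + 228065   # buckets [0-1]..[8-15]

TOTAL_NAMES = 13264476
NAMES_UP_TO_23_CHARS = 1557016 + 4826194 + 2598854          # buckets [0-7]..[16-23]

# Sizing figures stated in the message.
MAX_ENTRIES = 15        # "15 files or less"
MAX_NAME_LEN = 24       # "24 chars or less"
DIRENT_OVERHEAD = 8     # "8 bytes of dirent overhead per name"


def small_dir_bytes(entries: int, name_len: int, overhead: int) -> int:
    """Worst-case bytes needed to hold `entries` names of `name_len` chars."""
    return entries * (name_len + overhead)


dir_bytes = small_dir_bytes(MAX_ENTRIES, MAX_NAME_LEN, DIRENT_OVERHEAD)

dir_frac = DIRS_UP_TO_15_ENTRIES / TOTAL_DIRS
name_frac = NAMES_UP_TO_23_CHARS / TOTAL_NAMES

# ASSUMPTION: the 57% is the product of the two fractions, i.e. name length
# and directory size are treated as independent. The message does not say so.
combined = dir_frac * name_frac

print(f"directory bytes needed: {dir_bytes}")
print(f"dirs with <= 15 entries: {dir_frac:.2%}")
print(f"names in buckets up to [16-23]: {name_frac:.2%}")
print(f"product of the two: {combined:.1%}")
```

Under that reading the product comes out at roughly 57%, matching the figure in the message, but the independence assumption is a simplification: the stats above already note that many files live in the larger directories.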
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.