linux-fsdevel.vger.kernel.org archive mirror
From: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
To: Andreas Dilger <adilger@clusterfs.com>
Cc: Ihar `Philips` Filipau <thephilips@gmail.com>,
	Bryan Henderson <hbryan@us.ibm.com>,
	Josef Sipek <jsipek@fsl.cs.sunysb.edu>,
	avishay <atraeger@cs.sunysb.edu>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Allocation strategy - dynamic zone for small files
Date: Mon, 13 Nov 2006 20:19:43 -0600
Message-ID: <1163470783.24187.11.camel@kleikamp.austin.ibm.com>
In-Reply-To: <20061113235749.GK6012@schatzie.adilger.int>

On Mon, 2006-11-13 at 16:57 -0700, Andreas Dilger wrote:
> On Nov 14, 2006  00:32 +0100, Ihar `Philips` Filipau wrote:
> > As the person who threw in the idea, I feel a bit responsible. So here are my
> > results from my primitive script (bear with my bashisms) on my plain
> > Debian/unstable with 107k files on 10GB partition with ext3, default
> > 4K block.
> > 
> > Script to count small files:
> > -+-
> > #!/bin/bash
> > find / -xdev 2>/dev/null | wc -l
> > find / -xdev -\( $(seq -f '-size %gc -o' 1 63) -false -\) 2>/dev/null | wc -l
> > find / -xdev -\( $(seq -f '-size %gc -o' 64 128) -false -\) 2>/dev/null | wc -l
> > -+-
> > First line to find all files on root fs, second to find all files with
> > sizes 1-63 bytes, third - 64-128. (Param '-xdev' tells find to remain
> > on same fs to exclude proc/sys/tmp and so on)
> > 
> > And on my system counts are:
> > -+-
> > 107313
> > 8302
> > 2618
> > -+-
> > 
> > That is, 10.1% of all files are small files of 128 bytes or less (7.7% are
> > 63 bytes or less).
> > 
> > [ Results for /etc: 1712, 666, 143 (+ 221 files with sizes in the range
> > 129-512 bytes) - small files are the better half of the whole /etc. ]
> 
> Note that using the root filesystem gives a skewed result (esp. on GNOME systems
> where lots of single-valued files are used by gconf).  Many root filesystems
> using ext3 are formatted with 1kB blocks for this reason.  Also gather stats
> for other filesystems.
> 
> At the filesystem summit we DID find a surprising number of small files
> even when the whole system was examined.  We discussed storing small
> files directly in the inode along with other EAs (this would require
> larger inodes).  This improves data locality and performance (i.e. stat
> of the file loads the small file data into cache), though the assumption
> is that there will be an increasing number of EAs on files in the future.
> It also avoids the issues w.r.t. packing file data from different files
> into the same block when those files have different lifespans, etc.

I would agree that if the focus is on files that are 128 bytes or
smaller, storing the data in the inode makes the most sense.  I don't
think it's worth the complexity of doing any kind of tail merging unless
you would expect that a large number of small files would be too big to
practically fit in the inode, but small enough that it is worth doing
something to store them efficiently.  Symbolic links have been stored
this way for a long time.
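
For anyone wanting to check whether files fall into that middle range on
their own systems, a one-pass size histogram is a rough way to look.  The
following is only a sketch: it assumes GNU find (for -printf) and awk, the
function name size_histogram and the bucket limits are made up for
illustration, and unlike the script quoted above it counts only regular
files and includes empty ones.

```shell
#!/bin/bash
# Sketch: histogram of regular-file sizes under a directory, in one pass.
# Assumes GNU find (-printf is not POSIX). The bucket limits below are
# illustrative only and are not tied to any inode or block layout.
size_histogram() {
    local dir=${1:-/}
    # Print each regular file's size in bytes, staying on one filesystem.
    find "$dir" -xdev -type f -printf '%s\n' 2>/dev/null |
    awk '
        { total++ }
        $1 <= 63   { b[1]++; next }
        $1 <= 128  { b[2]++; next }
        $1 <= 512  { b[3]++; next }
        $1 <= 4096 { b[4]++; next }
        { b[5]++ }
        END {
            split("0-63 64-128 129-512 513-4096 >4096", name, " ")
            # One line per bucket: label, file count, share of all files.
            for (i = 1; i <= 5; i++)
                printf "%-10s %8d %5.1f%%\n", name[i], b[i], (total ? 100 * b[i] / total : 0)
        }'
}
```

Running "size_histogram /etc" (or any other mount point) prints one line
per bucket with the file count and its share of all regular files.  If the
129-512 and 513-4096 rows turn out to be large, that would be the case
where something beyond in-inode storage might be worth considering.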

-- 
David Kleikamp
IBM Linux Technology Center



Thread overview: 25+ messages
2006-11-13 10:37 Allocation strategy - dynamic zone for small files Ihar `Philips` Filipau
2006-11-13 13:56 ` avishay
2006-11-13 17:46   ` Bryan Henderson
2006-11-13 19:38     ` Josef Sipek
2006-11-13 21:12       ` Bryan Henderson
2006-11-13 23:32         ` Ihar `Philips` Filipau
2006-11-13 23:57           ` Andreas Dilger
2006-11-14  2:19             ` Dave Kleikamp [this message]
2006-11-14 13:15               ` Jörn Engel
     [not found]                 ` <efa6f5910611140541m302201e6t4e84551b75e79611@mail.gmail.com>
2006-11-14 13:56                   ` Jörn Engel
2006-11-14 18:23                   ` Andreas Dilger
2006-11-14 15:19             ` phillip
2006-11-14 18:19               ` Andreas Dilger
2006-11-14  0:15           ` Josef Sipek
2006-11-14  0:59           ` Bryan Henderson
2006-11-14  1:02     ` Theodore Tso
2006-11-14 11:21       ` Al Boldi
2006-11-14 14:25         ` Theodore Tso
2006-11-14 15:43           ` Al Boldi
2006-11-14 15:46             ` Matthew Wilcox
2006-11-14 16:59               ` Al Boldi
2006-11-14 17:27                 ` Matthew Wilcox
2006-11-14 17:55                   ` Theodore Tso
2006-11-14 18:23                   ` Al Boldi
2006-11-14 14:30       ` phillip
