linux-fsdevel.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@clusterfs.com>
To: Ihar `Philips` Filipau <thephilips@gmail.com>
Cc: Bryan Henderson <hbryan@us.ibm.com>,
	Josef Sipek <jsipek@fsl.cs.sunysb.edu>,
	avishay <atraeger@cs.sunysb.edu>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Allocation strategy - dynamic zone for small files
Date: Mon, 13 Nov 2006 16:57:50 -0700
Message-ID: <20061113235749.GK6012@schatzie.adilger.int>
In-Reply-To: <efa6f5910611131532j6cb43a3apf0ce7326a4271d3d@mail.gmail.com>

On Nov 14, 2006  00:32 +0100, Ihar `Philips` Filipau wrote:
> As person throwing in the idea, I feel bit responsible. So here go my
> results from my primitive script (bear with my bashism) on my plain
> Debian/unstable with 123k files on 10GB partition with ext3, default
> 8K block.
> 
> Script to count small files:
> -+-
> #!/bin/bash
> find / -xdev 2>/dev/null | wc -l
> find / -xdev -\( $(seq -f '-size %gc -o' 1 63) -false -\) 2>/dev/null | wc -l
> find / -xdev -\( $(seq -f '-size %gc -o' 64 128) -false -\) 2>/dev/null | wc -l
> -+-
> First line to find all files on root fs, second to find all files with
> sizes 1-63 bytes, third - 64-128. (Param '-xdev' tells find to remain
> on same fs to exclude proc/sys/tmp and so on)
> 
> And on my system counts are:
> -+-
> 107313
> 8302
> 2618
> -+-
> 
> This is 10.1% of all files - are small files under 128 bytes.
> (7.7% < 63 bytes)
> 
> [ Results for /etc: 1712, 666, 143 (+ 221 file of size in range
> 129-512 bytes) - small files are better half of whole /etc. ]

Note that measuring only the root filesystem gives a skewed result (esp. on
GTK systems, where gconf uses lots of single-valued files).  Many ext3 root
filesystems are formatted with 1kB blocks for this reason.  It would be worth
gathering stats for other filesystems as well.
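
For gathering those stats, a minimal single-pass sketch follows.  It is not
from the original script: the function name small_file_stats is made up for
illustration, it assumes GNU find (for -printf) and awk, and unlike the quoted
script it counts regular files only (-type f).  Zero-length files count toward
the total but fall in neither bucket, which matches the 1-63 and 64-128 byte
ranges used above.

```sh
#!/bin/sh
# Hypothetical helper: bucket regular files under a directory by size.
# Assumes GNU find (-printf is not POSIX).
small_file_stats() {
    # -xdev keeps find on the filesystem that contains "$1";
    # -printf '%s\n' prints each file's size in bytes, one per line.
    find "$1" -xdev -type f -printf '%s\n' 2>/dev/null |
    awk '
        { total++ }
        $1 >= 1  && $1 <= 63  { tiny++ }
        $1 >= 64 && $1 <= 128 { small++ }
        END {
            # Unset counters print as 0 with %d.
            printf "total=%d 1-63=%d 64-128=%d\n", total, tiny, small
        }'
}
```

Running it as "small_file_stats /home" (or on any other mount point) prints
one summary line per filesystem, which makes it easy to compare against the
root filesystem numbers.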

At the filesystem summit we DID find a surprising number of small files
even when the whole system was examined.  We discussed storing small
files directly in the inode along with other EAs (this would require
larger inodes).  This improves data locality and performance (i.e. stat
of the file loads the small file data into cache), though the assumption
is that there will be an increasing number of EAs on files in the future.
It also avoids the issues w.r.t. packing data from different files into
the same block when those files have different lifespans, etc.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

