From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Kleikamp
Subject: Re: Allocation strategy - dynamic zone for small files
Date: Mon, 13 Nov 2006 20:19:43 -0600
Message-ID: <1163470783.24187.11.camel@kleikamp.austin.ibm.com>
References: <20061113193816.GA31700@filer.fsl.cs.sunysb.edu> <20061113235749.GK6012@schatzie.adilger.int>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Cc: Ihar `Philips` Filipau, Bryan Henderson, Josef Sipek, avishay, linux-fsdevel@vger.kernel.org
Return-path:
Received: from e5.ny.us.ibm.com ([32.97.182.145]:45458 "EHLO e5.ny.us.ibm.com")
	by vger.kernel.org with ESMTP id S933321AbWKNCTx (ORCPT );
	Mon, 13 Nov 2006 21:19:53 -0500
Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234])
	by e5.ny.us.ibm.com (8.13.8/8.12.11) with ESMTP id kAE2JqRU025082
	for ; Mon, 13 Nov 2006 21:19:52 -0500
Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by d01relay02.pok.ibm.com (8.13.6/8.13.6/NCO v8.1.1) with ESMTP id kAE2JqZl304124
	for ; Mon, 13 Nov 2006 21:19:52 -0500
Received: from d01av02.pok.ibm.com (loopback [127.0.0.1])
	by d01av02.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id kAE2Jpb3023994
	for ; Mon, 13 Nov 2006 21:19:52 -0500
To: Andreas Dilger
In-Reply-To: <20061113235749.GK6012@schatzie.adilger.int>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Mon, 2006-11-13 at 16:57 -0700, Andreas Dilger wrote:
> On Nov 14, 2006 00:32 +0100, Ihar `Philips` Filipau wrote:
> > As person throwing in the idea, I feel bit responsible. So here go my
> > results from my primitive script (bear with my bashism) on my plain
> > Debian/unstable with 123k files on 10GB partition with ext3, default
> > 8K block.
> >
> > Script to count small files:
> > -+-
> > #!/bin/bash
> > find / -xdev 2>/dev/null | wc -l
> > find / -xdev -\( $(seq -f '-size %gc -o' 1 63) -false -\) 2>/dev/null | wc
> > -l
> > find / -xdev -\( $(seq -f '-size %gc -o' 64 128) -false -\) 2>/dev/null |
> > wc -l
> > -+-
> > First line to find all files on root fs, second to find all files with
> > sizes 1-63 bytes, third - 64-128. (Param '-xdev' tells find to remain
> > on same fs to exclude proc/sys/tmp and so on)
> >
> > And on my system counts are:
> > -+-
> > 107313
> > 8302
> > 2618
> > -+-
> >
> > This is 10.1% of all files - are small files under 128 bytes. (7.7% < 63
> > bytes)
> >
> > [ Results for /etc: 1712, 666, 143 (+ 221 file of size in range
> > 129-512 bytes) - small files are better half of whole /etc. ]
>
> Note that using the root filesystem is a skewed result (esp. on GTK systems
> where lots of single-valued files are used by gconf). Many root filesystems
> using ext3 are formatted with 1kB blocks for this reason. Also gather stats
> for other filesystems.
>
> At the filesystem summit we DID find a surprising number of small files
> even when the whole system was examined. We discussed storing small
> files directly in the inode along with other EAs (this would require
> larger inodes). This improves data locality and performance (i.e. stat
> of the file loads the small file data into cache), though the assumption
> is that there will be an increasing number of EAs on files in the future.
> It also avoids the issues w.r.t. packing file data from different files
> into the same block and they have different lifespans, etc.

I would agree that if the focus is on files that are 128 bytes or smaller,
storing the data in the inode makes the most sense.
I don't think it's worth the complexity to do any kind of tail merging
unless you expect that a large number of small files would be too big
to practically fit in the inode, but small enough that it is worth
doing something to store them efficiently. Symbolic links have been
stored in the inode this way for a long time.

-- 
David Kleikamp
IBM Linux Technology Center
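
For anyone who wants to gather the same kind of numbers on other filesystems, as suggested in the quoted text, here is a minimal sketch of a size-range counter. It is a variation on the quoted script, not the script itself: `count_by_size` is a hypothetical helper name, it assumes a minimum size of at least 1 byte, and unlike the quoted script it counts only regular files (`-type f`), so its totals will not match counts that include directories and symlinks.

```shell
#!/bin/bash
# count_by_size DIR MIN MAX
# Hypothetical helper (not from the original thread): prints the number of
# regular files under DIR, on the same filesystem, whose size in bytes is
# between MIN and MAX inclusive. Assumes MIN >= 1.
count_by_size() {
    local dir=$1 min=$2 max=$3
    # With the 'c' unit, -size +N matches sizes strictly greater than N bytes
    # and -size -N matches sizes strictly less than N bytes, so both bounds
    # are widened by one to make the range inclusive.
    # File names containing newlines would be counted more than once by wc -l.
    find "$dir" -xdev -type f \
        -size +"$((min - 1))"c -size -"$((max + 1))"c 2>/dev/null | wc -l
}
```

Calling `count_by_size / 1 63` and `count_by_size / 64 128` gives the two small-file buckets used above. Two range tests replace the long `-size Nc -o ...` chain, which keeps the command line short for wider ranges such as 129-512 bytes.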