From: "Ihar `Philips` Filipau"
Subject: Re: Allocation strategy - dynamic zone for small files
Date: Tue, 14 Nov 2006 00:32:07 +0100
To: "Bryan Henderson"
Cc: "Josef Sipek", avishay, linux-fsdevel@vger.kernel.org

On 11/13/06, Bryan Henderson wrote:
> >Good point. But wouldn't the page cache suffer regardless? (You can't split
> >up pages between files, AFAIK.)
>
> Yeah, you're right, if we're talking about granularity finer than the page
> size. But furthermore, as long as we're just talking about techniques to
> reduce internal fragmentation in the disk allocations, there's no reason
> either the cache usage or the data transfer traffic has to be affected
> (the fact that a whole block is allocated doesn't mean you have to read or
> cache the whole block).
>
> But head movement and rotational latency are worth considering. If you

As the person who threw in the idea, I feel a bit responsible. So here are
the results of my primitive script (bear with my bashisms), run on a plain
Debian/unstable system with 123k files on a 10GB ext3 partition, default
8K block.

Script to count the small files:
-+-
#!/bin/bash
find / -xdev 2>/dev/null | wc -l
find / -xdev \( $(seq -f '-size %gc -o' 1 63) -false \) 2>/dev/null | wc -l
find / -xdev \( $(seq -f '-size %gc -o' 64 128) -false \) 2>/dev/null | wc -l
-+-

The first line counts all files on the root fs, the second counts files of
1-63 bytes, the third 64-128 bytes. (The '-xdev' parameter tells find to
stay on the same fs, so proc/sys/tmp and so on are excluded.)

And on my system the counts are:
-+-
107313
8302
2618
-+-

That is, 10.1% of all files are small files of 128 bytes or less (7.7% are
63 bytes or less).

[ Results for /etc: 1712, 666, 143 (plus 221 files in the 129-512 byte
range) - small files are the better half of the whole of /etc. ]

[ In fact, this kind of small-block optimization is widely used in network
equipment: many intelligent devices can use several packet queues, sorted
by size, to deliver ingress packets to RAM. One device I wrote a driver
for allowed four queues with recommended buffer sizes of 32, 128, 512 and
2048 bytes - sizes that let it pull lots of small/medium packets (normally
control traffic - ICMP, TCP ACK/SYN, etc.) into RAM without depleting all
the buffers (normally needed for data traffic). I posted this here because
I was a bit surprised that somebody is trying to apply a similar idea to
file systems. ]

The most important outcome of the optimization might be that future FSs
wouldn't be afraid to set the cluster size higher than is accepted now:
e.g. the standard is 4/8/16K today, but with small-file (+ tail)
optimization it could be ramped up to 32/64/128K.

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
        -- Albert Camus (attributed)
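
P.S. If anyone wants to repeat the measurement with different buckets, below
is a rough generalization of the script above. It is only a sketch: the extra
bucket boundaries (129-512, 513-2048) and the '-type f' filter (to skip
directories and special files) are my own choices, not part of the numbers
reported above.

-+-
#!/bin/bash
# Count regular files on the root fs, bucketed by size in bytes.
# Each line of $buckets is a "min max" pair; adjust to taste.
buckets="1 63
64 128
129 512
513 2048"

# Total number of regular files, staying on one fs (-xdev), as a baseline.
find / -xdev -type f 2>/dev/null | wc -l

# One find per bucket: seq expands to "-size 1c -o -size 2c -o ...",
# and the trailing -false closes the last -o.
echo "$buckets" | while read lo hi; do
        find / -xdev -type f \( $(seq -f '-size %gc -o' "$lo" "$hi") -false \) \
                2>/dev/null | wc -l
done
-+-

It still runs one find pass per bucket, like the original; a single pass with
find -printf '%s\n' piped through awk would be cheaper, but this stays closer
to the original approach.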