From: "Ihar `Philips` Filipau"
Subject: Re: Allocation strategy - dynamic zone for small files
Date: Tue, 14 Nov 2006 00:32:07 +0100
To: "Bryan Henderson"
Cc: "Josef Sipek", avishay, linux-fsdevel@vger.kernel.org

On 11/13/06, Bryan Henderson wrote:
> >Good point. But wouldn't the page cache suffer regardless? (You can't split
> >up pages between files, AFAIK.)
>
> Yeah, you're right, if we're talking about granularity finer than the page
> size. But furthermore, as long as we're just talking about techniques to
> reduce internal fragmentation in the disk allocations, there's no reason
> either the cache usage or the data transfer traffic has to be affected
> (the fact that a whole block is allocated doesn't mean you have to read or
> cache the whole block).
>
> But head movement and rotational latency are worth considering. If you

As the person who threw in the idea, I feel a bit responsible. So here are
the results of my primitive script (bear with my bashisms), run on a plain
Debian/unstable system with 123k files on a 10GB ext3 partition, default
8K block.

Script to count the small files:
-+-
#!/bin/bash
find / -xdev 2>/dev/null | wc -l
find / -xdev \( $(seq -f '-size %gc -o' 1 63) -false \) 2>/dev/null | wc -l
find / -xdev \( $(seq -f '-size %gc -o' 64 128) -false \) 2>/dev/null | wc -l
-+-

The first line counts all files on the root fs, the second counts files of
1-63 bytes, the third 64-128 bytes. (The '-xdev' parameter tells find to
stay on the same fs, so proc/sys/tmp and so on are excluded.)

And on my system the counts are:
-+-
107313
8302
2618
-+-

That is, 10.1% of all files are small files of 128 bytes or less (7.7% are
63 bytes or less).

[ Results for /etc: 1712, 666, 143 (plus 221 files in the 129-512 byte
range) - small files are the better half of the whole of /etc. ]

[ In fact, this kind of small-block optimization is widely used in network
equipment: many intelligent devices can use several packet queues, sorted
by size, to deliver ingress packets to RAM. One device I wrote a driver
for allowed four queues with recommended buffer sizes of 32, 128, 512 and
2048 bytes - sizes that let it pull lots of small/medium packets (normally
control traffic - ICMP, TCP ACK/SYN, etc.) into RAM without depleting all
the buffers (normally needed for data traffic). I posted this here because
I was a bit surprised that somebody is trying to apply a similar idea to
file systems. ]

The most important outcome of the optimization might be that future FSs
wouldn't be afraid to set the cluster size higher than is accepted now:
e.g. the standard is 4/8/16K today, but with small-file (+ tail)
optimization it could be ramped up to 32/64/128K.

--
Don't walk behind me, I may not lead.
Don't walk in front of me, I may not follow.
Just walk beside me and be my friend.
        -- Albert Camus (attributed)
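
P.S. If anyone wants to repeat the measurement with different buckets, below
is a rough generalization of the script above. It is only a sketch: the extra
bucket boundaries (129-512, 513-2048) and the '-type f' filter (to skip
directories and special files) are my own choices, not part of the numbers
reported above.

-+-
#!/bin/bash
# Count regular files on the root fs, bucketed by size in bytes.
# Each line of $buckets is a "min max" pair; adjust to taste.
buckets="1 63
64 128
129 512
513 2048"

# Total number of regular files, staying on one fs (-xdev), as a baseline.
find / -xdev -type f 2>/dev/null | wc -l

# One find per bucket: seq expands to "-size 1c -o -size 2c -o ...",
# and the trailing -false closes the last -o.
echo "$buckets" | while read lo hi; do
        find / -xdev -type f \( $(seq -f '-size %gc -o' "$lo" "$hi") -false \) \
                2>/dev/null | wc -l
done
-+-

It still runs one find pass per bucket, like the original; a single pass with
find -printf '%s\n' piped through awk would be cheaper, but this stays closer
to the original approach.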