Message-ID: <47BA10EC.3090004@tlinx.org>
Date: Mon, 18 Feb 2008 15:12:44 -0800
From: Linda Walsh
Subject: Re: tuning, many small files, small blocksize
To: Jeff Breidenbach
Cc: xfs@oss.sgi.com
List-Id: xfs

Jeff Breidenbach wrote:
> I'm testing xfs for use in storing 100 million+ small files
> (roughly 4 to 10KB each) and some directories will contain
> tens of thousands of files. There will be a lot of random
> reading, and also some random writing, and very little
> deletion.
>
> I am setting up the xfs partition now, and have only played
> with blocksize so far. 512 byte blocks are most space efficient,
> 1024 byte blocks cost 3.3% additional space, and 4096 byte
> blocks cost 22.3% additional space.
>
> a) Should I just go with the 512 byte blocksize or is that going to be
> bad for some performance reason? Going to 1024 is no problem.
>
> b) Are there any other mkfs.xfs parameters that I should play with?

If your minimum file size is 4KB and max is 10KB, a blocksize of 2K
might give you a reasonable compaction level. Might also play with
the inode size.
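A sketch of what that invocation might look like (the device name is a
placeholder, and the 2K-block/1K-inode values are just the ones floated
above -- check them against your mkfs.xfs version):

```shell
# Placeholder scratch device -- adjust to your setup.
# -b size=2048: 2K filesystem blocks, to cut per-file slack on 4-10KB files.
# -i size=1024: 1K inodes, so small leftover file tails can live in the inode.
mkfs.xfs -b size=2048 -i size=1024 /dev/sdX1
```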
I *usually* go with 1K inodes + 4K blocks, but with a 2K block I'm
slightly torn between 512-byte inodes and 1K inodes. I can't think of
a _great_ reason to ever go with the default 256-byte inode size,
since at that size an inode will always share its block with another,
possibly unrelated, file. Remember, in xfs, if the last bit of
left-over data in a file fits into the inode, it can save a block
allocation, though I don't know how this affects speed. Space-wise, a
2K block size and 1K inode size might be good, but I don't know how
that would affect performance.

My concern about 512-byte blocks is that, in general (I don't recall
having used such a small block size on xfs), smaller blocks can lead
to greater fragmentation, though xfs is better than the average 'fs'
at laying out files. While 'xfs_fsr' is good about keeping files
linear, it didn't used to work on directories -- and if you have 10's
of thousands of files per directory, that might trigger some more
directory fragmentation. Dunno. After you write the many small files,
will you be appending to them?

As for benchmarks, there's always the standard 'bonnie' and
'bonnie++'. I don't know how they compare to iozone, though -- I'm
not familiar with that benchmark.

I'm sure you are familiar with the mount options noatime,nodiratime --
same concept, but dirs are split out. Someone else mentioned using
logbsize=256k in the mount options. My manpages may be dated, but
they "claim" that valid sizes are 16k and 32k (that limit applies to
version-1 logs; version-2 logs allow larger buffers). Also, it
depends on the situation, but sometimes flattening out the directory
structure can speed up lookup time.

Sometime back someone did some benchmarks involving log size, and if
memory serves me correctly, 32768 blocks (of 4k each, i.e. ~128MB)
seemed optimal.

Good luck...
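The block-size overhead figures quoted in the original question can be
sanity-checked with a quick awk loop. The uniform 4KB-10KB size
distribution here is my own assumption, purely for illustration; the
poster's measured numbers come from his real file sizes, so they will
differ (notably at 4K blocks):

```shell
# Back-of-the-envelope slack estimate per block size.
# Assumption (illustrative only): file sizes uniformly spread from
# 4KB to 10KB in 512-byte steps.
for bs in 512 1024 2048 4096; do
  awk -v bs="$bs" 'BEGIN {
    total = 0; used = 0
    for (sz = 4096; sz <= 10240; sz += 512) {
      blocks = int((sz + bs - 1) / bs)   # whole blocks needed for this file
      total += blocks * bs               # space actually allocated
      used  += sz                        # space the data needs
    }
    printf "bs=%4d  overhead=%.1f%%\n", bs, 100 * (total - used) / used
  }'
done
```

Under that toy assumption this prints 0.0% for 512-byte blocks, 3.3%
for 1K (matching the measured figure), about 9.9% for 2K, and about
27.5% for 4K.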
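For what it's worth, the mount options discussed above would combine
into something like this (device and mountpoint are placeholders, and
logbsize=256k needs a version-2 log -- verify against your xfs mount
documentation):

```shell
# Placeholder device/mountpoint -- adjust to your layout.
# noatime,nodiratime: skip access-time updates on files and directories.
# logbsize=256k: larger in-memory log buffers (version-2 logs only).
mount -o noatime,nodiratime,logbsize=256k /dev/sdX1 /mnt/data
```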