From: Linda Walsh <xfs@tlinx.org>
To: Jeff Breidenbach <jeff@jab.org>
Cc: xfs@oss.sgi.com
Subject: Re: tuning, many small files, small blocksize
Date: Mon, 18 Feb 2008 15:12:44 -0800
Message-ID: <47BA10EC.3090004@tlinx.org>
In-Reply-To: <e03b90ae0802152101t2bfa4644kcca5d6329239f9ff@mail.gmail.com>

Jeff Breidenbach wrote:
> I'm testing xfs for use in storing 100 million+ small files
> (roughly 4 to 10KB each) and some directories will contain
> tens of thousands of files. There will be a lot of random
> reading, and also some random writing, and very little
> deletion. 
> 
> I am setting up the xfs partition now, and have only played
> with blocksize so far. 512 byte blocks are most space efficient,
> 1024 byte blocks cost 3.3% additional space, and 4096 byte
> blocks cost 22.3% additional space.
>
> a) Should I just go with the 512 byte blocksize or is that going to be
> bad for some performance reason? Going to 1024 is no problem,
> 
> b) Are there any other mkfs.xfs parameters that I should play with?

If your minimum file size is 4KB and max is 10KB, a blocksize of
2K might give you a reasonable compaction level.
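
To get a feel for the trade-off, here is a rough sketch of the
rounding loss at each block size.  It assumes file sizes spread
uniformly between 4KB and 10KB (an assumption, not your real data) and
counts only data blocks, so it ignores inodes, directories and the
log.  The function names are made up for illustration.

```python
import random


def allocated_bytes(file_size, block_size):
    """Bytes of data blocks needed for file_size, rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def overhead_percent(file_sizes, block_size):
    """Slack space as a percentage of the bytes actually stored."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(s, block_size) for s in file_sizes)
    return 100.0 * (allocated - used) / used


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    # Hypothetical workload: 100,000 files of 4KB..10KB, uniformly distributed.
    sizes = [rng.randint(4 * 1024, 10 * 1024) for _ in range(100_000)]
    for bs in (512, 1024, 2048, 4096):
        print(f"{bs:>5}-byte blocks: {overhead_percent(sizes, bs):5.1f}% slack")
```

Feeding it the real file sizes (say, collected with os.stat) instead
of the random ones would give numbers closer to what you measured.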

You might also play with the inode size.  I *usually* go with 1K inodes
and 4k blocks.  With a 2k block, I'm slightly torn between 512-byte inodes
and 1K inodes, but I can't think of a _great_ reason to ever go with the
default 256-byte inode size, since that size seems like it will always cause
the inode to share a block with the inode of another, possibly unrelated file.

Remember, in xfs, if the last bit of left-over data in a file will fit
into the inode, it can save a block-allocation, though I don't know
how this will affect speed.

Space-wise, a 2k block size and 1k-inode size might be good, but I don't
know how that would affect performance.

My concern about 512-byte blocks (I don't recall having used such a
small block size on xfs) is that, in general, smaller blocks can lead to
greater fragmentation, though xfs is better than the average 'fs' at
laying out files.  While 'xfs_fsr' is good about keeping files
linear, it didn't use to work on directories -- and if you have
tens of thousands of files per directory...that might trigger some
more directory fragmentation.  Dunno.

After you write the many small files, will you be appending to them?

As for benchmarks, there's always the standard 'bonnie' and 'bonnie++'.
I don't know how they compare to iozone though -- I'm not familiar with
that benchmark.

I'm sure you are familiar with mount options noatime,nodiratime -- same
concepts, but dirs are split out.  Someone else mentioned using
logbsize=256k in the mount options.  My manpages may be dated, but
they "claim" that valid sizes are 16k and 32k.

Also, it depends on the situation, but sometimes flattening out the
directory structure can speed up lookup time.

Some time back someone did some benchmarks involving log size, and it seemed
that 32768 blocks (of 4k each), i.e. ~128Meg, was optimal, if memory serves
me correctly.
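
For what it's worth, the arithmetic behind that figure can be checked
with a couple of lines.  This assumes the log size is counted in
filesystem blocks of 4096 bytes each; the helper name is just for
illustration.

```python
def log_size_bytes(log_blocks, block_size):
    """Total log size in bytes for a log of log_blocks filesystem blocks."""
    return log_blocks * block_size


if __name__ == "__main__":
    size = log_size_bytes(32768, 4096)
    # 32768 * 4096 = 134217728 bytes, which is 128 MiB
    print(size, "bytes =", size // (1024 * 1024), "MiB")
```

With a 2k block size the same 128Meg log would be 65536 blocks.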

Good luck...
