From: Dave Chinner <david@fromorbit.com>
To: Michael Monnerie <michael.monnerie@is.it-management.at>
Cc: xfs@oss.sgi.com
Subject: Re: LWN.net article: creating 1 billion files -> XFS looses
Date: Tue, 7 Sep 2010 17:01:39 +1000
Message-ID: <20100907070139.GL705@dastard>
In-Reply-To: <201009070820.08354@zmi.at>
On Tue, Sep 07, 2010 at 08:20:07AM +0200, Michael Monnerie wrote:
> On Dienstag, 7. September 2010 Dave Chinner wrote:
> > # mkfs.xfs -n size=64k
> > (-n = naming = directories. -d = data != directories)
>
> Thank you, Dave. Do I interpret that parameter right:
>
> When a new directory is created, by default it would occupy only 4KB;
> with -n size=64k, 64KB would be reserved.
No, it allocates 64k blocks for the directory instead of 4k blocks.
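If you want to try this without a spare disk, a minimal sketch using a
sparse image file (assuming mkfs.xfs from xfsprogs is installed; the path
and size below are illustrative, not a recommendation):

```shell
# Create a sparse 10GB image file so no real device is needed.
truncate -s 10G /tmp/xfs-64k-dirs.img

# Format with 64k directory blocks; data blocks stay at the 4k default.
# mkfs.xfs prints the geometry on success - the "naming" line should
# report bsize=65536 instead of the default bsize=4096.
mkfs.xfs -f -n size=64k /tmp/xfs-64k-dirs.img
```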
> As the directory fills, space within
> that block will be used, so in the default case after 4KB (how many
> inodes would that be roughly? 256 Bytes/Inode, so 16 entries?) XFS would
> reserve the next block, but in your case 256 entries would fit.
Inodes are not stored in the directory structure, only the directory
entry name and the inode number. Hence the amount of space used by a
directory entry is determined by the length of the name.
> That would keep dir fragmentation lower, and with todays disks, take a
> minimal more space, so it sounds very good to use that option.
> Especially with RAIDs, where stripes usually are 64KB or bigger. Or
> would the waste of space be so big that it could hurt?
Well, there is extra overhead to allocate large directory blocks (16
pages instead of one, to begin with, then there's the vmap overhead,
etc), so for small directories smaller block sizes are faster for
create and unlink operations.
For empty directories, operations on 4k block sized directories
consume roughly 50% less CPU than 64k block size directories. The
4k block size directories consume less CPU out to roughly 1.5
million entries, where the two are roughly equal. At directory sizes
of 10 million entries, 64k directory block operations consume
about 15% of the CPU that 4k directory block operations consume.
In terms of lookups, the 64k block directory will take less IO but
consume more CPU for a given lookup. Hence it depends on your IO
latency and whether directory readahead can hide that latency as to
which will be faster. e.g. for SSDs, CPU usage might be the limiting
factor, not the IO. Right now I don't have any numbers on what
the difference might be - I'm getting 1B inode population issues worked
out first before I start on measuring cold cache lookup times on 1B
files....
> Last question: Is there a way to set that option on a given XFS?
No, it is a mkfs time parameter, though we have been discussing the
possibility of being able to set it per-directory (at mkdir time
when no blocks have been allocated).
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs