Re: 30 TB RAID6 + XFS slow write performance

From: Dave Chinner <david@fromorbit.com>
To: Michael Monnerie <michael.monnerie@is.it-management.at>
Cc: John Bokma <contact@johnbokma.com>,
	Stan Hoeppner <stan@hardwarefreak.com>,
	xfs@oss.sgi.com
Subject: Re: 30 TB RAID6 + XFS slow write performance
Date: Thu, 21 Jul 2011 16:48:38 +1000
Message-ID: <20110721064838.GA13963@dastard>
In-Reply-To: <201107210820.01019@zmi.at>

On Thu, Jul 21, 2011 at 08:19:54AM +0200, Michael Monnerie wrote:
> On Donnerstag, 21. Juli 2011 Dave Chinner wrote:
> > No, they'll get sunit aligned by default, which would be on 64k
> > boundaries.
> 
> OK, so only when <quote Dave> "swalloc mount option set and the 
> allocation is for more than a swidth of space it will align to swidth 
> rather than sunit" </quote Dave>.
> 
> So even when I specify swalloc but a file is generated with only 4KB, it 
> will very probably be sunit aligned on disk.
>  
> > > That way, all stripes of a 1GB partition would be full when 
> > > there are roughly 1170 files (1170*896KiB ~ 1GB). What would happen
> > > when  I create other files - is XFS "full" then, or would it start
> > > using sub- stripes? If sub-stripes, would they start at su
> > > (=64KiB) distances, or at single block (e.g. 4KiB) distances?
> > 
> > It starts packing files tightly into remaining free space when no
> > free aligned extents are available for allocation in the AG.
> 
> That means for above example, that 16384 x 2KiB files could be created, 
> and each be sunit aligned on disk. Then all sunit start blocks are full, 
> so additional files will be sub-sunit "packed", is it this?

Effectively.
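
As a rough model of the rules discussed so far (a toy sketch only, not
the XFS allocator; the function name, its parameters and the 4 KiB
fallback granularity are assumptions made for illustration):

```python
KIB = 1024

SUNIT = 64 * KIB     # stripe unit from the example in this thread
SWIDTH = 896 * KIB   # stripe width from the example in this thread


def pick_alignment(alloc_bytes, sunit, swidth, swalloc, aligned_extent_free,
                   block_size=4 * KIB):
    """Return the boundary (in bytes) a new allocation would start on.

    Hypothetical helper that only mirrors the behaviour described in
    this thread; the real allocator considers far more than this.
    """
    if not aligned_extent_free:
        # No free aligned extent left in the AG: pack tightly into
        # whatever free space remains (block granularity assumed).
        return block_size
    if swalloc and alloc_bytes > swidth:
        # swalloc is set and the request is larger than one stripe width.
        return swidth
    # Default case: stripe-unit alignment.
    return sunit


if __name__ == "__main__":
    # A 4 KiB file stays sunit aligned even with swalloc set.
    print(pick_alignment(4 * KIB, SUNIT, SWIDTH, True, True))
    # A request larger than swidth is swidth aligned when swalloc is set.
    print(pick_alignment(2 * SWIDTH, SUNIT, SWIDTH, True, True))
```

The point of the sketch is only the ordering of the three cases: packing
happens when aligned space has run out, not before.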

> That would mean fragmentation is likely to occur from that moment, if 
> there are files that grow.

If you are writing files that grow like this, then you are doing
something wrong. If the app can't do its IO differently, then this
is exactly the reason we have userspace-controlled preallocation
interfaces.
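
For illustration, posix_fallocate() is one such interface. A minimal
sketch, assuming a POSIX system where Python's os.posix_fallocate() is
available; the helper name and the sizes are made up for the example:

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Reserve size_bytes for path up front, before any data is written.

    Hypothetical helper: creates the file if needed, asks the filesystem
    to allocate the whole range, and returns the resulting file size.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # posix_fallocate(fd, offset, len) ensures the range is allocated
        # and extends the file size to offset + len if it was smaller.
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)
    return os.stat(path).st_size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        target = os.path.join(tmpdir, "growing.dat")
        # Reserve 16 MiB before the application starts appending.
        print(preallocate(target, 16 * 1024 * 1024))
```

An application that knows roughly how large a file will become can
reserve that space once, instead of leaving the filesystem to guess as
the file grows.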

Filesystems cannot prevent user stupidity from screwing something
up....

> And files >64KiB are immediately fragmented 
> then. At this time, there are only 16384 * 2KiB = 32MiB used, which is 
> 3.125% of the disk. I can't believe my numbers, are they true?

No, because most filesystems have a 4k block size. Not to mention
that fragmentation is likely to be limited to the single AG the files
in the directory belong to. i.e. even if we can't allocate a sunit
aligned chunk in an AG, we won't switch to another AG just to do
sunit aligned allocation.
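
To make the arithmetic concrete, here is a small sketch using the
numbers from this thread (1 GiB filesystem, 64 KiB sunit, 896 KiB
swidth). It assumes the 4k block size point means each 2 KiB file
occupies one whole 4 KiB block; all names are illustrative:

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

FS_SIZE = 1 * GIB
SUNIT = 64 * KIB
SWIDTH = 896 * KIB
BLOCK_SIZE = 4 * KIB


def space_used(n_files, file_size, block_size):
    """Bytes consumed when every file is rounded up to whole blocks."""
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    return n_files * blocks_per_file * block_size


# Number of sunit-aligned start positions in the filesystem.
sunit_slots = FS_SIZE // SUNIT
# Number of whole stripes that fit (the "roughly 1170 files" figure).
full_stripes = FS_SIZE // SWIDTH

# Usage as computed in the question: raw 2 KiB per file.
naive_used = sunit_slots * 2 * KIB
# Usage once each file takes at least one 4 KiB block.
block_used = space_used(sunit_slots, 2 * KIB, BLOCK_SIZE)

if __name__ == "__main__":
    print(sunit_slots, full_stripes)
    print(naive_used // MIB, 100 * naive_used / FS_SIZE)
    print(block_used // MIB, 100 * block_used / FS_SIZE)
```

Under that assumption the 16384 small files take 64 MiB rather than
32 MiB, i.e. 6.25% of a 1 GiB filesystem.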

> OK, this is a worst case scenario, and as you've said before, any 
> filesystem can be considered full at 85% fill grade. But it's incredible 
> how quickly you could fuck up a filesystem when using su/sw and writing 
> small files.

Well, don't use a filesystem that is optimised for large
filesystems, large files and high bandwidth to store lots of small
files, then.  Indeed, the point of not packing the files is so they
-don't fragment as they grow-. XFS is not designed to be optimal
for small filesystems or small files. In most cases it will deal
with them just fine, so in reality your concerns are mostly
unfounded...

BTW, ext3/ext4 do exactly the same thing with spreading files out
over block groups before packing them tightly when there are no
more empty block groups left....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
