From: Dave Chinner <david@fromorbit.com>
To: Michael Monnerie <michael.monnerie@is.it-management.at>
Cc: John Bokma <contact@johnbokma.com>,
Stan Hoeppner <stan@hardwarefreak.com>,
xfs@oss.sgi.com
Subject: Re: 30 TB RAID6 + XFS slow write performance
Date: Thu, 21 Jul 2011 16:48:38 +1000
Message-ID: <20110721064838.GA13963@dastard>
In-Reply-To: <201107210820.01019@zmi.at>
On Thu, Jul 21, 2011 at 08:19:54AM +0200, Michael Monnerie wrote:
> On Thursday, 21 July 2011 Dave Chinner wrote:
> > No, they'll get sunit aligned by default, which would be on 64k
> > boundaries.
>
> OK, so only when <quote Dave> "swalloc mount option set and the
> allocation is for more than a swidth of space it will align to swidth
> rather than sunit" </quote Dave>.
>
> So even when I specify swalloc but a file is generated with only 4KB, it
> will very probably be sunit aligned on disk.
>
> > > That way, all stripes of a 1GB partition would be full when
> > > there are roughly 1170 files (1170*896KiB ~ 1GB). What would happen
> > > when I create other files - is XFS "full" then, or would it start
> > > using sub-stripes? If sub-stripes, would they start at su
> > > (=64KiB) distances, or at single block (e.g. 4KiB) distances?
> >
> > It starts packing files tightly into remaining free space when no
> > free aligned extents are available for allocation in the AG.
>
> That means for above example, that 16384 x 2KiB files could be created,
> and each be sunit aligned on disk. Then all sunit start blocks are full,
> so additional files will be sub-sunit "packed", is that right?
Effectively.
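As a rough check of the numbers in the example above, here is a minimal sketch of the arithmetic. It assumes the "1GB partition" means exactly 1 GiB and ignores AG headers, the log and other metadata, so the real counts would be somewhat lower.

```python
# Rough arithmetic for the example in the thread. Sizes come from the
# quoted messages; treating "1GB" as exactly 1 GiB is an assumption.
KIB = 1024
GIB = 1024 ** 3

partition_bytes = 1 * GIB    # the example partition
sunit_bytes = 64 * KIB       # stripe unit (su)
swidth_bytes = 896 * KIB     # stripe width quoted in the thread

# Distinct sunit-aligned start offsets available in the partition.
sunit_slots = partition_bytes // sunit_bytes

# Whole stripe widths that fit in the partition.
full_stripes = partition_bytes // swidth_bytes

# Stripe units per stripe width.
units_per_stripe = swidth_bytes // sunit_bytes

print(sunit_slots, full_stripes, units_per_stripe)  # 16384 1170 14
```

This matches the figures quoted above: roughly 1170 stripe-width-sized files, or 16384 sunit-aligned start positions.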
> That would mean fragmentation is likely to occur from that moment, if
> there are files that grow.
If you are writing files that grow like this, then you are doing
something wrong. If the app can't do its IO differently, then this
is exactly the reason we have userspace-controlled preallocation
interfaces.
Filesystems cannot prevent user stupidity from screwing something
up....
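For illustration only, here is a minimal sketch of userspace preallocation. The message does not name a specific interface; posix_fallocate() is used here as one portable example, and the helper name, file name and the 8 MiB size are hypothetical.

```python
import os
import tempfile


def preallocate(path, size_bytes):
    """Reserve size_bytes of space for path before any data is written.

    Hypothetical helper: it wraps os.posix_fallocate(), which asks the
    filesystem to allocate the whole byte range up front.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)


# Example: reserve 8 MiB for a file that is expected to grow slowly.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "growing.dat")
    preallocate(path, 8 * 1024 * 1024)
    reserved_size = os.path.getsize(path)

print(reserved_size)
```

Note that posix_fallocate() also extends the file size to cover the reserved range, so the application would write into that range rather than append past it.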
> And files >64KiB are immediately fragmented
> then. At this time, there are only 16384 * 2KiB = 32MiB used, which is
> 3.125% of the disk. I can't believe my numbers, are they true?
No, because most filesystems have a 4k block size. Not to mention
that fragmentation is likely to be limited to the single AG the files
in the directory belong to, i.e. even if we can't allocate a sunit
aligned chunk in an AG, we won't switch to another AG just to do
sunit aligned allocation.
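One reading of the block size point, sketched as arithmetic: with 4k blocks a 2KiB file still occupies a full block on disk, so the space consumed is double the quoted figure. The 1 GiB partition size is an assumption, and the sketch does not model the per-AG behaviour described above.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def on_disk_bytes(file_bytes, block_bytes=4 * KIB):
    """Round a file's size up to whole filesystem blocks."""
    blocks = -(-file_bytes // block_bytes)  # ceiling division
    return blocks * block_bytes


files = 16384
file_bytes = 2 * KIB
partition_bytes = 1 * GIB  # assumed size of the example partition

# Figure from the quoted message: counts only the file contents.
naive_used = files * file_bytes
# With 4k blocks, each 2KiB file consumes one whole block.
block_used = files * on_disk_bytes(file_bytes)

naive_pct = 100 * naive_used / partition_bytes
block_pct = 100 * block_used / partition_bytes

print(naive_used // MIB, naive_pct)  # 32 3.125
print(block_used // MIB, block_pct)  # 64 6.25
```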
> OK, this is a worst case scenario, and as you've said before, any
> filesystem can be considered full at 85% fill grade. But it's incredible
> how quickly you could fuck up a filesystem when using su/sw and writing
> small files.
Well, don't use a filesystem that is optimised for storing large
sizes, large files and high bandwidth for storing lots of small
files, then. Indeed, the point of not packing the files is so they
-don't fragment as they grow-. XFS is not designed to be optimal
for small filesystems or small files. In most cases it will deal
with them just fine, so in reality your concerns are mostly
unfounded...
BTW, ext3/ext4 do exactly the same thing with spreading files out
over block groups before packing them tightly when there are no
more empty block groups left....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com