From: Stan Hoeppner <stan@hardwarefreak.com>
To: Stefan Ring <stefanrin@gmail.com>
Cc: Linux fs XFS <xfs@oss.sgi.com>
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
Date: Mon, 09 Apr 2012 18:38:04 -0500	[thread overview]
Message-ID: <4F8372DC.7030405@hardwarefreak.com> (raw)
In-Reply-To: <CAAxjCEz8TpRvjvbuYPp1xf9X2HwskN5AuPak62R5Jhkg+mmFHA@mail.gmail.com>

On 4/9/2012 6:02 AM, Stefan Ring wrote:
>> Not at all.  You can achieve this performance with the 6 300GB spindles
>> you currently have, as Christoph and I both mentioned.  You simply lose
>> one spindle of capacity, 300GB, vs your current RAID6 setup.  Make 3
>> RAID1 pairs in the p400 and concatenate them.  If the p400 can't do
>> this, concat the mirror pair devices with md --linear.  Format the
>> resulting Linux block device with the following and mount with
>> inode64.
>>
>> $ mkfs.xfs -d agcount=3 /dev/[device]
>>
>> That will give you 1 AG per spindle: 3 horizontal AGs total instead of
>> the 4 vertical AGs you get with the default striping setup.  This is
>> optimal for your high-IOPS workload, as it eliminates all 'extraneous'
>> seeks, yielding a per-disk access pattern nearly identical to EXT4.
>> And it will almost certainly outrun EXT4 on your RAID6, due mostly to
>> the eliminated seeks, but also to the elimination of parity
>> calculations.  You've wiped the array a few times in your testing
>> already, right?  So one or two more test setups should be no sweat.
>> Give it a go.  The results will be pleasantly surprising.
> 
> Well I had to move around quite a bit of data, but for the sake of
> completeness, I had to give it a try.
> 
> With a nice and tidy fresh XFS file system, performance is indeed
> impressive – about 16 sec for the same task that would take 2 min 25
> before. So that’s about 150 MB/sec, which is not great, but for many
> tiny files it would perhaps be a bit unreasonable to expect more. A

150MB/s isn't correct; it should be closer to 450MB/s.  This makes it
appear that you're writing all of these files to a single directory.  If
you're writing them fairly evenly to 3 directories, or a multiple of 3,
you should see close to 450MB/s when using mdraid linear over 3 P400
RAID1 pairs.  If that is what you're doing, then something is wrong
somewhere.  Try unpacking a kernel tarball: lots of subdirectories to
exercise all 3 AGs, and thus all 3 spindles.
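
For reference, here's a minimal sketch of the whole setup and the
tarball test (device names, mount point, and kernel version are
examples only; substitute the block devices your P400 exports for the
3 RAID1 pairs):

  # Concatenate the 3 RAID1 pair devices with md linear
  $ mdadm --create /dev/md0 --level=linear --raid-devices=3 \
      /dev/sdb /dev/sdc /dev/sdd

  # 1 AG per spindle, mounted with inode64 as discussed earlier
  $ mkfs.xfs -d agcount=3 /dev/md0
  $ mount -o inode64 /dev/md0 /mnt/test

  # Many subdirectories, so allocation spreads across all 3 AGs
  $ time tar xf linux-3.3.tar.bz2 -C /mnt/test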

> simple copy of the tar onto the XFS file system yields the same linear
> performance, the same as with ext4, btw. So 150 MB/sec seems to be the
> best these disks can do, meaning that theoretically, with 3 AGs, it
> should be able to reach 450 MB/sec under optimal conditions.

The optimal condition, again, requires writing 3 copies of this file to
3 directories to hit ~450MB/s, which you should get close to if using
mdraid linear over the RAID1 pairs.  XFS is a filesystem, after all, so
its parallelism must come from how the filesystem's structures are laid
out and used.  I thought I explained all of this previously when I
introduced the "XFS concat" into this thread.

> I will still do a test with the free space fragmentation priming on
> the concatenated AG=3 volume, because it seems to be rather slow as
> well.

> But then I guess I’m back to ext4 land. XFS just doesn’t offer enough
> benefits in this case to justify the hassle.

If you were writing to only one directory, I can understand this
sentiment.  Again, if you were writing to 3 directories fairly evenly,
with the md concat, then your sentiment here should be quite different.

-- 
Stan
