From: Stan Hoeppner <stan@hardwarefreak.com>
To: "Mathias Burén" <mathias.buren@gmail.com>
Cc: daobang wang <wangdb1981@gmail.com>,
	linux-raid <linux-raid@vger.kernel.org>
Subject: Re: RAID5 created by 8 disks works with xfs
Date: Sat, 31 Mar 2012 15:09:54 -0500
Message-ID: <4F776492.4070600@hardwarefreak.com>
In-Reply-To: <CADNH=7H_rgiY4fkB0SHo0yPhSBib7Gq-E_fu18yuJx=enn-eGg@mail.gmail.com>

On 3/31/2012 2:59 AM, Mathias Burén wrote:
> On 31 March 2012 02:22, daobang wang <wangdb1981@gmail.com> wrote:
>> Hi ALL,
>>
>>    How should I adjust the XFS and RAID parameters to improve the
>> overall performance of a RAID5 array created from 8 disks and used
>> with XFS?  I wrote a test program which starts 100 threads to write
>> big files, 500MB per file, and deletes each file after the write
>> finishes.  Thank you very much.
>>
>> Best Wishes,
>> Daobang Wang.
> 
> Hi,
> 
> See http://xfs.org/index.php/XFS_FAQ#Q:_I_want_to_tune_my_XFS_filesystems_for_.3Csomething.3E
> . Also see http://hep.kbfi.ee/index.php/IT/KernelTuning . For example,
> RAID5 with 8 harddrives and 64K stripe size:
> 
> mkfs.xfs -d su=64k,sw=7 -l version=2,su=64k /dev/md0

This is unnecessary.  mkfs.xfs sets the stripe alignment automatically
when the target device is an md device.
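
You can see the alignment mkfs.xfs picked in its own output, or check
it after the fact with xfs_info (assuming here the array is mounted at
/mnt):

$ mkfs.xfs /dev/md0                        # sunit/swidth appear in the summary
$ xfs_info /mnt | grep -E 'sunit|swidth'   # /mnt = wherever md0 is mounted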

> Consider mounting the filesystem with logbufs=8,logbsize=256k

This is unnecessary for two reasons:

1.  These are the default values in recent kernels.
2.  His workload is the opposite of "metadata heavy".
    logbufs and logbsize control the in-memory journal write buffers
    used for metadata operations, so raising them won't help a
    streaming data write workload.

The OP's stated workload is 100 streaming writes of 500MB files.  This
is not anything close to a sane, real world workload.  Writing 100 x
500MB files in parallel to a RAID5 array with only 7 effective spindles
is an exercise in stupidity.  The OP is pushing those drives to their
seek limit of about 150 head seeks/sec without actually writing much
data, and *that* is what is ruining his performance.  What *should* be
a streaming write workload of large files has been turned into a
massively random IO pattern, due mostly to the unrealistic write thread
count, and partly to disk striping and the way XFS allocation groups
are laid out on a striped array.
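
Rough numbers, assuming typical 7200rpm drives (~150 random IOs/sec
each) and the 64k chunk from the mkfs example above:

    7 drives * 150 seeks/sec * 64KB per seek  ~=  65 MB/s aggregate

versus roughly 100+ MB/s of streaming throughput *per drive* when the
heads aren't constantly seeking.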

Assuming these are 2TB drives, to get much closer to ideal write
performance, and make this more of a streaming workload, what the OP
should be doing is writing no more than 8 files in parallel to at least
8 different directories with XFS sitting on an md linear array of 4 md
RAID1 devices, assuming he needs protection from drive failure *and*
parallel write performance:

$ mdadm -C /dev/md0 -l 1 -n 2 /dev/sd[ab]        # 4 RAID1 mirror pairs
$ mdadm -C /dev/md1 -l 1 -n 2 /dev/sd[cd]
$ mdadm -C /dev/md2 -l 1 -n 2 /dev/sd[ef]
$ mdadm -C /dev/md3 -l 1 -n 2 /dev/sd[gh]
$ mdadm -C /dev/md4 -l linear -n 4 /dev/md[0-3]  # concatenate the pairs
$ mkfs.xfs -d agcount=8 /dev/md4                 # 2 AGs per mirror pair

Then mount with the inode64 option in fstab so we get the inode64
allocator, which spreads metadata across all of the AGs instead of
stuffing it all in the first AG, and which yields other benefits as well.
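
A minimal fstab entry, assuming the array is to be mounted at /data
(the mount point is just an example):

# /data is an example mount point
/dev/md4    /data    xfs    inode64    0  0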

This setup eliminates striping and the excessive head seeking that
comes with it, and gets much closer to pure streaming write
performance.  Writing 8 files in parallel to 8 directories will cause
XFS to put each file in a different allocation group.  Since we created
8 AGs, this means we'll have 2 files being written to each mirror pair
in parallel.  This reduces time wasted in head seek latency by an order
of magnitude and will dramatically increase disk throughput in MB/s
compared to the 100 files in parallel workload, which again is simply
stupid to do on this limited disk hardware.
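
As a sketch of what the test program should be doing instead, assuming
the filesystem is mounted at /data (dd stands in for the OP's write
threads):

$ for i in 0 1 2 3 4 5 6 7; do
      mkdir -p /data/dir$i                      # /data is an example
      dd if=/dev/zero of=/data/dir$i/bigfile bs=1M count=500 &
  done; wait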

This 100 file parallel write workload needs about 6 times as many
spindles to be realistic, configured as a linear array of 24 RAID1
pairs and formatted with 48 AGs.  That would give you ~4 write streams
per mirror pair, 2 per AG, or somewhere around 50% to 66% of the
per-drive performance of the 8 drive, 8 thread scenario I recommended
above.
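
The same recipe scales up; a sketch, assuming the 24-pair linear array
shows up as /dev/md24 (the device name is just an example):

$ mkfs.xfs -d agcount=48 /dev/md24      # device name is an example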

Final note:  It is simply not possible to tune XFS or md/RAID to get
you any better performance when writing 100 x 500MB files in parallel.
The lack of sufficient spindles is the problem.

-- 
Stan


