public inbox for linux-xfs@vger.kernel.org
* RAID60/mdadm/xfs performance tuning
From: Paul Anderson @ 2011-12-05 18:50 UTC
  To: xfs-oss

I've set up a software RAID-60 array composed of 7 software RAID6's,
each with 32k chunks and 18 devices (16 data, 2 parity), and in
theory appropriate setup parameters according to a nice white paper
written by Christoph and presented this last summer at LinuxCon.

My question is, if the mdraid and XFS are all configured properly,
would I expect to see any read operations when doing a write-only
test?  I would have assumed that I would not, since XFS should write
stripe-aligned sets of data, and in theory nothing needs to be read
(no read-modify-write going on, I would think).
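
One way to check for unexpected reads is to compare the "reads completed"
counter in /proc/diskstats before and after the test. The sketch below is
only an illustration: the helper name reads_completed and the device name
sdb are hypothetical, and it assumes the standard diskstats layout where
the third field is the device name and the fourth is reads completed.
Read-modify-write reads are issued by md to the member disks, so the
member disks are the devices to watch.

```shell
#!/bin/sh
# Print the "reads completed" counter (4th field) for one device,
# given diskstats-format input on stdin.
reads_completed() {
    awk -v dev="$1" '$3 == dev { print $4 }'
}

# Hypothetical usage: sample one member disk around the write-only test.
if [ -r /proc/diskstats ]; then
    before=$(reads_completed sdb < /proc/diskstats)
    # ... run the write-only test here ...
    after=$(reads_completed sdb < /proc/diskstats)
    echo "reads issued to sdb during test: $(( ${after:-0} - ${before:-0} ))"
fi
```

A non-zero difference on the member disks during a pure write test would
suggest partial-stripe writes, though other activity on the same disks
can also contribute.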

The performance is great, but I'm wondering if I need to keep looking.

Thanks,

Paul Anderson

Here's the details for kernel 2.6.38.5:

mdadm --detail /dev/md0  (md1, md2, md3, md4, md5, and md6 all the same)
/dev/md0:
        Version : 01.02
  Creation Time : Fri Dec  2 14:54:23 2011
     Raid Level : raid6
     Array Size : 31256214528 (29808.25 GiB 32006.36 GB)
  Used Dev Size : 3907026816 (3726.03 GiB 4000.80 GB)
   Raid Devices : 18
  Total Devices : 18
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Dec  5 13:38:52 2011
          State : clean
 Active Devices : 18
Working Devices : 18
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 32K

/dev/md8 is the RAID0 that stripes across the above RAID6's, making a
single RAID60:

 mdadm --detail /dev/md8
/dev/md8:
        Version : 01.02
  Creation Time : Fri Dec  2 14:55:36 2011
     Raid Level : raid0
     Array Size : 218793480192 (208657.73 GiB 224044.52 GB)
   Raid Devices : 7
  Total Devices : 7
Preferred Minor : 8
    Persistence : Superblock is persistent

    Update Time : Fri Dec  2 14:55:36 2011
          State : clean
 Active Devices : 7
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 4096K (this is what the RAID0 container thinks, but
I ignore it for xfs)

xfs_info /exports/
meta-data=/dev/md8               isize=256    agcount=204, agsize=268435448 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=54698370048, imaxpct=1
         =                       sunit=8      swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

I made the filesystem like this:
mkfs.xfs -L $(hostname) -l su=32768 -d su=32768,sw=128 /dev/md8

mount options: inode64,largeio,swalloc,delaylog,logbsize=256k,logbufs=8,noatime,nodiratime

I intended to make it with an external log, but forgot.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: RAID60/mdadm/xfs performance tuning
From: Dave Chinner @ 2011-12-05 22:48 UTC
  To: Paul Anderson; +Cc: xfs-oss

On Mon, Dec 05, 2011 at 01:50:58PM -0500, Paul Anderson wrote:
> I've set up a software RAID-60 array composed of 7 software RAID6's,
> each with 32k chunks and 18 devices (16 data, 2 parity), and in
> theory appropriate setup parameters according to a nice white paper
> written by Christoph and presented this last summer at LinuxCon.
> 
> My question is, if the mdraid and XFS are all configured properly,
> would I expect to see any read operations when doing a write-only
> test?  I would have assumed that I would not, since XFS should write
> stripe-aligned sets of data, and in theory nothing needs to be read
> (no read-modify-write going on, I would think).

That depends. What's your "write only" test?

> The performance is great, but I'm wondering if I need to keep looking.

If performance is great, then what's the problem?

> 
> Thanks,
> 
> Paul Anderson
> 
> Here's the details for kernel 2.6.38.5:
> 
> mdadm --detail /dev/md0  (md1, md2, md3, md4, md5, and md6 all the same)
> /dev/md0:
....
>      Chunk Size : 32K
> 
> /dev/md8 is the RAID0 that stripes across the above RAID6's, making a
> single RAID60:
> 
>  mdadm --detail /dev/md8
> /dev/md8:
....
>      Chunk Size : 4096K (this is what the RAID0 container thinks, but
> I ignore it for xfs)

You should set the RAID0 chunk size to the stripe width of the
underlying RAID6 volume (i.e. 512k).
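
The 512k figure follows from the geometry quoted above: a 32k chunk
times 16 data disks. A minimal sketch of that arithmetic, with a
hypothetical helper name:

```shell
#!/bin/sh
# Full-stripe data width of one RAID6 volume: chunk size times data disks.
# RAID6 uses two devices' worth of each stripe for parity.
raid6_stripe_width_kib() {
    chunk_kib=$1      # per-device chunk size in KiB
    raid_devices=$2   # total devices in the RAID6, parity included
    echo $(( chunk_kib * (raid_devices - 2) ))
}

raid6_stripe_width_kib 32 18   # 32 * 16 = 512
```

mdadm's --chunk option takes its value in KiB, so this would correspond
to --chunk=512 when the RAID0 is created.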

> xfs_info /exports/
> meta-data=/dev/md8               isize=256    agcount=204, agsize=268435448 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=54698370048, imaxpct=1
>          =                       sunit=8      swidth=1024 blks

Because XFS has clearly not been configured correctly. You've given
it a stripe unit of 32k (the RAID6 chunk size), and a width of 4MB
(the RAID0 chunk size).
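
For reference, xfs_info reports sunit and swidth in filesystem blocks
(bsize=4096 here), so the values above convert as follows. The helper
name is hypothetical:

```shell
#!/bin/sh
# Convert a count of filesystem blocks to KiB.
blks_to_kib() {
    blocks=$1
    block_size=$2     # filesystem block size in bytes (bsize from xfs_info)
    echo $(( blocks * block_size / 1024 ))
}

echo "sunit  = $(blks_to_kib 8 4096) KiB"      # 32 KiB
echo "swidth = $(blks_to_kib 1024 4096) KiB"   # 4096 KiB, i.e. 4 MiB
```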

What you are doing is aligning allocation to individual disks in the
RAID6 volumes, but the filesystem doesn't know what the stripe width
of those volumes is, so it can't really align correctly to the RAID6
geometry. And because it is not set up with sunit = 128 blocks (512k),
it can't align to the RAID0 on top of it correctly, either.

You need to align all layers of the stack to each other so the
filesystem has a consistent view of stripe unit and widths. In this
configuration, the RAID0 really needs a chunk size of 512k to match
the RAID6 stripe width. Then you can choose from two different valid
alignments for the filesystem - align to the underlying RAID6 or to
the top level RAID0.

If you have a small file intensive workload, then aligning to the
RAID6 is probably best so that small files can pack full RAID6
stripe widths. If you have a bandwidth intensive workload, then
aligning to the RAID0 is probably best so that large writes are
aligned to the full stripe width of the underlying RAID6 devices.
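
As a rough sketch, the two alignments could translate into mkfs.xfs
data-section options like these. The function and variable names are
hypothetical, the geometry is the one quoted in this thread, and the
second option assumes the RAID0 has been recreated with a 512k chunk.
The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Geometry taken from the thread; adjust for your own array.
raid6_chunk_kib=32     # RAID6 chunk size
raid6_data_disks=16    # 18 devices minus 2 parity
raid0_legs=7           # number of RAID6 volumes under the RAID0

raid6_width_kib=$(( raid6_chunk_kib * raid6_data_disks ))   # 512

# Option 1: align to one RAID6 volume (small file workloads).
mkfs_opts_raid6() {
    echo "-d su=${raid6_chunk_kib}k,sw=${raid6_data_disks}"
}

# Option 2: align to the top-level RAID0 (bandwidth workloads).
# Assumes the RAID0 chunk size equals the RAID6 stripe width.
mkfs_opts_raid0() {
    echo "-d su=${raid6_width_kib}k,sw=${raid0_legs}"
}

echo "mkfs.xfs $(mkfs_opts_raid6) /dev/md8"
echo "mkfs.xfs $(mkfs_opts_raid0) /dev/md8"
```

Here su is the stripe unit and sw is the number of stripe units per
full stripe, so option 1 describes a 512k stripe and option 2 a 3584k
stripe spanning all seven RAID6 volumes.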

Either way, you need to understand and test your workload to improve
on whatever the default XFS settings give you.

> I made the filesystem like this:
> mkfs.xfs -L $(hostname) -l su=32768 -d su=32768,sw=128 /dev/md8
> 
> mount options: inode64,largeio,swalloc,delaylog,logbsize=256k,logbufs=8,noatime,nodiratime

Why largeio,swalloc? Have you determined that you're actually
getting hot disks in your array without it?

FWIW, delaylog and logbufs are the default so you don't need to set
them, and nodiratime is a subset of noatime, so you don't need to
specify that, either.
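
Applied to the option string quoted above, that trimming might look like
the sketch below. The helper name is hypothetical, and the list of
redundant options is taken from the remarks above rather than from any
authoritative table, so it is worth checking against the defaults of
the kernel actually in use.

```shell
#!/bin/sh
# Drop mount options described above as defaults or as implied by noatime.
prune_mount_opts() {
    printf '%s\n' "$1" | tr ',' '\n' \
        | grep -v -x -e 'delaylog' -e 'logbufs=8' -e 'nodiratime' \
        | paste -s -d, -
}

prune_mount_opts 'inode64,largeio,swalloc,delaylog,logbsize=256k,logbufs=8,noatime,nodiratime'
# prints: inode64,largeio,swalloc,logbsize=256k,noatime
```

Whether largeio and swalloc should stay is a separate question that
depends on the workload.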

> I intended to make it with an external log, but forgot.

So you've determined an internal log is a performance bottleneck for
your workload?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
