public inbox for linux-xfs@vger.kernel.org
From: Matti Aarnio <matti.aarnio@zmailer.org>
To: Peter Rabbitson <rabbit@rabbit.us>
Cc: Justin Piszcz <jpiszcz@lucidpixels.com>,
	linux-raid@vger.kernel.org, xfs@oss.sgi.com,
	Alan Piszcz <ap@solarrain.com>
Subject: Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
Date: Thu, 28 Jun 2007 12:05:51 +0300
Message-ID: <20070628090551.GG4504@mea-ext.zmailer.org>
In-Reply-To: <46837056.4050306@rabbit.us>

On Thu, Jun 28, 2007 at 10:24:54AM +0200, Peter Rabbitson wrote:
> Interesting, I came up with the same results (1M chunk being superior) 
> with a completely different raid set with XFS on top:
> 
> mdadm	--create \
> 	--level=10 \
> 	--chunk=1024 \
> 	--raid-devices=4 \
> 	--layout=f3 \
> 	...
> 
> Could it be attributed to XFS itself?

Sort of..

 /dev/md4:
         Version : 00.90.03
      Raid Level : raid5
    Raid Devices : 4
   Total Devices : 4
 Preferred Minor : 4
 
  Active Devices : 4
 Working Devices : 4

          Layout : left-symmetric
      Chunk Size : 256K

With four devices in RAID5, each stripe holds 3 x 256k = 768k of user
data (the fourth chunk is parity).
I had to carefully tune the XFS  bsize/sunit/swidth  to match that:

 meta-data=/dev/DataDisk/lvol0    isize=256    agcount=32, agsize=7325824 blks
          =                       sectsz=512   attr=1
 data     =                       bsize=4096   blocks=234426368, imaxpct=25
          =                       sunit=64     swidth=192 blks, unwritten=1
 ...

That is, 4k * 64 = 256k (sunit, one chunk),  and  64 * 3 = 192 (swidth,
one full stripe of user data).
With that, bulk writes can go out as whole stripes, so the RAID5 layer
does not need to read blocks back from disk to calculate parity.
That read-back is what happens when the filesystem's idea of a block
does not align with the RAID5 stripe layout.
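
The arithmetic above can be sketched in a few lines of Python.  This is
only an illustration: the function name and its parameters are made up
here, and it assumes RAID5 with one parity chunk per stripe and
sunit/swidth counted in filesystem blocks, as in the output above.

```python
def xfs_stripe_geometry(chunk_kib, raid_devices, parity_devices=1, bsize=4096):
    """Return (sunit, swidth) in filesystem blocks for a striped MD array.

    Hypothetical helper for illustration; not part of any XFS or mdadm tool.
    """
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % bsize != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    data_devices = raid_devices - parity_devices
    if data_devices < 1:
        raise ValueError("need at least one data device")
    sunit = chunk_bytes // bsize      # one chunk, in filesystem blocks
    swidth = sunit * data_devices     # one full stripe of user data
    return sunit, swidth


sunit, swidth = xfs_stripe_geometry(chunk_kib=256, raid_devices=4)
print(f"sunit={sunit} swidth={swidth}")   # sunit=64 swidth=192
```

For this array the same geometry can be given to mkfs.xfs as
-d su=256k,sw=3  (stripe unit in bytes, stripe width as a count of
data disks).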

I also have LVM between the MD-RAID5 and XFS, so I aligned the LVM
to that  3 * 256k  as well.
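
A quick way to check whether a given starting offset (for example where
LVM begins its data area) falls on a full-stripe boundary is sketched
below.  The function name and the example offsets are hypothetical; the
defaults are the 256k chunk and 3 data disks of the array above.

```python
def is_stripe_aligned(offset_bytes, chunk_kib=256, data_devices=3):
    """True if offset_bytes falls on a full-stripe boundary.

    Hypothetical helper; defaults match the 4-disk RAID5 described above.
    """
    stripe_bytes = chunk_kib * 1024 * data_devices   # 768 KiB here
    return offset_bytes % stripe_bytes == 0


print(is_stripe_aligned(768 * 1024))   # True
print(is_stripe_aligned(192 * 1024))   # False
```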

This alignment boosted write performance by nearly a factor of 2
compared with mkfs.xfs run with default parameters.


With very wide RAID5, like the original question...  I would find it
very surprising if the alignment of upper layers to MD-RAID level
would not be important there as well.

Writing continuously in very small pieces does not make good use of the
disk mechanism (seek time, rotational delay), so chunks on the order of
128k-1024k will speed things up, presuming that when you write, you
write many MB at a time.  Database transactions are a lot smaller, and
are indeed harmed by layouts tuned for such large megachunk I/O.

RAID levels 0 and 1 (and 10) have no parity, so they never need to read
back the parts of a stripe that an incoming write did not alter.
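
For contrast, here is a toy sketch of why RAID5 does need that
read-back.  RAID5 parity is the XOR of the data chunks in a stripe; the
helper name and the two-byte "chunks" below are made up for
illustration, and this is not how the MD driver is actually written.

```python
def xor_blocks(*blocks):
    """XOR equal-length byte strings together (toy parity calculation)."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Full-stripe write: all three data chunks are in hand, so parity is
# computed without reading anything from disk.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_blocks(d0, d1, d2)

# Partial write: only d1 changes.  The new parity needs the OLD d1 and
# the OLD parity, which normally have to be read back from disk first.
new_d1 = b"\xaa\xbb"
new_parity = xor_blocks(parity, d1, new_d1)

# Same result as recomputing parity over the whole updated stripe.
assert new_parity == xor_blocks(d0, new_d1, d2)
```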

A DB application on top of the filesystem would benefit if it had
a way to ask about these alignment boundaries, so it could read
a whole alignment block even though it writes out only a subset of it.
(The theory being that those same blocks would then also be in the
memory cache and thus be available for the write-back parity calculation.)
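
If an application did know the alignment, widening a small write to the
enclosing aligned range would be simple arithmetic.  A minimal sketch,
with a hypothetical function name and example numbers, using the 768k
full stripe of the array above as the alignment:

```python
def aligned_span(offset, length, align):
    """Return (start, size) of the smallest aligned range covering the write.

    Hypothetical helper; offset, length and align are all in bytes.
    """
    if length <= 0 or align <= 0:
        raise ValueError("length and align must be positive")
    start = (offset // align) * align              # round start down
    end = -(-(offset + length) // align) * align   # round end up
    return start, end - start


stripe = 3 * 256 * 1024                            # 786432 bytes
print(aligned_span(1_000_000, 8192, stripe))       # (786432, 786432)
```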


> Peter

/Matti Aarnio
