From: Doug Ledford <dledford@redhat.com>
To: Leslie Rhorer <lrhorer@satx.rr.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Successful RAID 6 setup
Date: Sat, 07 Nov 2009 13:35:31 -0500	[thread overview]
Message-ID: <4AF5BDF3.8020907@redhat.com> (raw)
In-Reply-To: <20091104184049356.ZIDJ2725@cdptpa-omta04.mail.rr.com>

On 11/04/2009 01:40 PM, Leslie Rhorer wrote:
>> I will preface this by saying I only need about 100MB/s out of my array
>> because I access it via a gigabit crossover cable.
> 
> 	That's certainly within the capabilities of a good setup.
> 
>> I am backing up all of my information right now (~4TB) with the
>> intention of re-creating this array with a larger chunk size and
>> possibly tweaking the file system a little bit.
>>
>> My original array was a raid6 of 9 WD caviar black drives, the chunk
>> size was 64k. I use USAS-AOC-L8i controllers to address all of my drives
>> and the TLER setting on the drives is enabled for 7 seconds.
> 
> 	I would recommend a larger chunk size.  I'm using 256K, and even
> 512K or 1024K probably would not be excessive.

OK, I've got some data that I'm not quite ready to send out yet, but it
maps out the relationship between max_sectors_kb (the largest request
size a disk can process, which varies with the scsi host adapter in
question, but for SATA adapters is capped at and defaults to 512KB per
request) and chunk size for a raid0 array across 4 or 5 disks (I could
run other array sizes too, and that's part of what I'm waiting on
before sending the data out).  The point of using raid0 is that it
shows more of the md/lower layer block device interactions, whereas
raid5/6 would muddy the waters with other overhead.  The results of
the tests I ran were pretty conclusive that the sweet spot is when the
chunk size equals max_sectors_kb, and since SATA is the predominant
thing today and defaults to 512K, that makes a 512K chunk the sweet
spot.  Given that the chunk size is mainly about optimizing block
device operations at the command/queue level, the result should
transfer directly to raid5/6 as well.
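
If you want to sanity check that on your own hardware, the knobs are
easy to look at.  This is just a sketch (device names and the chunk
math assume the 9-drive raid6 above, adjust to taste):

	# per-request cap the block layer will use for this disk
	# (defaults to 512 for SATA)
	cat /sys/block/sdb/queue/max_sectors_kb

	# match the chunk to it at creation time; mdadm takes the
	# chunk size in KB
	mdadm --create /dev/md0 --level=6 --raid-devices=9 \
	      --chunk=512 /dev/sd[b-j]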

>> storrgie@ALEXANDRIA:~$ sudo mdadm -D /dev/md0
>> /dev/md0:
>>         Version : 00.90
> 
> 	I definitely recommend something other than 0.9, especially if this
> array is to grow a lot.
> 
>> I have noticed slow rebuilding time when I first created the array and
>> intermittent lockups while writing large data sets.
> 
> 	Lock-ups are not good.  Investigate your kernel log.  A write-intent
> bitmap is recommended to reduce rebuild time.
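
(As an aside, a write-intent bitmap can be added to an existing array
without recreating it, something along the lines of

	mdadm --grow /dev/md0 --bitmap=internal

though as always, test it against your own setup first.)
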
> 
>> Is ext4 the ideal file system for my purposes?
> 
> 	I'm using xfs.  YMMV.
> 
>> Should I be investigating into the file system stripe size and chunk
>> size or let mkfs choose these for me? If I need to, please be kind to
>> point me in a good direction as I am new to this lower level file system
>> stuff.
> 
> 	I don't know specifically about ext4, but xfs did a fine job of
> assigning stripe and chunk size.

xfs pulls this out all on its own; ext2/3/4 need to be told (and you
need very recent ext utils to tell them both the stripe and stride
sizes).
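
The rough math for the ext case: stride is the chunk size in
filesystem blocks, and stripe width is stride times the number of data
disks.  For a 512K chunk, 4K blocks, and the 9-drive raid6 above
(7 data disks) that works out to something like this (a sketch, not a
command I've run against your array):

	# stride = 512K / 4K = 128, stripe_width = 128 * 7 = 896
	mkfs.ext4 -E stride=128,stripe_width=896 /dev/md0

	# xfs just reads the geometry straight off the md device
	mkfs.xfs /dev/md0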

>> Can I change the properties of my file system in place (ext4 or other)
>> so that I can tweak the stripe size when I add more drives and grow the
>> array?
> 
> 	One can with xfs.  I expect ext4 may be the same.

Actually, this needs to be clarified somewhat.  You can tweak xfs in
terms of the sunit and swidth settings.  This will affect new
allocations *only*!  All of your existing data will still be wherever
it was, and if that happens to be not so well laid out for the new
array, too bad.  The ext filesystems use this information at
filesystem creation time to lay out their block groups, inode tables,
etc. so that they are aligned to individual chunks and also so that
they are *not* exactly a stripe width apart from each other (which
forces the metadata to reside on different disks and avoids the
pathological case where the metadata blocks always end up on the same
disk in the array, making that one disk a huge bottleneck for the rest
of the array).  Once an ext filesystem is created, I don't think it
uses the data much any longer, but I could be wrong.  However, I know
that it won't be rearranged for your new layout, so you get what you
get after you grow the fs.
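
To put numbers on that (again only a sketch; this assumes the 512K
chunk and a grow from 9 to 10 drives, i.e. 8 data disks, so check the
math against your own geometry):

	# xfs: affects new allocations only; sunit/swidth are given
	# in 512-byte sectors, so a 512K chunk = 1024
	mount -o sunit=1024,swidth=8192 /dev/md0 /mnt/array

	# ext4: updates the hints the allocator uses going forward,
	# but the existing block group/inode table layout stays put
	tune2fs -E stride=128,stripe_width=1024 /dev/md0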


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
	      http://people.redhat.com/dledford

Infiniband specific RPMs available at
	      http://people.redhat.com/dledford/Infiniband

