From: Bill Davidsen <davidsen@tmr.com>
To: Peter Grandi <pg_lxra@lxra.for.sabi.co.UK>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Raid over 48 disks
Date: Tue, 25 Dec 2007 16:08:14 -0500
Message-ID: <4771713E.2020303@tmr.com>
In-Reply-To: <18289.16006.856691.862471@base.ty.sabi.co.UK>
Peter Grandi wrote:
>>>> On Wed, 19 Dec 2007 07:28:20 +1100, Neil Brown
>>>> <neilb@suse.de> said:
>>>>
>
> [ ... what to do with 48 drive Sun Thumpers ... ]
>
> neilb> I wouldn't create a raid5 or raid6 on all 48 devices.
> neilb> RAID5 only survives a single device failure and with that
> neilb> many devices, the chance of a second failure before you
> neilb> recover becomes appreciable.
>
> That's just one of the many problems; others are:
>
> * If a drive fails, rebuild traffic is going to hit hard, since it
> means reading 47 blocks in parallel to compute a new 48th.
>
> * With a parity strip length of 48 it will be that much harder
> to avoid read-modify before write, as it will be avoidable
> only for writes of at least 48 blocks aligned on 48 block
> boundaries. And reading 47 blocks to write one is going to be
> quite painful.
>
> [ ... ]
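A rough back-of-the-envelope on the read-modify-write point, assuming the
two classic RAID5 small-write strategies (read-modify-write vs.
reconstruct-write) and no caching; the md driver's actual behaviour has
more subtleties, so treat this only as a sketch of the scaling:

def raid5_write_ios(n_data, k_written):
    """Disk I/Os to update k_written blocks of one stripe on a RAID5
    with n_data data disks plus one parity disk."""
    if k_written == n_data:
        # Full-stripe write: no reads at all, just the data plus new parity.
        return {"full_stripe": n_data + 1}
    # Read-modify-write: read old data and old parity, write new data and parity.
    rmw = 2 * (k_written + 1)
    # Reconstruct-write: read the untouched data blocks, write new data and parity.
    rcw = (n_data - k_written) + (k_written + 1)
    return {"rmw": rmw, "reconstruct": rcw}

for n_data in (5, 47):          # one 5+1 component vs. the 47+1 monolith
    print(n_data, raid5_write_ios(n_data, 1))
# 5  {'rmw': 4, 'reconstruct': 6}
# 47 {'rmw': 4, 'reconstruct': 48}

Either way a one-block update costs the same four I/Os, but the only way to
avoid them entirely -- a full aligned stripe write -- needs 47 aligned
blocks on the wide array versus 5 on the narrow one.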
>
> neilb> RAID10 would be a good option if you are happy with 24
> neilb> drives worth of space. [ ... ]
>
> That sounds like the only feasible option (except for the 3
> drive case in most cases). Parity RAID does not scale much
> beyond 3-4 drives.
>
> neilb> Alternately, 8 6-drive RAID5s or 6 8-drive RAID6s, and use
> neilb> RAID0 to combine them together. This would give you
> neilb> adequate reliability and performance and still a large
> neilb> amount of storage space.
>
> That sounds optimistic to me: the reason to do a RAID50 of
> 8x(5+1) can only be to have a single filesystem, else one could
> have 8 distinct filesystems each with a subtree of the whole.
> With a single filesystem the failure of any one of the 8 RAID5
> components of the RAID0 will cause the loss of the whole lot.
>
> So in the 47+1 case a loss of any two drives would lead to
> complete loss; in the 8x(5+1) case only a loss of two drives in
> the same RAID5 will.
>
> It does not sound like a great improvement to me (especially
> considering the thoroughly inane practice of building arrays out
> of disks of the same make and model taken out of the same box).
>
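To put rough numbers on the two-failure comparison above, assuming uniform,
independent failures:

# Given one drive has already failed, what fraction of possible second
# failures destroys the whole array?
surviving = 47

# 47+1 RAID5: every surviving drive is now critical.
print(47 / surviving)          # 1.0

# 8 x (5+1) RAID5s striped with RAID0: only the 5 surviving drives in the
# already-degraded set are critical; a failure elsewhere merely degrades
# another set.
print(5 / surviving)           # ~0.106

So the layered layout is roughly ten times less likely to be killed by the
second failure -- a real improvement, as long as failures really are
independent.
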
Quality control just isn't so good that "same box" makes a big
difference, assuming that you have an appropriate number of hot spares
online. Note that I said "big difference": is there some clustering of
failures? Some, but damn little. A few years ago I was working with
multiple 6TB machines and 20+ 1TB machines, all using small, fast
drives in RAID5E. I can't remember a case where a second drive failed
before a rebuild was complete, and only one or two where an array
dropped to degraded mode before the hot spare was replaced.

That said, RAID5E can typically rebuild a lot faster than a dedicated
hot-spare drive, at least for any given impact on performance. This
undoubtedly reduced our exposure time.
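A minimal sketch of why the rebuild window matters, assuming independent
exponential drive failures; the MTBF and rebuild times below are invented,
only the proportionality is the point:

import math

def p_failure_during_rebuild(n_surviving, rebuild_hours, mtbf_hours):
    """Probability that at least one of n_surviving drives fails before
    the rebuild finishes, with independent exponential failures."""
    combined_rate = n_surviving / mtbf_hours
    return 1 - math.exp(-combined_rate * rebuild_hours)

mtbf = 500_000                      # hours, illustrative only
for hours in (6, 24, 72):           # fast vs. slow rebuild
    print(hours, p_failure_during_rebuild(47, hours, mtbf))
# roughly 0.0006, 0.0023 and 0.0067 respectively

The risk is close to proportional to the rebuild time, so anything that
shortens the rebuild shrinks the exposure by about the same factor.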
> There are also modest improvements in the RMW strip size and in
> the cost of a rebuild after a single drive loss. Probably the
> reduction in the RMW strip size is the best improvement.
>
> Anyhow, let's assume 0.5TB drives; with a 47+1 we get a single
> 23.5TB filesystem, and with 8*(5+1) we get a 20TB filesystem.
> With current filesystem technology either size is worrying, for
> example as to time needed for an 'fsck'.
>
Given that someone is putting a typical filesystem full of small files
on a big RAID, I agree. But fsck with large files is pretty fast on a
given filesystem (200GB files on a 6TB ext3, for instance), due to the
small number of inodes in play. While the bitmap resolution is a factor,
it's pretty linear; it's fsck with lots of files that gets really slow.
And let's face it, the objective of RAID is to avoid doing that fsck in
the first place ;-)
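To illustrate the inode-count point with a toy calculation (the per-inode
cost is invented; only the scaling matters):

TB = 10**12

def inodes(fs_bytes, avg_file_bytes):
    # Rough inode count if the filesystem is full of files of this size.
    return fs_bytes // avg_file_bytes

big_files   = inodes(6 * TB, 200 * 10**9)   # 30 inodes
small_files = inodes(6 * TB, 100 * 10**3)   # 60,000,000 inodes

cost_per_inode = 20e-6                      # seconds, purely illustrative
print(big_files * cost_per_inode)           # effectively zero
print(small_files * cost_per_inode)         # ~1200 seconds, i.e. 20 minutes

Real fsck also walks directories and bitmaps, so it isn't purely linear in
inodes, but the orders of magnitude are the story.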
--
Bill Davidsen <davidsen@tmr.com>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck