From: Bill Davidsen <davidsen@tmr.com>
To: Peter Grandi <pg_lxra@lxra.for.sabi.co.UK>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Raid over 48 disks
Date: Tue, 25 Dec 2007 16:08:14 -0500
Message-ID: <4771713E.2020303@tmr.com>
In-Reply-To: <18289.16006.856691.862471@base.ty.sabi.co.UK>

Peter Grandi wrote:
>>>> On Wed, 19 Dec 2007 07:28:20 +1100, Neil Brown
>>>> <neilb@suse.de> said:
>>>>         
>
> [ ... what to do with 48 drive Sun Thumpers ... ]
>
> neilb> I wouldn't create a raid5 or raid6 on all 48 devices.
> neilb> RAID5 only survives a single device failure and with that
> neilb> many devices, the chance of a second failure before you
> neilb> recover becomes appreciable.
>
> That's just one of the many problems, other are:
>
> * If a drive fails, rebuild traffic is going to hit hard, with
>   reading in parallel 47 blocks to compute a new 48th.
>
> * With a parity strip length of 48 it will be that much harder
>   to avoid read-modify before write, as it will be avoidable
>   only for writes of at least 48 blocks aligned on 48 block
>   boundaries. And reading 47 blocks to write one is going to be
>   quite painful.
>
> [ ... ]
>
> neilb> RAID10 would be a good option if you are happy with 24
> neilb> drives worth of space. [ ... ]
>
> That sounds like the only feasible option (except for the 3
> drive case in most cases). Parity RAID does not scale much
> beyond 3-4 drives.
>
> neilb> Alternately, 8 6-drive RAID5s or 6 8-drive RAID6s, and use
> neilb> RAID0 to combine them together. This would give you
> neilb> adequate reliability and performance and still a large
> neilb> amount of storage space.
>
> That sounds optimistic to me: the reason to do a RAID50 of
> 8x(5+1) can only be to have a single filesystem, else one could
> have 8 distinct filesystems each with a subtree of the whole.
> With a single filesystem the failure of any one of the 8 RAID5
> components of the RAID0 will cause the loss of the whole lot.
>
> So in the 47+1 case a loss of any two drives would lead to
> complete loss; in the 8x(5+1) case only a loss of two drives in
> the same RAID5 will.
>
> It does not sound like a great improvement to me (especially
> considering the thoroughly inane practice of building arrays out
> of disks of the same make and model taken out of the same box).
>   

Quality control just isn't so good that "same box" makes a big 
difference, assuming that you have an appropriate number of hot spares 
online. Note that I said "big difference": is there some clustering of 
failures? Some, but damn little. A few years ago I was working with 
multiple 6TB machines and 20+ 1TB machines, all using small, fast 
drives in RAID5E. I can't remember a case where a drive failed before a 
rebuild was complete, and only one or two where an array fell to 
degraded mode again before the used hot spare had been replaced.
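
To put rough numbers on the 47+1 versus 8x(5+1) comparison above (my
back-of-the-envelope, not anything measured, assuming any pair of
concurrent failures is equally likely): a single 48-drive RAID5 dies on
any of C(48,2) = 1128 possible two-drive failures, while a RAID0 over
eight 5+1 RAID5s dies only on the 8 * C(6,2) = 120 pairs that land
inside the same RAID5, roughly one in ten.

    from math import comb

    # Single 47+1 RAID5: any concurrent two-drive failure loses everything.
    fatal_single = comb(48, 2)            # 1128 fatal pairs

    # RAID0 over eight 5+1 RAID5s: only two failures within the same
    # 6-drive RAID5 take out that leg, and with it the whole RAID0.
    fatal_raid50 = 8 * comb(6, 2)         # 8 * 15 = 120 fatal pairs

    print(fatal_raid50 / fatal_single)    # ~0.11

So the split layout survives roughly nine out of ten double failures
that would kill the single big array, even before you argue about
rebuild times.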

That said, RAID5E can typically rebuild a lot faster than a 
conventional single-drive hot spare, at least for any given impact on 
performance. That undoubtedly reduced our exposure time.
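
As a rough illustration of how the rebuild window translates into
exposure (a sketch with made-up numbers: independent exponential
failure times and a nominal 500,000-hour MTBF, neither taken from this
thread), the chance of losing another drive before a rebuild completes
is about 1 - exp(-n * t / MTBF) for n surviving drives and rebuild
time t:

    from math import exp

    def second_failure_prob(surviving_drives, rebuild_hours, mtbf_hours=500_000):
        # Chance that at least one surviving drive fails during the rebuild
        # window, assuming independent exponential failures (illustrative MTBF).
        return 1 - exp(-surviving_drives * rebuild_hours / mtbf_hours)

    print(second_failure_prob(47, 24))  # 47+1 RAID5, 24h rebuild: ~0.0023
    print(second_failure_prob(47, 8))   # same array, 8h rebuild:  ~0.00075
    print(second_failure_prob(5, 8))    # one degraded 5+1 leg:    ~0.00008

The absolute numbers here mean little; real rebuilds of wide arrays on
busy machines can stretch to days, and an unrecoverable read error hit
during the rebuild hurts as much as a second dead drive. The point is
just that exposure scales with both the rebuild time and the number of
drives still spinning.
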
> There are also modest improvements in the RMW strip size and in
> the cost of a rebuild after a single drive loss. Probably the
> reduction in the RMW strip size is the best improvement.
>
> Anyhow, let's assume 0.5TB drives; with a 47+1 we get a single
> 23.5TB filesystem, and with 8*(5+1) we get a 20TB filesystem.
> With current filesystem technology either size is worrying, for
> example as to time needed for an 'fsck'.
>   

If someone puts a typical filesystem full of small files on a big 
RAID, I agree. But fsck with large files is pretty fast even on a big 
filesystem (200GB files on a 6TB ext3, for instance), due to the small 
number of inodes in play. The bitmap work is a factor, but it scales 
pretty linearly with size; it's fsck with lots of files that gets 
really slow. And let's face it, the objective of RAID is to avoid 
doing that fsck in the first place ;-)
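
To put rough numbers of my own on the inode-count point (the file
sizes are illustrative, not from any particular box): the same 6TB is
about thirty 200GB files but tens of millions of 100KB files, and it
is the per-inode and per-directory passes of e2fsck that grow with the
file count, while the bitmap scans grow only with raw capacity.

    # Rough count of inodes e2fsck has to walk for the same 6TB of data
    fs_bytes = 6 * 10**12

    large_files = fs_bytes // (200 * 10**9)   # ~30 files of 200GB each
    small_files = fs_bytes // (100 * 10**3)   # ~60,000,000 files of 100KB each

    print(large_files, small_files)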

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck 


