From: Bill Davidsen <davidsen@tmr.com>
To: Peter Grandi <pg_lxra@lxra.for.sabi.co.UK>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Raid over 48 disks
Date: Tue, 25 Dec 2007 16:08:14 -0500
Message-ID: <4771713E.2020303@tmr.com>
In-Reply-To: <18289.16006.856691.862471@base.ty.sabi.co.UK>

Peter Grandi wrote:
>>>> On Wed, 19 Dec 2007 07:28:20 +1100, Neil Brown
>>>> <neilb@suse.de> said:
>>>>         
>
> [ ... what to do with 48 drive Sun Thumpers ... ]
>
> neilb> I wouldn't create a raid5 or raid6 on all 48 devices.
> neilb> RAID5 only survives a single device failure and with that
> neilb> many devices, the chance of a second failure before you
> neilb> recover becomes appreciable.
>
> That's just one of the many problems; others are:
>
> * If a drive fails, rebuild traffic is going to hit hard, with
>   reading in parallel 47 blocks to compute a new 48th.
>
> * With a parity strip length of 48 it will be that much harder
>   to avoid read-modify-write, as it will be avoidable only for
>   writes of at least 48 blocks aligned on 48-block boundaries.
>   And reading 47 blocks to write one is going to be quite
>   painful.
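Both points come down to how single-parity RAID works: parity is the XOR of the data blocks in a stripe, so rebuilding a lost block means reading every surviving member of that stripe, and a write can skip pre-reading old blocks only when it covers whole stripes. The sketch below illustrates that with toy 16-byte blocks; the 64 KiB chunk size and all function names are hypothetical, and this is not how md is implemented internally.

```python
import os
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks, i.e. RAID5-style parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving):
    """Recompute a lost block as the XOR of every surviving block in the
    stripe (data and parity alike), so all N-1 other members are read."""
    return xor_blocks(surviving)


def is_full_stripe_write(offset, length, data_disks, chunk):
    """True if a write covers whole stripes only, so new parity can be
    computed from the new data alone, with no pre-read of old blocks."""
    stripe = data_disks * chunk
    return length > 0 and offset % stripe == 0 and length % stripe == 0


DATA_DISKS = 47          # 47 data + 1 parity, as in the 48-drive RAID5
CHUNK = 64 * 1024        # hypothetical 64 KiB chunk size

data = [os.urandom(16) for _ in range(DATA_DISKS)]   # one tiny toy stripe
parity = xor_blocks(data)

lost = 10                                            # pretend drive 10 died
survivors = data[:lost] + data[lost + 1:] + [parity]
print(len(survivors), "blocks read to rebuild one")  # 47 blocks read to rebuild one
print(rebuild_missing(survivors) == data[lost])      # True

print(is_full_stripe_write(0, DATA_DISKS * CHUNK, DATA_DISKS, CHUNK))  # True
print(is_full_stripe_write(0, CHUNK, DATA_DISKS, CHUNK))               # False
```

The wider the array, the larger both the rebuild read set and the minimum write size that avoids pre-reads.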
>
> [ ... ]
>
> neilb> RAID10 would be a good option if you are happy with 24
> neilb> drives worth of space. [ ... ]
>
> That sounds like the only feasible option in most cases (except
> for the 3-drive case). Parity RAID does not scale much beyond
> 3-4 drives.
>
> neilb> Alternately, 8 6-drive RAID5s or 6 8-drive RAID6s, and use
> neilb> RAID0 to combine them together. This would give you
> neilb> adequate reliability and performance and still a large
> neilb> amount of storage space.
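For comparison, the usable space of the 48-drive layouts mentioned so far can be tallied as groups x (width - redundancy) x drive size, since the RAID0 striping on top adds no redundancy of its own. This is a minimal sketch; the 0.5TB drive size is borrowed from Peter's example later in this message, and the `Layout` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    groups: int       # number of parity or mirror groups striped together
    width: int        # drives per group
    redundancy: int   # drives' worth of redundancy per group

    @property
    def drives(self) -> int:
        return self.groups * self.width

    def usable_tb(self, drive_tb: float) -> float:
        # Only data drives contribute capacity; RAID0 on top adds nothing.
        return self.groups * (self.width - self.redundancy) * drive_tb


LAYOUTS = [
    Layout("RAID5 47+1", 1, 48, 1),
    Layout("RAID50 8x(5+1)", 8, 6, 1),
    Layout("RAID60 6x(6+2)", 6, 8, 2),
    Layout("RAID10 24x2", 24, 2, 1),
]

DRIVE_TB = 0.5  # assumed drive size, as in the 0.5TB example in this message

for layout in LAYOUTS:
    print(f"{layout.name:16} {layout.drives} drives, "
          f"{layout.usable_tb(DRIVE_TB):5.1f} TB usable")
```

With these assumptions the four layouts give 23.5, 20, 18 and 12 TB respectively.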
>
> That sounds optimistic to me: the reason to do a RAID50 of
> 8x(5+1) can only be to have a single filesystem, else one could
> have 8 distinct filesystems each with a subtree of the whole.
> With a single filesystem the failure of any one of the 8 RAID5
> components of the RAID0 will cause the loss of the whole lot.
>
> So in the 47+1 case a loss of any two drives would lead to
> complete loss; in the 8x(5+1) case only a loss of two drives in
> the same RAID5 will.
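The difference between the two cases can be put as a number: once one drive has failed, how likely is the next failure to be fatal? A minimal sketch, assuming the next failure is equally likely to hit any remaining drive (i.e. no correlated failures, which is exactly the assumption the "same box" argument questions); the function name is hypothetical.

```python
from fractions import Fraction


def p_second_failure_fatal(groups: int, width: int) -> Fraction:
    """Probability that, with one drive already failed, the next failure
    lands in the same (now degraded) RAID5 group and loses the array
    (for a RAID50, the whole RAID0 on top). Assumes the next failure is
    equally likely to hit any of the remaining drives."""
    total = groups * width
    return Fraction(width - 1, total - 1)


print(p_second_failure_fatal(1, 48))         # 47+1: 1 (always fatal)
print(p_second_failure_fatal(8, 6))          # 8x(5+1): 5/47
print(float(p_second_failure_fatal(8, 6)))   # roughly 0.106
```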
>
> It does not sound like a great improvement to me (especially
> considering the thoroughly inane practice of building arrays out
> of disks of the same make and model taken out of the same box).
>   

Quality control just isn't consistent enough for drives from the "same
box" to make a big difference, assuming that you have an appropriate
number of hot spares online. Note that I said "big difference": is
there some clustering of failures? Some, but damn little. A few years
ago I was working with multiple 6TB machines and 20+ 1TB machines, all
using small, fast drives in RAID5E. I can't remember a case where a
drive failed before the rebuild was complete, and only one or two where
a failure left an array in degraded mode because the used hot spare had
not yet been replaced.

That said, RAID5E can typically rebuild a lot faster than an array
with a conventional hot spare as a separate drive, at least for any
given impact on performance. This undoubtedly reduced our exposure
time.

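Why a shorter rebuild window matters can be sketched with a simple exponential-lifetime model: the chance that at least one surviving drive fails during a rebuild of t hours is 1 - exp(-k t / MTTF) for k independent survivors. The MTTF and rebuild times below are hypothetical placeholders, not measurements, and the model ignores correlated failures and read errors during rebuild.

```python
import math


def p_failure_during_rebuild(surviving: int, rebuild_hours: float,
                             mttf_hours: float) -> float:
    """P(at least one of `surviving` drives fails within the rebuild
    window), assuming independent, exponentially distributed lifetimes."""
    return 1.0 - math.exp(-surviving * rebuild_hours / mttf_hours)


MTTF_HOURS = 500_000  # hypothetical per-drive MTTF, not a measured figure

for survivors in (47, 5):        # 48-drive RAID5 vs one 6-drive group
    for hours in (24, 6):        # hypothetical slow vs fast rebuild
        p = p_failure_during_rebuild(survivors, hours, MTTF_HOURS)
        print(f"{survivors:2} survivors, {hours:2} h rebuild: {p:.4%}")
```

For small values the risk is roughly proportional to both the number of surviving drives and the length of the window, so cutting rebuild time cuts exposure about as much as narrowing the array does.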
> There are also modest improvements in the RMW strip size and in
> the cost of a rebuild after a single drive loss. Probably the
> reduction in the RMW strip size is the best improvement.
>
> Anyhow, let's assume 0.5TB drives; with 47+1 we get a single
> 23.5TB filesystem, and with 8x(5+1) we get a 20TB filesystem.
> With current filesystem technology either size is worrying, for
> example as to time needed for an 'fsck'.
>   

If someone is putting a typical filesystem full of small files on a
big RAID, I agree. But fsck with large files is pretty fast on a given
filesystem (200GB files on a 6TB ext3, for instance), due to the small
number of inodes in play. While the bitmap resolution is a factor, its
cost is pretty linear; it's fsck with lots of files that gets really
slow. And let's face it, the objective of RAID is to avoid doing that
fsck in the first place ;-)
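The "small number of inodes" point is simple arithmetic: filling the same filesystem with large files needs orders of magnitude fewer inodes than filling it with small ones. A rough illustration, not an fsck timing model; the 64 KiB average size for the small-file case is a hypothetical figure.

```python
TB = 10**12
GB = 10**9


def file_count(fs_bytes: int, avg_file_bytes: int) -> int:
    """Files (roughly, inodes in use) needed to fill a filesystem at a
    given average file size."""
    return fs_bytes // avg_file_bytes


print(file_count(6 * TB, 200 * GB))    # 30
print(file_count(6 * TB, 64 * 1024))   # hypothetical 64 KiB files: ~91.5 million
```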

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck



