From: David Brown <david@westcontrol.com>
To: linux-raid@vger.kernel.org
Subject: Re: high throughput storage server?
Date: Thu, 24 Feb 2011 12:24:54 +0100
Message-ID: <ik5f7j$kuk$1@dough.gmane.org>
In-Reply-To: <4D6577E4.2080305@hardwarefreak.com>
On 23/02/2011 22:11, Stan Hoeppner wrote:
> David Brown put forth on 2/23/2011 7:56 AM:
>
>> However, as disks get bigger, the chance of errors on any given disk is
>> increasing. And the fact remains that if you have a failure on a RAID10
>> system, you then have a single point of failure during the rebuild
>> period - while with RAID6 you still have redundancy (obviously RAID5 is
>> far worse here).
>
> The problem isn't a 2nd whole drive failure during the rebuild, but a
> URE during rebuild:
>
> http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162
>
Yes, I've read that article - it's one of the reasons for always
preferring RAID6 to RAID5.
My understanding of RAID controllers (software or hardware) is that
they consider a drive to be either "good" or "bad".  So if you get a
URE, the controller considers the drive "bad" and ejects it from the
array - it makes no difference whether it was a single unreadable
sector or a total disk death.
Maybe hardware RAID controllers do something else here - you know far
more about them than I do.
The idea of the md raid "bad block list" is that there is a middle
ground - you can have disks that are "mostly good".
Suppose you have a RAID6 array, and one disk has died completely.  It
gets replaced by a hot spare, and the rebuild begins.  As the rebuild
progresses, disk 1 gets a URE.  Traditional handling would mean disk 1
is ejected, and now you have a double-degraded RAID6 to rebuild.  When
you later get a URE on disk 2, that disk is ejected too - you have
lost the data for that stripe, and the whole array with it.
But with bad block lists, the URE on disk 1 leads to a bad block entry
on disk 1, and the rebuild continues.  When you later get a URE on
disk 2, it's no problem - you reconstruct that stripe from disk 1 and
the other disks.  UREs are no longer a killer unless the array has no
remaining redundancy for that stripe.
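To make that counting argument concrete, here's a toy sketch in Python
- my own illustration, nothing to do with md's actual code:

  # Toy model of per-stripe recovery during a RAID6 rebuild with a
  # bad block list.  A stripe survives as long as at most 2 of its
  # members are unreadable (whole-disk failure, or a bad block entry
  # for that stripe on an otherwise-good disk).
  def stripe_recoverable(n_disks, dead_disks, bad_blocks, stripe):
      # bad_blocks maps disk id -> set of stripes with bad block entries
      unreadable = sum(1 for d in range(n_disks)
                       if d in dead_disks
                       or stripe in bad_blocks.get(d, set()))
      return unreadable <= 2    # RAID6 tolerates two missing members

  dead = {0}                    # disk 0 died; hot spare is rebuilding
  bad = {1: {100}, 2: {200}}    # later UREs on disks 1 and 2
  assert stripe_recoverable(6, dead, bad, 100)   # disk 0 + disk 1 out
  assert stripe_recoverable(6, dead, bad, 200)   # disk 0 + disk 2 out
  assert not stripe_recoverable(6, dead, {1: {100}, 2: {100}}, 100)

The only fatal case is the last one - both UREs landing on the same
stripe while a whole disk is missing.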
UREs are also what I worry about with RAID1 (including RAID10)
rebuilds.  If a disk has failed, you are right that the chances of the
second disk in the pair failing completely during the rebuild are
tiny.  But the chances of getting a URE on the second disk during the
rebuild are not negligible - they are small, but they grow with each
new jump in disk size.
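To put a rough number on "not negligible": using the figure usually
quoted for consumer drives of one URE per 10^14 bits read (enterprise
drives are typically rated 10^15), and assuming independent errors at
that rate, a full clean read of a 2 TB mirror (about 1.6 * 10^13 bits)
succeeds with probability roughly

  (1 - 10^-14)^(1.6 * 10^13)  ~=  exp(-0.16)  ~=  0.85

i.e. something like a 15% chance of at least one URE per rebuild.  The
exponent scales linearly with disk size, so a 4 TB disk would be nearer
27%.  Back-of-envelope only, of course - real rates depend on the
drive and its age.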
With md raid's future bad block list and hot replace features, a URE
on the second disk during a rebuild is only a problem if the first
disk has died completely - if it only had a small problem, the "hot
replace" rebuild can use both disks to find the data.
>> I don't know if you've followed the recent "md road-map: 2011" thread (I
>> can't see any replies from you in the thread), but that is my reference
>> point here.
>
> Actually I haven't. Is Neil's motivation with this RAID5/6 "mirror
> rebuild" to avoid the URE problem?
>
I know you are more interested in hardware raid than software raid, but
I'm sure you'll find some interesting points in Neil's writings. If you
don't want to read through the thread, at least read his blog post.
<http://neil.brown.name/blog/20110216044002>
>> Incidentally, what's your opinion on a RAID1+5 or RAID1+6 setup, where
>> you have a RAID5 or RAID6 build from RAID1 pairs? You get all the
>> rebuild benefits of RAID1 or RAID10, such as simple and fast direct
>> copies for rebuilds, and little performance degradation. But you also
>> get multiple failure redundancy from the RAID5 or RAID6. It could be
>> that it is excessive - that the extra redundancy is not worth the
>> performance cost (you still have poor small write performance).
>
> I don't care for and don't use parity RAID levels. Simple mirroring and
> RAID10 have served me well for a very long time. They have many
> advantages over parity RAID and few, if any, disadvantages. I've
> mentioned all of these in previous posts.
>