linux-btrfs.vger.kernel.org archive mirror
From: David Brown <david.brown@hesbynett.no>
To: stan@hardwarefreak.com, James Plank <plank@cs.utk.edu>,
	Ric Wheeler <rwheeler@redhat.com>
Cc: Andrea Mazzoleni <amadvance@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-raid@vger.kernel.org, linux-btrfs@vger.kernel.org,
	David Smith <creamyfish@gmail.com>
Subject: Re: Triple parity and beyond
Date: Thu, 28 Nov 2013 10:56:44 +0100
Message-ID: <5297135C.50600@hesbynett.no>
In-Reply-To: <5296EDBD.8030905@hardwarefreak.com>

On 28/11/13 08:16, Stan Hoeppner wrote:
> Late reply.  This one got lost in the flurry of activity...
> 
> On 11/22/2013 7:24 AM, David Brown wrote:
>> On 22/11/13 09:38, Stan Hoeppner wrote:
>>> On 11/21/2013 3:07 AM, David Brown wrote:
>>>
>>>> For example, with 20 disks at 1 TB each, you can have:
>>>
> ...
>>> Maximum:
>>>
>>> RAID 10 = 10 disk redundancy
>>> RAID 15 = 11 disk redundancy
>>
>> 12 disks maximum (you have 8 with data, the rest are mirrors, parity, or
>> mirrors of parity).
>>
>>> RAID 16 = 12 disk redundancy
>>
>> 14 disks maximum (you have 6 with data, the rest are mirrors, parity, or
>> mirrors of parity).
> 
> We must follow different definitions of "redundancy".  I view redundancy
> as the number of drives that can fail without taking down the array.  In
> the case of the above 20 drive RAID15 that maximum is clearly 11
> drives-- one of every mirror and both of one mirror can fail.  The 12th
> drive failure kills the array.
> 

No, we have the same definitions of redundancy - just different
definitions of basic arithmetic.  Your definition is a bit more common!

My error was actually in an earlier email, when I listed the usable
capacities of different layouts for 20 x 1TB drives.  I wrote:

> raid10 = 10TB, 1 disk redundancy
> raid15 = 8TB, 3 disk redundancy
> raid16 = 6TB, 5 disk redundancy

Of course, it should be:

raid10 = 10TB, 1 disk redundancy
raid15 = 9TB, 3 disk redundancy
raid16 = 8TB, 5 disk redundancy


So it is your fault for not spotting my earlier mistake :-)
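
The corrected figures can be checked with a few lines of arithmetic.
This is only a sketch, assuming raid10/15/16 here mean raid0/5/6 layered
over two-disk raid1 pairs; the function name "layout" and its "parity"
parameter are made up for illustration and do not come from any tool.

```python
def layout(disks, size_tb, parity):
    """Usable capacity and failure tolerance for raid0/5/6 over raid1 pairs.

    parity = 0 (raid10), 1 (raid15), 2 (raid16).  Assumes two-disk mirrors.
    """
    pairs = disks // 2
    # The upper layer spends `parity` pairs' worth of space on parity.
    usable_tb = (pairs - parity) * size_tb
    # Worst case: failures wipe out whole pairs as fast as possible.  The
    # array dies once parity + 1 pairs are gone, i.e. 2 * (parity + 1)
    # disks, so one fewer than that is always survivable.
    guaranteed = 2 * (parity + 1) - 1
    # Best case: one disk from every pair, plus both disks of `parity` pairs.
    best_case = pairs + parity
    return usable_tb, guaranteed, best_case


for name, parity in (("raid10", 0), ("raid15", 1), ("raid16", 2)):
    usable, guaranteed, best = layout(20, 1, parity)
    print(f"{name} = {usable}TB, {guaranteed} disk redundancy "
          f"(up to {best} in the best case)")
```

The "guaranteed" figure is the minimum redundancy in the corrected list,
and the "best case" figure matches the maximums of 10, 11 and 12 that
Stan gave.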



>>> Range:
>>>
>>> RAID 10 = 1-10 disk redundancy
>>> RAID 15 = 3-11 disk redundancy
>>> RAID 16 = 5-12 disk redundancy
>>
>> Yes, I know these are the minimum redundancies.  But that's a vital
>> figure for reliability (even if the range is important for statistical
>> averages).  When one disk in a raid10 array fails, your main concern is
>> about failures or URE's in the other half of the pair - it doesn't help
>> to know that another nine disks can "safely" fail too.
> 
> Knowing this is often critical from an architectural standpoint David.
> It is quite common to create the mirrors of a RAID10 across two HBAs and
> two JBOD chassis.  Some call this "duplexing".  With RAID10 you know you
> can lose one HBA, one cable, one JBOD (PSU, expander, etc) and not skip
> a beat.  "RAID15" would work the same in this scenario.
> 

That is absolutely true, and I agree that it is very important when
setting up big arrays.  You have to make decisions like where you split
your raid1 pairs - putting them on different controllers/chassis means
you can survive the loss of a whole half of the system.  On the other
hand, putting them on the same controller could mean hardware raid1 is
more efficient and you don't need to duplicate the traffic over the
higher level interfaces.

But here we are looking at one specific class of failures - hard disk
failures (including complete disk failure and UREs).  For that, the
redundancy is the number of disks that can fail without data loss,
assuming the worst possible combination of failures.  And given the
extra stress on the disks during degraded access or rebuilds, "bad"
combinations are more likely than "good" combinations.

So I think it is of little help to say that a 20 disk raid15 can
survive up to 11 disk failures.  It is far more interesting to say that
it can survive any 3 disk failures, whichever disks they hit, and (if
connected as you describe with two controllers and chassis) it can also
survive the complete failure of a chassis or controller while still
retaining one disk of redundancy.
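
To make the worst-case argument concrete, here is a brute-force sketch
that tries every combination of failed disks.  It assumes disks 2*i and
2*i+1 form mirror pair i, and that the even-numbered disks sit in one
chassis and the odd-numbered disks in the other; the function names are
invented for this example.

```python
from itertools import combinations


def survives(failed, pairs, parity):
    """True if the array still holds its data with this set of failed disks.

    A mirror pair is lost only when both of its disks have failed, and the
    raid5/6 layer on top tolerates up to `parity` lost pairs.
    """
    lost_pairs = sum(
        1 for i in range(pairs) if 2 * i in failed and 2 * i + 1 in failed
    )
    return lost_pairs <= parity


def guaranteed_redundancy(pairs, parity, already_failed=frozenset()):
    """Largest k such that ANY k further disk failures are survivable."""
    healthy = [d for d in range(2 * pairs) if d not in already_failed]
    k = 0
    # Keep growing k while every possible set of k + 1 extra failures
    # still leaves the array alive.
    while k < len(healthy) and all(
        survives(already_failed | set(extra), pairs, parity)
        for extra in combinations(healthy, k + 1)
    ):
        k += 1
    return k


pairs = 10                                    # 20 disks as raid15
chassis_a = frozenset(range(0, 2 * pairs, 2))  # one disk of every pair

print(guaranteed_redundancy(pairs, 1))             # any 3 disks may fail
print(survives(chassis_a | {1}, pairs, 1))         # a lucky 11 failures
print(guaranteed_redundancy(pairs, 1, chassis_a))  # left after chassis loss
```

With these assumptions the three lines print 3, True and 1: any three
failures are safe, eleven failures are survivable only if they fall in
the right places, and after losing a whole chassis the array can still
take one more disk failure anywhere.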


As a side issue here, I wonder if a write intent bitmap can be used for
a chassis failure so that when the chassis is fixed (the controller card
replaced, the cable re-connected, etc.) the disks inside can be brought
up to sync again without a full rebuild.

> This architecture is impossible with RAID5/6.  Any of the mentioned
> failures will kill the array.
> 

Yes.

