From: joystick <joystick@shiftmail.org>
To: stan@hardwarefreak.com
Cc: James Plank <plank@cs.utk.edu>, Ric Wheeler <rwheeler@redhat.com>,
	Andrea Mazzoleni <amadvance@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-raid@vger.kernel.org, linux-btrfs@vger.kernel.org,
	David Brown <david.brown@hesbynett.no>,
	David Smith <creamyfish@gmail.com>
Subject: Re: Triple parity and beyond
Date: Thu, 21 Nov 2013 09:08:37 +0100
Message-ID: <528DBF85.6010303@shiftmail.org>
In-Reply-To: <528D61C5.70902@hardwarefreak.com>

On 21/11/2013 02:28, Stan Hoeppner wrote:
> On 11/20/2013 10:16 AM, James Plank wrote:
>> Hi all -- no real comments, except as I mentioned to Ric, my tutorial
>> in FAST last February presents Reed-Solomon coding with Cauchy
>> matrices, and then makes special note of the common pitfall of
>> assuming that you can append a Vandermonde matrix to an identity
>> matrix.  Please see
>> http://web.eecs.utk.edu/~plank/plank/papers/2013-02-11-FAST-Tutorial.pdf,
>> slides 48-52.
>>
>> Andrea, does the matrix that you included in an earlier mail (the one
>> that has Linux RAID-6 in the first two rows) have a general form, or
>> did you develop it in an ad hoc manner so that it would include Linux
>> RAID-6 in the first two rows?
> Hello Jim,
>
> It's always perilous to follow a Ph.D., so I guess I'm feeling suicidal
> today. ;)
>
> I'm not attempting to marginalize Andrea's work here, but I can't help
> but ponder what the real value of triple parity RAID is, or quad, or
> beyond.  Some time ago parity RAID's primary mission ceased to be
> surviving single drive failure, or a 2nd failure during rebuild, and
> became mitigating UREs during a drive rebuild.  So we're now talking
> about dedicating 3 drives of capacity to avoiding disaster due to
> platter defects and secondary drive failure.  For small arrays this is
> approaching half the array capacity.  So here parity RAID has lost the
> battle with RAID10's capacity disadvantage, yet it still suffers the
> vastly inferior performance in normal read/write IO, not to mention
> rebuild times that are 3-10x longer.
>
> WRT rebuild times, once drives hit 20TB we're looking at 18 hours just
> to mirror a drive at full streaming bandwidth, assuming 300MB/s
> average--and that is probably being kind to the drive makers.  With 6 or
> 8 of these drives, I'd guess a typical md/RAID6 rebuild will take at
> minimum 72 hours or more, probably over 100, and probably more yet for
> 3P.  And with larger drive count arrays the rebuild times approach a
> week.  Whose users can go a week with degraded performance?  This is
> simply unreasonable, at best.  I say it's completely unacceptable.
>
> With these gargantuan drives coming soon, the probability of multiple
> UREs during rebuild are pretty high.

No, because if you are correct about the very high CPU overhead during
rebuild (which I don't see as so dramatic, since Andrea claims 500MB/sec
for triple parity, probably parallelizable across multiple cores), the
speed of the rebuild decreases proportionally, and hence the stress and
heating on the drives decrease proportionally too, approaching those of
normal operation.
And how often have you seen a drive failure in a week during normal 
operation?
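
For scale, here is a rough sketch of the arithmetic behind the figures
quoted above (a 20TB drive streamed at 300MB/s), and of how the time
stretches if the rebuild is throttled. The 75MB/s throttled rate is an
arbitrary value picked for illustration, not a measurement, and decimal
units (1TB = 10^6 MB) are assumed.

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream one whole drive at a sustained rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return capacity_mb / throughput_mb_s / 3600


# Figures from the quoted text: 20TB drive, 300MB/s average.
full_speed = rebuild_hours(20, 300)

# Hypothetical throttled rebuild at a quarter of that rate (assumption).
quarter_speed = rebuild_hours(20, 75)

print(f"{full_speed:.1f} h")     # 18.5 h
print(f"{quarter_speed:.1f} h")  # 74.1 h
```

The time scales inversely with the rate, so a rebuild running at a
quarter of the speed takes four times as long while reading each drive
at a quarter of the intensity.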

But in reality, consider that a non-naive implementation of
multiple parity would probably use just the single parity during
reconstruction if only one disk fails, falling back to the additional
parities only for the stripes that cannot be reconstructed from single
parity alone. So the reconstruction speed, reconstruction time and
performance penalty would be those of raid5, except in the rare case
of multiple failures.
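
A minimal sketch of that fast-path/slow-path idea, assuming plain XOR
single parity (as in raid5) and a hypothetical `fallback` callback that
stands in for a real multiple-parity solver. None of these names come
from md; they are made up for illustration, and the stub fallback below
only records that it was called.

```python
from functools import reduce
from typing import Callable, List, Optional, Sequence

Block = Optional[bytes]  # None marks a block that cannot be read


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(data: Sequence[Block], p: Block, failed: int,
                  fallback: Callable[[Sequence[Block], Block, int], Block]) -> Block:
    """Rebuild the block of the failed disk for one stripe.

    Fast path: if P and every other data block are readable, XOR them.
    Slow path: otherwise hand the stripe to the (hypothetical) solver
    that would use the additional parities.
    """
    survivors = [b for i, b in enumerate(data) if i != failed]
    if p is not None and all(b is not None for b in survivors):
        return xor_blocks([p, *survivors])
    return fallback(data, p, failed)


slow_path_stripes: List[int] = []


def stub_fallback(data: Sequence[Block], p: Block, failed: int) -> Block:
    # Placeholder only: a real implementation would solve for the
    # missing blocks using the extra parities. Here we just count calls.
    slow_path_stripes.append(failed)
    return None


original = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(original)

# Stripe 1: disk 1 failed, everything else readable -> XOR fast path.
clean = rebuild_block([original[0], None, original[2]], parity, 1, stub_fallback)
assert clean == original[1]

# Stripe 2: disk 1 failed and disk 0 has an unreadable block -> slow path.
damaged = rebuild_block([None, None, original[2]], parity, 1, stub_fallback)
assert slow_path_stripes == [1]
```

Only the second stripe reaches the fallback, so the cost of the extra
parities is paid per damaged stripe rather than for the whole rebuild.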


> ...
> What I envision is an array type, something similar to RAID 51, i.e.
> striped parity over mirror pairs. ....

I don't like your raid 51 approach: it has the write overhead of
raid5 together with the wasted space of raid1.
So it can be used neither as a performance array nor as a capacity
array.
In the scope of this discussion (we are talking about very large
arrays), the wasted space of your solution, which is higher than 50%,
will make it cost double the price.
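
To put rough numbers on this, here is a small sketch that assumes
raid 51 means raid5 striped over two-way mirror pairs (as in the quoted
proposal) and equal-sized drives. The target of 21 drives' worth of
usable capacity is an arbitrary example, and the function names are
made up for illustration.

```python
def raid51_drives_for(usable: int) -> int:
    """Drives needed by raid5 over mirror pairs for `usable` drives of data.

    raid5 gives up one member to parity, and every member is a pair.
    """
    return 2 * (usable + 1)


def multi_parity_drives_for(usable: int, parities: int = 3) -> int:
    """Drives needed by a single array with `parities` parity drives."""
    return usable + parities


usable = 21  # arbitrary example size (assumption)
r51 = raid51_drives_for(usable)
tp = multi_parity_drives_for(usable)

print(r51)                 # 44
print(tp)                  # 24
print(f"{r51 / tp:.2f}x")  # 1.83x
```

With these assumptions raid 51 leaves (n-1)/(2n) of the raw capacity
usable for n pairs, which is always below 50%, while triple parity
leaves (N-3)/N for N drives.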

A competitor to the multiple-parity scheme might be raid65 or raid66,
but this is a much dirtier approach than multiple parity if you think
about the kind of read-modify-write (rmw) cycles and overhead that
will occur during normal operation.


