linux-raid.vger.kernel.org archive mirror
* Triple-parity raid6
From: David Brown @ 2011-06-09  0:01 UTC
  To: linux-raid

Has anyone considered triple-parity raid6?  As far as I can see, it
should not be significantly harder than normal raid6 - either to
implement, or for the processor at run-time.  Once you have the GF(2⁸)
field arithmetic in place for raid6, it's just a matter of making
another parity block in the same way but using a different generator:

P = D_0 + D_1 + D_2 + .. + D_(n-1)
Q = D_0 + g.D_1 + g².D_2 + .. + g^(n-1).D_(n-1)
R = D_0 + h.D_1 + h².D_2 + .. + h^(n-1).D_(n-1)

The raid6 implementation in mdraid uses g = 0x02 to generate the second 
parity (based on "The mathematics of RAID-6" - I haven't checked the 
source code).  You can make a third parity using h = 0x04 and then get a 
redundancy of 3 disks.  (Note - I haven't yet confirmed that this is 
valid for more than 100 data disks - I need to make my checker program 
more efficient first.)
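
As a rough illustration of the three equations, here is a minimal Python sketch that computes P, Q and R for a stripe.  It is not taken from the mdraid source.  The reduction polynomial 0x11D is an assumption (it is the one described in "The mathematics of RAID-6"), and the helper names gf_mul, gf_pow and parity_blocks are made up for this example.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8); the polynomial 0x11D is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce back into 8 bits
        b >>= 1
    return result


def gf_pow(base, exp):
    """Raise base to a non-negative integer power by repeated multiplication."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def parity_blocks(data, generators=(1, 2, 4)):
    """Return one parity block per generator (P, Q, R for the default).

    data is a list of equal-length bytes objects, one per data disk.
    """
    length = len(data[0])
    blocks = []
    for gen in generators:
        block = bytearray(length)
        for i, d in enumerate(data):
            coeff = gf_pow(gen, i)           # gen^i for data disk i
            for k in range(length):
                block[k] ^= gf_mul(coeff, d[k])
        blocks.append(bytes(block))
    return blocks
```

With the default generators, parity_blocks([d0, d1, d2]) returns [P, Q, R].  A real implementation would use lookup tables or SIMD instead of the bit-by-bit multiply shown here.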

Rebuilding a disk, or running in degraded mode, is just an obvious 
extension to the current raid6 algorithms.  If you are missing three 
data blocks, the maths looks hard to start with - but if you express the 
equations as a set of linear equations and use standard matrix inversion 
techniques, it should not be hard to implement.  You only need to do 
this inversion once when you find that one or more disks have failed - 
then you pre-compute the multiplication tables in the same way as is 
done for raid6 today.
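
To make the linear-equation view concrete, here is a hedged sketch of recovering three missing data bytes by inverting a 3x3 matrix over GF(2⁸).  It handles one byte per disk for brevity, assumes the generators 1, 2 and 4 and the polynomial 0x11D, and all function names are hypothetical.  The inverse depends only on which disks are missing, so it would be computed once per failure set and then applied to every byte.

```python
GENERATORS = (1, 2, 4)  # P, Q and R generators, as proposed in the post


def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8); the polynomial 0x11D is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(base, exp):
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def gf_inv(a):
    """Inverse of a non-zero element: a^254, because a^255 == 1."""
    return gf_pow(a, 254)


def invert(matrix):
    """Gauss-Jordan inversion over GF(2^8); subtraction is XOR."""
    n = len(matrix)
    aug = [list(row) + [int(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("singular matrix: failure set not recoverable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, v) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, w)
                          for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def parity(data, gen):
    """One parity byte: the sum of gen^i * D_i over all data disks."""
    total = 0
    for i, d in enumerate(data):
        total ^= gf_mul(gf_pow(gen, i), d)
    return total


def recover_three(data, parities, missing):
    """Rebuild the bytes at the three indexes in missing (None in data)."""
    inverse = invert([[gf_pow(g, i) for i in missing] for g in GENERATORS])
    known = [0 if d is None else d for d in data]
    # Each syndrome is what the missing disks contributed to that parity.
    syndromes = [par ^ parity(known, g)
                 for g, par in zip(GENERATORS, parities)]
    return [gf_mul(row[0], syndromes[0])
            ^ gf_mul(row[1], syndromes[1])
            ^ gf_mul(row[2], syndromes[2])
            for row in inverse]
```

For example, with five data disks and disks 0, 2 and 4 lost, recover_three(damaged, [P, Q, R], (0, 2, 4)) returns the three lost bytes in that order.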

In normal use, calculating the R parity is no more demanding than 
calculating the Q parity.  And most rebuilds or degraded situations will 
only involve a single disk, and the data can thus be re-constructed 
using the P parity just like raid5 or two-parity raid6.
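
For the common single-disk case, the reconstruction is a plain XOR of the P block with the surviving data blocks.  A small sketch, with hypothetical function names:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_single(surviving_data, p_block):
    """Rebuild one missing data block from the survivors and the P parity."""
    return xor_blocks(list(surviving_data) + [p_block])
```

Here P is simply xor_blocks(all_data_blocks), so the Q and R blocks are not needed at all for this path.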


I'm sure there are situations where triple-parity raid6 would be 
appealing - it has already been implemented in ZFS, and it is only a 
matter of time before two-parity raid6 has a real probability of hitting 
an unrecoverable read error during a rebuild.


And of course, there is no particular reason to stop at three parity 
blocks - the maths can easily be generalised.  1, 2, 4 and 8 can be used 
as generators for quad-parity (checked up to 60 disks), and adding 16 
gives you quintuple parity (checked up to 30 disks) - but that's maybe 
getting a bit paranoid.
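
The checker program itself is not shown in this post.  One way such a check could look is sketched below - it is an illustration, not the actual checker.  It assumes the polynomial 0x11D and uses made-up function names.  A set of generators tolerates any combination of failures only if every square sub-matrix, built from any k parity rows and any k data columns, is invertible.  The sketch tests that by brute force, so it is only practical for small disk counts.

```python
from itertools import combinations


def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8); the polynomial 0x11D is an assumption."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(base, exp):
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result


def is_singular(matrix):
    """Gaussian elimination over GF(2^8); True if some column has no pivot."""
    m = [list(row) for row in matrix]
    n = len(m)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return True
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        for r in range(col + 1, n):
            f = m[r][col]
            if f:
                # Cross-multiply to clear the entry without needing an inverse.
                m[r] = [gf_mul(p, v) ^ gf_mul(f, w)
                        for v, w in zip(m[r], m[col])]
    return False


def generators_valid(generators, n_disks):
    """Check every k x k sub-matrix of the parity matrix for invertibility."""
    rows = [[gf_pow(g, i) for i in range(n_disks)] for g in generators]
    for k in range(1, len(generators) + 1):
        for row_set in combinations(range(len(generators)), k):
            for cols in combinations(range(n_disks), k):
                sub = [[rows[r][c] for c in cols] for r in row_set]
                if is_singular(sub):
                    return False
    return True
```

Calling generators_valid((1, 2, 4), 8) checks triple parity for 8 data disks.  The number of sub-matrices grows quickly with the disk count and the number of parities, which is presumably why a more efficient checker is needed for larger arrays.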


ref.:

<http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf>
<http://blogs.oracle.com/ahl/entry/acm_triple_parity_raid>
<http://queue.acm.org/detail.cfm?id=1670144>
<http://blogs.oracle.com/ahl/entry/triple_parity_raid_z>


Best regards,

David

