From: David Brown <david@westcontrol.com>
To: linux-raid@vger.kernel.org
Subject: Re: Triple-parity raid6
Date: Thu, 09 Jun 2011 13:32:59 +0200
Message-ID: <isqb2o$g0s$1@dough.gmane.org>
In-Reply-To: <20110609114954.243e9e22@notabene.brown>
On 09/06/2011 03:49, NeilBrown wrote:
> On Thu, 09 Jun 2011 02:01:06 +0200 David Brown <david.brown@hesbynett.no>
> wrote:
>
>> Has anyone considered triple-parity raid6? As far as I can see, it
>> should not be significantly harder than normal raid6 - either to
>> implement, or for the processor at run-time. Once you have the GF(2⁸)
>> field arithmetic in place for raid6, it's just a matter of making
>> another parity block in the same way but using a different generator:
>>
>> P = D_0 + D_1 + D_2 + .. + D_(n-1)
>> Q = D_0 + g.D_1 + g².D_2 + .. + g^(n-1).D_(n-1)
>> R = D_0 + h.D_1 + h².D_2 + .. + h^(n-1).D_(n-1)
>>
>> The raid6 implementation in mdraid uses g = 0x02 to generate the second
>> parity (based on "The mathematics of RAID-6" - I haven't checked the
>> source code). You can make a third parity using h = 0x04 and then get a
>> redundancy of 3 disks. (Note - I haven't yet confirmed that this is
>> valid for more than 100 data disks - I need to make my checker program
>> more efficient first.)
>>
>> Rebuilding a disk, or running in degraded mode, is just an obvious
>> extension to the current raid6 algorithms. If you are missing three
>> data blocks, the maths looks hard to start with - but if you express the
>> equations as a set of linear equations and use standard matrix inversion
>> techniques, it should not be hard to implement. You only need to do
>> this inversion once when you find that one or more disks have failed -
>> then you pre-compute the multiplication tables in the same way as is
>> done for raid6 today.
>>
>> In normal use, calculating the R parity is no more demanding than
>> calculating the Q parity. And most rebuilds or degraded situations will
>> only involve a single disk, and the data can thus be re-constructed
>> using the P parity just like raid5 or two-parity raid6.
>>
>>
>> I'm sure there are situations where triple-parity raid6 would be
>> appealing - it has already been implemented in ZFS, and it is only a
>> matter of time before two-parity raid6 has a real probability of hitting
>> an unrecoverable read error during a rebuild.
>>
>>
>> And of course, there is no particular reason to stop at three parity
>> blocks - the maths can easily be generalised. 1, 2, 4 and 8 can be used
>> as generators for quad-parity (checked up to 60 disks), and adding 16
>> gives you quintuple parity (checked up to 30 disks) - but that's maybe
>> getting a bit paranoid.
>>
>>
>> ref.:
>>
>> <http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf>
>> <http://blogs.oracle.com/ahl/entry/acm_triple_parity_raid>
>> <http://queue.acm.org/detail.cfm?id=1670144>
>> <http://blogs.oracle.com/ahl/entry/triple_parity_raid_z>
>>
>
> -ENOPATCH :-)
>
> I have a series of patches nearly ready which removes a lot of the remaining
> duplication in raid5.c between raid5 and raid6 paths. So there will be
> relatively few places where RAID5 and RAID6 do different things - only the
> places where they *must* do different things.
> After that, adding a new level or layout which has 'max_degraded == 3' would
> be quite easy.
> The most difficult part would be the enhancements to libraid6 to generate the
> new 'syndrome', and to handle the different recovery possibilities.
>
> So if you're not otherwise busy this weekend, a patch would be nice :-)
>
I'm not going to promise any patches, but maybe I can help with the
maths. You say the difficult part is the syndrome calculations and
recovery - I've got these bits figured out on paper and in some
quick-and-dirty Python test code. On the other hand, I don't really
want to get into the md kernel code, or the mdadm code - I haven't done
Linux kernel development before (I mostly program 8-bit microcontrollers,
and when I code on Linux, I use Python), and I fear it would take me a
long time to get up to speed.
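
To make the syndrome part concrete, here is a minimal sketch (not the
test code mentioned above) of how P, Q and R could be computed one byte
at a time. It assumes the GF(2^8) polynomial 0x11d from "The mathematics
of RAID-6" and the generators 1, 2 and 4 proposed in the quoted mail. The
names gf_mul, gf_pow and syndromes are made up for illustration and are
not libraid6 functions.

```python
# Illustrative sketch only: byte-wise P/Q/R syndromes over GF(2^8).
# Assumption: field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d).

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11d (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11d           # reduce when the product overflows 8 bits
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def syndromes(data, generators=(1, 2, 4)):
    """Return one parity byte per generator for a list of data bytes.

    With the default generators (1, 2, 4) the result is [P, Q, R]:
    generator 1 gives plain XOR parity, 2 gives Q and 4 gives R.
    """
    parities = []
    for gen in generators:
        acc = 0
        for i, d in enumerate(data):
            acc ^= gf_mul(gf_pow(gen, i), d)
        parities.append(acc)
    return parities
```

For example, syndromes([1, 2, 3]) gives [0, 9, 57]. A real
implementation would presumably use lookup tables or SIMD rather than
bit-by-bit multiplication; this version only aims to show the
arithmetic.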

However, if the parity generation and recovery are neatly separated into
a libraid6 library, the whole thing becomes much more tractable from my
viewpoint. Since I am new to this, can you tell me where I should get
the current libraid6 code? I'm sure Google will find some sources for
me, but I'd like to make sure I start with whatever version /you/ have.
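
As a rough illustration of the recovery side: when three data blocks are
missing, the P, Q and R equations form a 3x3 linear system over GF(2^8),
which can be solved by inverting the coefficient matrix once and then
applying the inverse to every byte. The sketch below assumes the
polynomial 0x11d and the generators 1, 2 and 4; every function name in it
is hypothetical, and it is not how libraid6 is structured.

```python
# Illustrative sketch only: recover missing data bytes by inverting the
# coefficient matrix over GF(2^8). Assumption: field polynomial 0x11d.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8)."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254, since a^255 == 1 for nonzero a."""
    return gf_pow(a, 254)


def syndromes(data, generators=(1, 2, 4)):
    """Parity bytes [P, Q, R] for a list of data bytes."""
    return [
        _xor_sum(gf_mul(gf_pow(gen, i), d) for i, d in enumerate(data))
        for gen in generators
    ]


def _xor_sum(values):
    acc = 0
    for v in values:
        acc ^= v
    return acc


def gf_mat_inv(m):
    """Invert a square matrix over GF(2^8) by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [int(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("singular matrix: cannot recover these disks")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = gf_inv(aug[col][col])
        aug[col] = [gf_mul(scale, x) for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [x ^ gf_mul(f, y) for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def recover(data, missing, parities, generators=(1, 2, 4)):
    """Return the bytes at the `missing` indexes (one per generator).

    `data` holds None at the missing positions; `parities` is [P, Q, R].
    """
    # Strip the surviving blocks' contribution out of each parity byte.
    rhs = [
        p ^ _xor_sum(gf_mul(gf_pow(gen, i), d)
                     for i, d in enumerate(data) if i not in missing)
        for p, gen in zip(parities, generators)
    ]
    # One row per parity equation, one column per missing disk.
    matrix = [[gf_pow(gen, i) for i in missing] for gen in generators]
    inverse = gf_mat_inv(matrix)  # done once per failure pattern
    return [_xor_sum(gf_mul(c, s) for c, s in zip(row, rhs))
            for row in inverse]
```

With data = [0x11, 0x22, 0x33, 0x44, 0x55] and disks 0, 2 and 4 lost,
recover([None, 0x22, None, 0x44, None], [0, 2, 4], syndromes(data))
returns the three lost bytes. The inverse depends only on which disks
are missing, so it would be computed once and reused for every byte.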