From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Brown
Subject: Re: Triple-parity raid6
Date: Thu, 09 Jun 2011 13:32:59 +0200
Message-ID:
References: <20110609114954.243e9e22@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <20110609114954.243e9e22@notabene.brown>
Sender: linux-raid-owner@vger.kernel.org
To: linux-raid@vger.kernel.org
List-Id: linux-raid.ids

On 09/06/2011 03:49, NeilBrown wrote:
> On Thu, 09 Jun 2011 02:01:06 +0200 David Brown
> wrote:
>
>> Has anyone considered triple-parity raid6? As far as I can see, it
>> should not be significantly harder than normal raid6 - either to
>> implement, or for the processor at run-time. Once you have the GF(2⁸)
>> field arithmetic in place for raid6, it's just a matter of making
>> another parity block in the same way but using a different generator:
>>
>> P = D_0 + D_1 + D_2 + .. + D_(n-1)
>> Q = D_0 + g.D_1 + g².D_2 + .. + g^(n-1).D_(n-1)
>> R = D_0 + h.D_1 + h².D_2 + .. + h^(n-1).D_(n-1)
>>
>> The raid6 implementation in mdraid uses g = 0x02 to generate the second
>> parity (based on "The mathematics of RAID-6" - I haven't checked the
>> source code). You can make a third parity using h = 0x04 and then get a
>> redundancy of 3 disks. (Note - I haven't yet confirmed that this is
>> valid for more than 100 data disks - I need to make my checker program
>> more efficient first.)
>>
>> Rebuilding a disk, or running in degraded mode, is just an obvious
>> extension to the current raid6 algorithms. If you are missing three
>> data blocks, the maths looks hard to start with - but if you express the
>> equations as a set of linear equations and use standard matrix inversion
>> techniques, it should not be hard to implement.
>> You only need to do this inversion once when you find that one or more
>> disks have failed - then you pre-compute the multiplication tables in
>> the same way as is done for raid6 today.
>>
>> In normal use, calculating the R parity is no more demanding than
>> calculating the Q parity. And most rebuilds or degraded situations will
>> only involve a single disk, and the data can thus be re-constructed
>> using the P parity just like raid5 or two-parity raid6.
>>
>>
>> I'm sure there are situations where triple-parity raid6 would be
>> appealing - it has already been implemented in ZFS, and it is only a
>> matter of time before two-parity raid6 has a real probability of hitting
>> an unrecoverable read error during a rebuild.
>>
>>
>> And of course, there is no particular reason to stop at three parity
>> blocks - the maths can easily be generalised. 1, 2, 4 and 8 can be used
>> as generators for quad-parity (checked up to 60 disks), and adding 16
>> gives you quintuple parity (checked up to 30 disks) - but that's maybe
>> getting a bit paranoid.
>>
>>
>> ref.:
>>
>
> -ENOPATCH :-)
>
> I have a series of patches nearly ready which removes a lot of the remaining
> duplication in raid5.c between raid5 and raid6 paths. So there will be
> relatively few places where RAID5 and RAID6 do different things - only the
> places where they *must* do different things.
> After that, adding a new level or layout which has 'max_degraded == 3' would
> be quite easy.
> The most difficult part would be the enhancements to libraid6 to generate the
> new 'syndrome', and to handle the different recovery possibilities.
>
> So if you're not otherwise busy this weekend, a patch would be nice :-)
>

I'm not going to promise any patches, but maybe I can help with the
maths. You say the difficult part is the syndrome calculations and
recovery - I've got these bits figured out on paper and some
quick-and-dirty Python test code.
On the other hand, I don't really want to get into the md kernel code,
or the mdadm code - I haven't done Linux kernel development before (I
mostly program 8-bit microcontrollers - when I code on Linux, I use
Python), and I fear it would take me a long time to get up to speed.

However, if the parity generation and recovery are neatly separated into
a libraid6 library, the whole thing becomes much more tractable from my
viewpoint. Since I am new to this, can you tell me where I should get
the current libraid6 code? I'm sure Google will find some sources for
me, but I'd like to make sure I start with whatever version /you/ have.
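
For illustration, the P/Q/R equations and the "set of linear equations"
recovery described in the quoted text can be sketched in Python roughly
as below. This is a minimal sketch under stated assumptions, not the
test code mentioned in this mail and not libraid6: it handles one byte
per disk, assumes the usual RAID-6 field polynomial 0x11d, assumes all
three parity blocks survive, and the names gf_mul, gf_pow, gf_inv,
parity and recover are made up for this example.

```python
# Sketch only: one byte per disk, RAID-6 field polynomial 0x11d assumed.
# All function names are hypothetical, not taken from libraid6.

def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11d           # reduce back into 8 bits
        b >>= 1
    return result

def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a nonzero element: a^255 == 1, so a^254 == 1/a."""
    return gf_pow(a, 254)

def parity(data, gen):
    """Sum of gen^i * D_i over all data bytes (gen=1 gives P, 2 Q, 4 R)."""
    acc = 0
    for i, d in enumerate(data):
        acc ^= gf_mul(gf_pow(gen, i), d)
    return acc

def recover(data, lost, parities, gens=(1, 2, 4)):
    """Solve for up to three lost data bytes by Gauss-Jordan elimination.

    Entries of `data` at the indices in `lost` are ignored.
    """
    n = len(lost)
    rows = []
    for g, p in list(zip(gens, parities))[:n]:
        rhs = p
        for i, d in enumerate(data):
            if i not in lost:
                rhs ^= gf_mul(gf_pow(g, i), d)   # remove surviving terms
        rows.append([gf_pow(g, i) for i in lost] + [rhs])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = gf_inv(rows[col][col])
        rows[col] = [gf_mul(inv, v) for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [v ^ gf_mul(f, w)
                           for v, w in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

# Usage: lose data disks 0, 2 and 4 out of five and rebuild them.
data = [0x12, 0x34, 0x56, 0x78, 0x9a]
pqr = [parity(data, g) for g in (1, 2, 4)]
damaged = [None, 0x34, None, 0x78, None]
assert recover(damaged, [0, 2, 4], pqr) == [0x12, 0x56, 0x9a]
```

A real implementation would do the elimination once per failure
pattern and then apply the resulting coefficients to whole blocks via
lookup tables, as the quoted text suggests; the per-byte solve here is
only meant to show the maths.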