linux-raid.vger.kernel.org archive mirror
* raid6 and parity calculations
@ 2010-09-14 14:45 Michael Sallaway
  2010-09-15 10:26 ` Neil Brown
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Sallaway @ 2010-09-14 14:45 UTC (permalink / raw)
  To: linux-raid

Hi,

I've been looking through the drivers/md code, and I've got a few questions about the RAID6 parity calculations that have me stumped.

I can see that when recovering 1 or 2 data sections, it calls functions based on the content that we're recovering (eg. async_gen_syndrome, async_xor, async_raid6_datap_recov, etc.) However, the length parameter is always given as STRIPE_SIZE, which from what I can tell is the same as PAGE_SIZE, which for vanilla systems like the one I'm playing with is 4096 bytes.

The thing that I can't figure out is how this interacts with the RAID6 chunk size; the array I'm playing with has a default chunk size (64kb), which I understand means that there's 64kb of data striped across each disk (bar two), then 64kb of P, then 64kb of Q for the first stripe, correct? If so, I can't figure out where the whole parity calculation is done for all 64kb. There's no loops, no recursion, or anything that would process it that I can find. I'm obviously missing something here, can anyone enlighten me?

Thanks for any advice or pointers!

Cheers,
Michael


(As a side note: I'm playing with all this because I've managed to royally screw up an array which had 2 dropped drives, by re-adding them in what appears to be the wrong order. That would have been fine if the rebuild had finished completely; however, the rebuild failed a few percent in, so now I have 2 drives with "swapped" data. That is, drive A contains the data for raid member 4 for the first x% and raid member 5 for the rest, and drive B contains the data for raid member 5 for the first x% and raid member 4 for the rest. So I'm trying to write a userspace program to manually go through the array members, inspecting each stripe and doing parity calculations for a range of drive permutations to see what looks sensible; hence I'm trying to understand what's ON the drive, to reverse engineer it.)



* Re: raid6 and parity calculations
  2010-09-14 14:45 raid6 and parity calculations Michael Sallaway
@ 2010-09-15 10:26 ` Neil Brown
  0 siblings, 0 replies; 4+ messages in thread
From: Neil Brown @ 2010-09-15 10:26 UTC (permalink / raw)
  To: Michael Sallaway; +Cc: linux-raid

On Tue, 14 Sep 2010 14:45:40 +0000
"Michael Sallaway" <michael@sallaway.com> wrote:

> Hi,
> 
> I've been looking through the drivers/md code, and I've got a few questions about the RAID6 parity calculations that have me stumped.
> 
> I can see that when recovering 1 or 2 data sections, it calls functions based on the content that we're recovering (eg. async_gen_syndrome, async_xor, async_raid6_datap_recov, etc.) However, the length parameter is always given as STRIPE_SIZE, which from what I can tell is the same as PAGE_SIZE, which for vanilla systems like the one I'm playing with is 4096 bytes.
> 
> The thing that I can't figure out is how this interacts with the RAID6 chunk size; the array I'm playing with has a default chunk size (64kb), which I understand means that there's 64kb of data striped across each disk (bar two), then 64kb of P, then 64kb of Q for the first stripe, correct? If so, I can't figure out where the whole parity calculation is done for all 64kb. There's no loops, no recursion, or anything that would process it that I can find. I'm obviously missing something here, can anyone enlighten me?
> 
> Thanks for any advice or pointers!

It is best not to think too much about chunks.  Think about strips
(not stripes).
A strip is a set of blocks, one per device, each at the same offset.
Think of page-sized blocks/strips.
Each strip has a P block, a Q block and a bunch of data blocks.  Which
block is P, which is Q, and which data block is which are a function of
the offset, the layout and the chunk size.  Once you have used the chunk
size to perform that calculation, don't think about chunks any more -
just blocks and strips.
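To make the per-strip picture concrete, here is a minimal Python sketch of the P and Q calculations for one strip, following the conventions in hpa's raid6.pdf (generator g = 0x02, field polynomial 0x11d). This is an illustration of the arithmetic only, not the kernel's actual code, and the sample data blocks are made up:

```python
from functools import reduce

# Multiply by g = 0x02 in GF(2^8) with the RAID-6 polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11d).
def gf_mul2(x):
    return ((x << 1) ^ (0x11d if x & 0x80 else 0)) & 0xff

def p_block(data_blocks):
    """P is the plain XOR of the data blocks, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_blocks))

def q_block(data_blocks):
    """Q = d0 + g*d1 + g^2*d2 + ... over GF(2^8), evaluated with
    Horner's scheme: start from the highest-numbered disk, then
    repeatedly multiply by g and XOR in the next block down."""
    q = [0] * len(data_blocks[0])
    for blk in reversed(data_blocks):          # d_{n-1} down to d_0
        q = [gf_mul2(qb) ^ db for qb, db in zip(q, blk)]
    return bytes(q)

# One "strip" across 4 data devices (4 bytes shown instead of a full page).
data = [b'\x01\x02\x03\x04', b'\x10\x20\x30\x40',
        b'\x05\x06\x07\x08', b'\x50\x60\x70\x80']
p = p_block(data)   # XOR of all four blocks
q = q_block(data)   # GF(2^8)-weighted sum of the four blocks
```

Note that both functions work one byte offset at a time: byte j of P and Q depends only on byte j of each data block, which is why the block size never enters the maths.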

Hope that helps.

> 
> Cheers,
> Michael
> 
> 
> (As a side note: I'm playing with all this because I've managed to royally screw up an array which had 2 dropped drives, by re-adding them in what appears to be the wrong order. That would have been fine if the rebuild had finished completely; however, the rebuild failed a few percent in, so now I have 2 drives with "swapped" data. That is, drive A contains the data for raid member 4 for the first x% and raid member 5 for the rest, and drive B contains the data for raid member 5 for the first x% and raid member 4 for the rest. So I'm trying to write a userspace program to manually go through the array members, inspecting each stripe and doing parity calculations for a range of drive permutations to see what looks sensible; hence I'm trying to understand what's ON the drive, to reverse engineer it.)

Ouch... good luck.

NeilBrown


> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



* Re: raid6 and parity calculations
@ 2010-09-15 15:55 Michael Sallaway
  2010-09-15 16:07 ` Andre Noll
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Sallaway @ 2010-09-15 15:55 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid


>  -------Original Message-------
>  From: Neil Brown <neilb@suse.de>
>  To: Michael Sallaway <michael@sallaway.com>
>  Cc: linux-raid@vger.kernel.org
>  Subject: Re: raid6 and parity calculations
>  Sent: 15 Sep '10 10:26

>  It is best not to think too much about chunks.  Think about strips
>  (not stripes).
>  A strip is a set of blocks, one per device, each at the same offset.
>  Think of page-sized blocks/strips.
>  Each strip has a P block, a Q block and a bunch of data blocks.  Which
>  block is P, which is Q, and which data block is which are a function of
>  the offset, the layout and the chunk size.  Once you have used the chunk
>  size to perform that calculation, don't think about chunks any more -
>  just blocks and strips.
>  

Aah, perfect -- that makes sense, thanks for that.

As a sort-of follow-up question, would anyone know if the data size of a Q calculation affects the result at all? e.g. if I do a 64kb Q calculation on 10 drives of data, would that be the same as doing 16x 4kb Q calculations on sequential blocks of the same data, then concatenating them together? (I can't remember what that property is called...)

I've been reading the "mathematics of RAID-6" PDF (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf), but I'm a bit too rusty with Galois fields to tell whether the data size matters. I presume the data ordering is also critical for a Q calculation, correct? (e.g. drives have to be in order d0 -> d10, not just random.)

And, in contrast, for the P calculation, data size and input order make no difference, correct? (Since it's just a simple bitwise XOR of all the inputs.)

>  
>  Ouch... good luck.

Thanks! I'm the only one to blame, though -- it happened in the month between "getting the new system set up" and "setting up backups for the new system". So it's the only copy of the data.... whoops. :-)

Thanks for the help/advice!

Cheers,
Michael


* Re: raid6 and parity calculations
  2010-09-15 15:55 Michael Sallaway
@ 2010-09-15 16:07 ` Andre Noll
  0 siblings, 0 replies; 4+ messages in thread
From: Andre Noll @ 2010-09-15 16:07 UTC (permalink / raw)
  To: Michael Sallaway; +Cc: Neil Brown, linux-raid


On Wed, Sep 15, 15:55, Michael Sallaway wrote:
> As a sort-of follow up question, would anyone know if the data size of
> a Q calculation affects the result at all? eg. if I do a 64kb Q
> calculation on 10 drives of data, would that be the same as doing 16x
> 4kb Q calculations on sequential blocks of the same data, then
> concatenating it together? (I can't remember what that operation
> property is called....?)

Yes, the result would be the same. In fact, byte n of Q depends only
on byte n of each of the data drives.
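That byte-wise independence can be checked directly with a small, self-contained Python sketch (generator g = 0x02, polynomial 0x11d as in hpa's paper; the "drive" contents are made-up test data, scaled down from 64kb to 64 bytes):

```python
# Multiply by g = 0x02 in GF(2^8), polynomial 0x11d.
def gf_mul2(x):
    return ((x << 1) ^ (0x11d if x & 0x80 else 0)) & 0xff

# Q syndrome via Horner's scheme, one byte offset at a time.
def q_block(data_blocks):
    q = [0] * len(data_blocks[0])
    for blk in reversed(data_blocks):          # d_{n-1} down to d_0
        q = [gf_mul2(qb) ^ db for qb, db in zip(q, blk)]
    return bytes(q)

# 10 "drives" of 64 bytes each (deterministic dummy data).
drives = [bytes((d * 31 + i * 7) % 256 for i in range(64)) for d in range(10)]

q_whole = q_block(drives)                      # one big Q calculation

q_piecewise = b''.join(                        # 4 sequential 16-byte Qs,
    q_block([d[off:off + 16] for d in drives]) # concatenated back together
    for off in range(0, 64, 16))

assert q_whole == q_piecewise                  # identical results
```

Because byte n of Q never looks at any other byte offset, splitting the calculation into sequential blocks and concatenating cannot change the answer.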

> I've been reading the maths of RAID6 PDF
> (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf), but I'm a
> bit too rusty to understand Galois fields, and if the data size
> matters. I presume the data ordering is also critical for a Q
> calculation, correct? (eg. drives have to be d0 -> d10 in order, not
> just random).

Right, order matters.

> And, in contrast, for the P calculation, data size and input order
> make no difference, correct? (Since it's just a simple bitwise XOR of
> all the inputs.)

Also correct.
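Both properties are easy to confirm in a few lines of Python (same GF(2^8) conventions as hpa's paper, g = 0x02 and polynomial 0x11d; the one-byte blocks are arbitrary test values):

```python
from functools import reduce

# Multiply by g = 0x02 in GF(2^8), polynomial 0x11d.
def gf_mul2(x):
    return ((x << 1) ^ (0x11d if x & 0x80 else 0)) & 0xff

def p_block(data_blocks):
    # P: plain XOR, byte by byte.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_blocks))

def q_block(data_blocks):
    # Q: GF(2^8)-weighted sum, via Horner's scheme.
    q = [0] * len(data_blocks[0])
    for blk in reversed(data_blocks):
        q = [gf_mul2(qb) ^ db for qb, db in zip(q, blk)]
    return bytes(q)

d = [b'\x01', b'\x02', b'\x04']
shuffled = [d[2], d[0], d[1]]

assert p_block(d) == p_block(shuffled)   # XOR: input order irrelevant
assert q_block(d) != q_block(shuffled)   # GF weights g^i: order matters
```

This is exactly why guessing the drive order back is feasible: a candidate permutation that is wrong will usually fail the Q check even though it still passes the P (XOR) check.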

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe


