Re: Triple-parity raid6

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: David Brown <david@westcontrol.com>
To: linux-raid@vger.kernel.org
Subject: Re: Triple-parity raid6
Date: Fri, 10 Jun 2011 10:45:46 +0200	[thread overview]
Message-ID: <isslla$o2i$1@dough.gmane.org> (raw)
In-Reply-To: <87aadq5q1l.fsf@gmail.com>

On 10/06/2011 05:22, Namhyung Kim wrote:
> NeilBrown<neilb@suse.de>  writes:
>> On Thu, 09 Jun 2011 13:32:59 +0200 David Brown<david@westcontrol.com>  wrote:
>>
>> You can see the current kernel code at:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=tree;f=lib/raid6;h=970c541a452d3b9983223d74b10866902f1a47c7;hb=HEAD
>>
>>
>> int.uc is the generic C code which 'unroll.awk' processes to make various
>> versions that unroll the loops different amounts to work with CPUs with
>> different numbers of registers.
>> Then there is sse1, sse2, altivec which provide the same functionality in
>> assembler which is optimised for various processors.
>>
>> And 'recov' has the smarts for doing the reverse calculation when 2 data
>> blocks, or 1 data and P are missing.
>>
>> Even if you don't feel up to implementing everything, a start might be
>> useful.  You never know when someone might jump up and offer to help.
>>
>
> Maybe I could help David to some extent. :)
>
> I'm gonna read the raid6 code next week and hope that there is a room I
> can help with, FWIW.
>

No matter how far I manage to get, I'm going to need help from someone 
who knows the system and the development process, how the new functions 
would fit with the rest of the system, and not least people who can 
check and test the code.

Making multiple parity syndromes is easy enough mathematically:

For each parity bit P_j, you have a generator g_j and calculate for d_i 
running over all data disks:

	P_j = sum((g_j ^ i) . d_i)

Raid5 parity uses g_0 = 1, so it is just the xor.
Raid6 uses g_0 = 1, g_1 = 2.

Any independent generators could be used, of course.  I am not sure how 
to prove that a set of generators is independent (except in the easy 
case of 2 generators, as shown in the raid6.pdf paper) - but brute force 
testing over all choices of dead disks is easy enough.

For Raid7, the obvious choice is g_2 = 4 - then we can re-use existing 
macros and optimisations.

I've had a brief look at int.uc - the gen_syndrome function is nice and 
small, but that's before awk gets at it.  I haven't yet tried building 
any of this, but extending raid6_int to a third syndrome is, I hope, is 
straightforward:

static void raid7_int$#_gen_syndrome(int disks, size_t bytes, void **ptrs)
{
         u8 **dptr = (u8 **)ptrs;
         u8 *p, *q, *r;
         int d, z, z0;

         unative_t wd$$, wq$$, wp$$, w1$$, w2$$, wr$$, w3$$, w4$$;

         z0 = disks - 4;         /* Highest data disk */
         p = dptr[z0+1];         /* XOR parity */
         q = dptr[z0+2];         /* RS syndrome */
         r = dptr[z0+3];         /* RS syndrome 2 */

         for ( d = 0 ; d < bytes ; d += NSIZE*$# ) {
                 wr$$ = wq$$ = wp$$ = *(unative_t *)&dptr[z0][d+$$*NSIZE];
                 for ( z = z0-1 ; z >= 0 ; z-- ) {
                         wd$$ = *(unative_t *)&dptr[z][d+$$*NSIZE];
                         wp$$ ^= wd$$;
                         w2$$ = MASK(wq$$);
                         w1$$ = SHLBYTE(wq$$);
                         w2$$ &= NBYTES(0x1d);
                         w1$$ ^= w2$$;
                         wq$$ = w1$$ ^ wd$$;

                         w4$$ = MASK(wr$$);
                         w3$$ = SHLBYTE(wr$$);
                         w4$$ &= NBYTES(0x1d);
                         w3$$ ^= w4$$;
                         w4$$ = MASK(w3$$);
                         w3$$ = SHLBYTE(w3$$);
                         w4$$ &= NBYTES(0x1d);
                         w3$$ ^= w4$$;
                         wr$$ = w3$$ ^ wd$$;
                 }
                 *(unative_t *)&p[d+NSIZE*$$] = wp$$;
                 *(unative_t *)&q[d+NSIZE*$$] = wq$$;
                 *(unative_t *)&r[d+NSIZE*$$] = wr$$;
         }
}

I wrote the wr$$ calculations using a second set of working variables. 
I don't know (yet) what compiler options are used to generate the target 
code, nor what the awk'ed code looks like.  If the compiler can handle 
the scheduling to interlace the Q and R calculations and reduce pipeline 
delays, then that's great.  If not, then they can be manually interlaced 
if it helps the code.  But as I'm writing this blind (on a windows 
machine, no less), I don't know if it makes a difference.

As a general point regarding optimisations, is there a minimum level of 
gcc we can expect to have here?  And are there rules about compiler 
flags?  Later versions of gcc are getting very smart about 
vectorisation, and generating multiple versions of code that are 
selected automatically at run-time depending on the capabilities of the 
processor.  My hope is that the code can be greatly simplified by 
writing a single clear version in C that runs fast and takes advantage 
of the cpu, without needing to make explicit SSE, AtliVec, etc., 
versions.  Are there rules about which gcc attributes or extensions we 
can use?

Another question - what should we call this?  Raid 7?  Raid 6.3?  Raid 
7.3?  While we are in the mood for triple-parity Raid, we can easily 
extend it to quad-parity or more - the name should probably be flexible 
enough to allow that.

next prev parent reply	other threads:[~2011-06-10  8:45 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-06-09  0:01 Triple-parity raid6 David Brown
2011-06-09  1:49 ` NeilBrown
2011-06-09 11:32   ` David Brown
2011-06-09 12:04     ` NeilBrown
2011-06-09 19:19       ` David Brown
2011-06-10  3:22       ` Namhyung Kim
2011-06-10  8:45         ` David Brown [this message]
2011-06-10 12:20           ` Christoph Dittmann
2011-06-10 14:28             ` David Brown
2011-06-11 10:13               ` Piergiorgio Sartor
2011-06-11 11:51                 ` David Brown
2011-06-11 13:18                   ` Piergiorgio Sartor
2011-06-11 14:53                     ` David Brown
2011-06-11 15:05                       ` Joe Landman
2011-06-11 16:31                         ` David Brown
2011-06-11 16:57                           ` Joe Landman
2011-06-12  9:05                             ` David Brown
2011-06-11 17:14                           ` Joe Landman
2011-06-11 18:05                             ` David Brown
2011-06-10  9:03       ` David Brown
2011-06-10 13:56       ` Bill Davidsen
2011-06-09 22:42 ` David Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='isslla$o2i$1@dough.gmane.org' \
    --to=david@westcontrol.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).