From: David Brown <david.brown@hesbynett.no>
To: linux-raid@vger.kernel.org
Subject: Re: Triple-parity raid6
Date: Sun, 12 Jun 2011 11:05:40 +0200
Message-ID: <it1vh4$3fh$1@dough.gmane.org>
In-Reply-To: <4DF39E8B.6090106@gmail.com>

On 11/06/11 18:57, Joe Landman wrote:
> On 06/11/2011 12:31 PM, David Brown wrote:
>
>> What has changed over the years is that there is no longer such a need
>> for manual assembly code to get optimal speed out of the cpu. While
>
> Hmmm ... I've done studies on this using an incredibly simple function
> (the Riemann zeta function, cf. http://scalability.org/?p=470). The short
> version is that hand-optimized SSE2 is ~4x faster (for this case) than
> the best optimization of high-level code. Hand-optimized assembler is
> even better.
>
>> writing such assembly is fun, it is time-consuming to write and hard to
>> maintain, especially for code that must run on so many different
>> platforms.
>
> Yes, it is generally hard to write and maintain. But with it you can get the
> rest of the language semantics out of the way. If you look at the tests
> that Linux does when it starts up, you can see a fairly wide
> distribution in the performance.
>
> raid5: using function: generic_sse (13356.000 MB/sec)
> raid6: int64x1 3507 MB/s
> raid6: int64x2 3886 MB/s
> raid6: int64x4 3257 MB/s
> raid6: int64x8 3054 MB/s
> raid6: sse2x1 8347 MB/s
> raid6: sse2x2 9695 MB/s
> raid6: sse2x4 10972 MB/s
>
> Some of these are hand coded assembly. See
> ${KERNEL_SOURCE}/drivers/md/raid6sse2.c and look at the
> raid6_sse24_gen_syndrome code.
>
> Really, getting the best performance out of the system requires a fairly
> deep understanding of how the processor/memory system operates. These
> functions do use the SSE registers, but we can have only so many SSE
> operations in flight at once. These processors can generally have quite
> a few operations in flight simultaneously, so knowledge of that, of the
> mix of operations, and of how they interact with the instruction
> scheduler in the hardware is fairly essential to getting good
> performance.
>

I am not suggesting that hand-coding assembly won't make the 
calculations faster - just that better compiler optimisations (which 
will automatically make use of sse instructions) will make the generic 
code closer to the theoretical maximum.

Out of curiosity, have you re-tried your zeta function code using a more 
modern version of gcc?  A lot has happened with gcc since 4.1 - in 
particular, the "graphite" loop optimisations added in gcc 4.4 (enabled 
with flags such as -floop-interchange, -floop-strip-mine and -floop-block) 
can make a big difference to code that loops through a lot of data (they 
re-arrange loop nests, interchanging and blocking the loops so that the 
inner blocks fit the cache).
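
To make "generic code" concrete, here is a minimal sketch of a portable
P/Q syndrome loop in the style of the int64 variants in the benchmark
output quoted above. The names gf_mul2_x8 and gen_pq_generic are
hypothetical, and the sketch assumes the usual RAID-6 field polynomial
0x11d with generator 2; it is an illustration, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply each of the 8 bytes packed in v by 2 in GF(2^8),
 * assuming the polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static inline uint64_t gf_mul2_x8(uint64_t v)
{
    uint64_t high = v & 0x8080808080808080ULL;  /* top bit of each byte */
    uint64_t mask = (high << 1) - (high >> 7);  /* 0xff where it was set */

    return ((v << 1) & 0xfefefefefefefefeULL) ^
           (mask & 0x1d1d1d1d1d1d1d1dULL);
}

/* Hypothetical helper: data[0..ndata-1] each point at `words` 64-bit
 * words; p and q receive the two syndromes for the same range. */
static void gen_pq_generic(size_t ndata, size_t words,
                           const uint64_t *const *data,
                           uint64_t *p, uint64_t *q)
{
    assert(ndata >= 1);

    for (size_t i = 0; i < words; i++) {
        /* Horner's rule, starting from the highest-numbered disk. */
        uint64_t wp = data[ndata - 1][i];
        uint64_t wq = wp;

        for (size_t z = ndata - 1; z-- > 0; ) {
            uint64_t wd = data[z][i];

            wp ^= wd;                   /* P: plain XOR parity      */
            wq = gf_mul2_x8(wq) ^ wd;   /* Q: sum of 2^z * D_z      */
        }
        p[i] = wp;
        q[i] = wq;
    }
}
```

Each iteration over i is independent of the others, which is the sort of
loop an auto-vectorising compiler at least has a chance of mapping onto
SSE registers without hand-written assembly.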


>>
>>> We are interested in working on this capability (and more generic
>>> capability) as well.
>>>
>>> Is anyone in particular starting to design/code this? Please let me
>>> know.
>>>
>>
>> Well, I am currently trying to write up some of the maths - I started
>> the thread because I had been playing around with the maths, and thought
>> it should work. I made a brief stab at writing a
>> "raid7_int$#_gen_syndrome()" function, but I haven't done any testing
>> with it (or even tried to compile it) - first I want to be sure of the
>> algorithms.
>
> I've been coding various bits as "pseudocode" using Octave. Makes
> checking with the built-in Galois functions pretty easy.
>
> I haven't looked at the math behind the triple parity syndrome calc yet,
> though I'd imagine someone has, and can write it down. If someone hasn't
> done that yet, it's a good first step. Then we can code the simple
> version from there with test drivers/cases, and then start optimizing
> the implementation.
>
>
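
A "simple version" of a triple-parity syndrome calculation might look
something like the sketch below. It is byte-wise and unoptimised, and it
rests on assumptions that are not settled here: the field polynomial is
0x11d, Q uses powers of 2, and the third syndrome R uses powers of 4 as
one candidate choice. The names gf_mul2 and gen_pqr_sketch are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply one byte by 2 in GF(2^8), assuming polynomial 0x11d. */
static inline uint8_t gf_mul2(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* Hypothetical helper: computes
 *   P = sum D_z,  Q = sum 2^z * D_z,  R = sum 4^z * D_z
 * over GF(2^8) for each byte position, using Horner's rule.
 * The generator 4 for R is an assumption made for illustration. */
static void gen_pqr_sketch(size_t ndata, size_t bytes,
                           const uint8_t *const *data,
                           uint8_t *p, uint8_t *q, uint8_t *r)
{
    assert(ndata >= 1);

    for (size_t i = 0; i < bytes; i++) {
        uint8_t wp = data[ndata - 1][i];
        uint8_t wq = wp;
        uint8_t wr = wp;

        for (size_t z = ndata - 1; z-- > 0; ) {
            uint8_t wd = data[z][i];

            wp = (uint8_t)(wp ^ wd);
            wq = (uint8_t)(gf_mul2(wq) ^ wd);
            wr = (uint8_t)(gf_mul2(gf_mul2(wr)) ^ wd);  /* times 4 */
        }
        p[i] = wp;
        q[i] = wq;
        r[i] = wr;
    }
}
```

Checking the output of a loop like this against Octave's Galois field
functions would be a reasonable first test before any optimisation.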


