public inbox for linux-xfs@vger.kernel.org
From: David Brown <david.brown@hesbynett.no>
To: linux-xfs@oss.sgi.com
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 r-m-w, op-journaled fs, SSDs
Date: Sun, 01 May 2011 20:32:22 +0200
Message-ID: <ipk8vn$s9a$1@dough.gmane.org>
In-Reply-To: <19901.31958.368144.832086@tree.ty.sabi.co.UK>

On 01/05/11 17:31, Peter Grandi wrote:
> [ ... ]
>
>>> * Can Linux MD do "abbreviated" read-modify-write RAID6
>>> updates like for RAID5? [ ... ]
>
>> No. (patches welcome).
>
> Ahhhm, but let me dig a bit deeper, even if it may be implied in
> the answer: would it be *possible*?
>
> That is, is the double parity scheme used in MD such that it is
> possible to "subtract" the old content of a page and "add" the
> new content of that page to both parity pages?
>

If I've understood the maths correctly, then yes it would be possible. 
But it would involve more calculations, and it is difficult to see where 
the best balance lies between cpu demands and IO demands.  In general, 
calculating the Q parity block for raid6 is processor-intensive - 
there's a fair amount of optimisation done in the normal calculations to 
keep it reasonable.

Basically, the first parity P is a simple calculation:

P = D_0 + D_1 + ... + D_(n-1)

But Q is more difficult:

Q = D_0 + g.D_1 + g^2.D_2 + ... + g^(n-1).D_(n-1)

where "plus" is xor, "times" is multiplication in the finite field 
GF(2^8) (not ordinary integer multiplication), and g is a generator for 
that field.
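
To make the two definitions concrete, here is a minimal Python sketch of 
computing P and Q byte by byte.  It assumes the field polynomial 0x11d 
and g = 2, which as far as I know is what the Linux raid6 code uses but 
is not stated above.  The names mul_g and compute_pq are made up for 
illustration.

```python
def mul_g(x):
    """Multiply one byte by g = 2 in GF(2^8) (assumed polynomial 0x11d)."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11d  # reduce modulo the field polynomial
    return x


def compute_pq(blocks):
    """Return (P, Q) for equal-length data blocks D_0 .. D_(n-1)."""
    size = len(blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    # Horner's rule: walk from D_(n-1) down to D_0, so that Q only
    # ever needs a multiplication by g, never by a general g^i.
    for block in reversed(blocks):
        for k in range(size):
            p[k] ^= block[k]
            q[k] = mul_g(q[k]) ^ block[k]
    return bytes(p), bytes(q)
```

For example, compute_pq([b"\x01", b"\x01", b"\x01"]) gives P = 0x01 and 
Q = 0x07 (that is, 1 xor 2 xor 4).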

If you want to replace D_i, then you can calculate:

P(new) = P(old) + D_i(old) + D_i(new)

Q(new) = Q(old) + g^i.(D_i(old) + D_i(new))

This means multiplying by g^i for whichever block i is being replaced.
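
As a rough sketch of what such an abbreviated update could look like, 
the Python below applies the two formulas to a single replaced block and 
includes a full recomputation purely as a reference to check against.  
It is not how md does anything; the polynomial 0x11d, g = 2 and all 
function names (gf_mul, g_pow, rmw_update, full_pq) are assumptions for 
illustration.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) (assumed polynomial 0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def g_pow(i):
    """Return g^i for the assumed generator g = 2."""
    value = 1
    for _ in range(i):
        value = gf_mul(value, 2)
    return value


def rmw_update(p_old, q_old, i, d_old, d_new):
    """Return (P_new, Q_new) after replacing block i, reading no other data."""
    coeff = g_pow(i)
    p_new = bytearray(p_old)
    q_new = bytearray(q_old)
    for k in range(len(d_old)):
        delta = d_old[k] ^ d_new[k]       # D_i(old) + D_i(new)
        p_new[k] ^= delta                 # P(new) = P(old) + delta
        q_new[k] ^= gf_mul(coeff, delta)  # Q(new) = Q(old) + g^i.delta
    return bytes(p_new), bytes(q_new)


def full_pq(blocks):
    """Reference only: recompute P and Q from every data block."""
    size = len(blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    for i, block in enumerate(blocks):
        coeff = g_pow(i)
        for k in range(size):
            p[k] ^= block[k]
            q[k] ^= gf_mul(coeff, block[k])
    return bytes(p), bytes(q)
```

The point of the comparison is that rmw_update touches only P, Q and the 
one data block, where full_pq has to read the whole stripe.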

The generator and multiply operation are picked to make it relatively 
fast and easy to multiply by g, especially if you've got a processor 
with vector operations (as most powerful cpus do).  The original Q 
calculation can be arranged so that it only ever multiplies by g, which 
is why it is fairly efficient.  But general multiplication by g^i is 
more effort, and will typically involve cache-killing lookup tables or 
multiple steps.
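
To illustrate the trade-off, here is a small Python sketch of the 
options for multiplying a byte by g^i: repeating the cheap multiply-by-g 
step i times, using a 256-entry table per coefficient, or splitting the 
byte into two nibbles so that two 16-entry tables suffice (this works 
because multiplication by a constant is linear over xor).  Again this 
assumes polynomial 0x11d and g = 2, and the function names are invented 
for the example rather than taken from any existing code.

```python
def mul_g(x):
    """Multiply one byte by g = 2 in GF(2^8) (assumed polynomial 0x11d)."""
    x <<= 1
    return x ^ 0x11d if x & 0x100 else x


def mul_g_pow_steps(x, i):
    """Multiply x by g^i the slow way: i successive multiplications by g."""
    for _ in range(i):
        x = mul_g(x)
    return x


def build_table(i):
    """One 256-entry table per coefficient g^i: a single lookup per byte."""
    return bytes(mul_g_pow_steps(x, i) for x in range(256))


def build_nibble_tables(i):
    """Two 16-entry tables per coefficient, for the low and high nibble."""
    lo = bytes(mul_g_pow_steps(x, i) for x in range(16))
    hi = bytes(mul_g_pow_steps(x << 4, i) for x in range(16))
    return lo, hi


def mul_g_pow_nibbles(x, lo, hi):
    """Multiply x by g^i using the two nibble tables."""
    return lo[x & 0x0f] ^ hi[x >> 4]
```

All three approaches give the same result; they differ in how much 
table memory and how many steps per byte they cost.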


It is probably reasonable to say that when md raid first implemented 
raid6, it made little sense to do these abbreviated parity calculations.  
But as processors have got faster (and wider, with more cores) while 
disk throughput has made slower progress, the balance may well be 
different now.  So it's probably both possible and practical to do these 
calculations.  All it needs is someone to spend the time writing the 
code - and lots of people willing to test it.



