linux-raid.vger.kernel.org archive mirror
From: Phil Turmel <philip@turmel.org>
To: Christoph Anton Mitterer <calestyo@scientia.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: some general questions on RAID
Date: Thu, 04 Jul 2013 18:07:40 -0400
Message-ID: <51D5F22C.70007@turmel.org>
In-Reply-To: <1372962602.8716.56.camel@heisenberg.scientia.net>

On 07/04/2013 02:30 PM, Christoph Anton Mitterer wrote:

> 1) I plan to use dmcrypt and LUKS and had the following stacking in
> mind:
> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) ->
> filesystems
> 
> Basically I use LVM for partitioning here ;-)
> 
> Are there any issues with that order? E.g. I've heard rumours that
> dmcrypt on top of MD performs much worse than vice versa...

Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
that rely on barriers for integrity can be scrambled.  As such, where I
mix LVM and dmcrypt, I do it selectively on top of each LV.

I believe dmcrypt is single-threaded, too.

If either or both of those issues have been corrected, I wouldn't expect
the layering order to matter.  It'd be nice if a lurking dmcrypt dev or
enthusiast would chime in here.

> But when looking at potential disaster recovery... I think not having MD
> directly on top of the HDDs (especially having it above dmcrypt) seems
> stupid.

I don't know that layering matters much in that case, but I can think of
many cases where it could complicate things.

> 2) Chunks / Chunk size
> a) How does MD work in that matter... is it that it _always_ reads
> and/or writes FULL chunks?

No.  It does not.  It doesn't go below 4k though.

> Guess it must at least do so on _write_ for the RAID levels with parity
> (5/6)... but what about read?

No, not even for write.  If an isolated 4k block is written to a raid6,
the corresponding 4k blocks from the other data drives in that stripe
are read, both corresponding parity blocks are computed, and the three
blocks are written.
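
To put rough numbers on that, here is a toy sketch of the parity
arithmetic (my own illustration, not md's code, with one byte standing in
for each 4k block):

    def gf_mul(a, b):                  # multiply in GF(2^8), polynomial 0x11d
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11d
            b >>= 1
        return r

    def parity(data):
        p, q, coef = 0, 0, 1
        for d in data:
            p ^= d                     # P: plain XOR of the data blocks
            q ^= gf_mul(coef, d)       # Q: Reed-Solomon syndrome, coefficient g^i
            coef = gf_mul(coef, 2)     # next power of the generator g = 2
        return p, q

    data = [0x11, 0x22, 0x33]          # 5-drive raid6: 3 data blocks + P + Q
    p, q = parity(data)

    data[1] = 0x99                     # rewrite just one block: on a real array
    p, q = parity(data)                # the other two data blocks are read, P
                                       # and Q recomputed, three blocks written

So the worst case for parity raid is exactly this kind of small random
write: a couple of extra reads plus three block writes for 4k of new data.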

> And what about read/write with the non-parity RAID levels (1, 0, 10,
> linear)... is the chunk size of any real influence here (in terms of
> reading/writing)?

Not really.  At least, I've seen nothing on this list that shows any
influence.
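
A toy mapping shows why you wouldn't expect much: for the striped,
non-parity levels a small request falls inside a single chunk on a single
member, so the chunk size changes where the data lands, not how much gets
transferred.  (Illustration only, not md's layout code, and raid10's
copies add a twist I'm ignoring here.)

    def map_offset(logical_off, chunk_size, members):
        chunk_nr = logical_off // chunk_size           # which chunk overall
        within   = logical_off % chunk_size            # offset inside that chunk
        member   = chunk_nr % members                  # which disk holds it
        disk_off = (chunk_nr // members) * chunk_size + within
        return member, disk_off

    # A 4k request at 1 MiB on a 4-disk stripe touches one member and moves
    # 4k whether the chunk is 64k or 512k; only the placement differs.
    for chunk in (64 * 1024, 512 * 1024):
        print(chunk, map_offset(1024 * 1024, chunk, 4))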

> b) What's the currently suggested chunk size when having an undetermined
> mix of file sizes? Well it's obviously >= filesystem block size...
> dm-crypt blocksize is always 512B so far so this won't matter... but do
> the LVM physical extents somehow play in (I guess not,... and LVM PEs
> are _NOT_ always FULLY read and/or written - why should they? .. right?)
> From our countless big (hardware) RAID systems at the faculty (we run a
> Tier-2 for the LHC Computing Grid)... experience suggests that 256K is best
> for an undetermined mixture of small/medium/large files... and the
> biggest possible chunk size for mostly large files.
> But does the 256K apply to MD RAIDs as well?

For parity raid, large chunk sizes are crazy, IMHO.  As I pointed out in
another mail, I use 16k for all of mine.
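
The arithmetic behind that opinion: a write only skips the extra reads and
parity recomputation described above if it covers a whole aligned stripe,
and the stripe width grows with the chunk size.  For example (the disk
counts here are made up):

    # Full-stripe width for raid6 is chunk_size * (members - 2).
    def full_stripe_kib(chunk_kib, members, parity_disks=2):
        return chunk_kib * (members - parity_disks)

    for chunk_kib in (16, 64, 256, 512):
        print(chunk_kib, "KiB chunks, 8 disks:",
              full_stripe_kib(chunk_kib, 8), "KiB per full stripe")

With 512k chunks on 8 disks you need 3 MiB of aligned data before a write
stops being the read/recompute/write dance, which mixed workloads rarely
deliver.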

> 3) Any extra benefit from the parity?
> What I mean is... does that parity give me kinda "integrity check"...
> I.e. when a drive fails completely (burns down or whatever)... then it's
> clear... the parity is used on rebuild to get the lost chunks back.
>
> But when I only have block errors... and do scrubbing... a) will it tell
> me that/which blocks are damaged... b) will it be possible to recover
> the right value by the parity? Assuming of course that block
> error/damage doesn't mean the drive really tells me an error code for
> "BLOCK BROKEN"... but just gives me bogus data?

This capability exists as a separate userspace utility, "raid6check", which
is in the process of being accepted into the mdadm toolkit.  It is not built
into the kernel, and Neil Brown has a long blog post explaining why it
shouldn't ever be.  Built-in "check" scrubs will report such mismatches,
and the built-in "repair" scrub fixes them by recomputing all parity
from the data blocks.
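
For the curious, the reason two syndromes can point at the culprit: if
exactly one data block is wrong, P disagrees with the data by the raw
difference, and Q disagrees by that same difference times g^z, where z is
the slot of the bad block.  A toy sketch of that idea (my own
illustration, one byte per block, not raid6check's actual source):

    POLY = 0x11d                       # GF(2^8) polynomial used by Linux raid6

    def gf_mul(a, b):
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= POLY
            b >>= 1
        return r

    # log/antilog tables for the generator g = 2
    EXP, LOG = [0] * 255, [0] * 256
    x = 1
    for i in range(255):
        EXP[i], LOG[x] = x, i
        x = gf_mul(x, 2)

    def syndromes(data):
        p, q = 0, 0
        for i, d in enumerate(data):
            p ^= d                     # P: plain XOR
            q ^= gf_mul(EXP[i], d)     # Q: Reed-Solomon, coefficient g^i
        return p, q

    data = [0x11, 0x22, 0x33, 0x44]    # 6-drive raid6: 4 data blocks + P + Q
    p, q = syndromes(data)             # parity as it sits on disk

    data[2] ^= 0x5a                    # disk 2 silently returns bogus data

    p2, q2 = syndromes(data)           # what a scrub recomputes
    dp, dq = p ^ p2, q ^ q2            # discrepancies against stored P and Q
    z = (LOG[dq] - LOG[dp]) % 255      # dq = g^z * dp, so z names the disk
    data[z] ^= dp                      # and dp is exactly the damage

    assert z == 2 and syndromes(data) == (p, q)

A plain "repair" scrub can't do this, because recomputing parity from the
data blocks silently blesses whatever the bad block currently contains.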

Phil
