Re: some general questions on RAID

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Phil Turmel <philip@turmel.org>
To: Christoph Anton Mitterer <calestyo@scientia.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: some general questions on RAID
Date: Thu, 04 Jul 2013 18:07:40 -0400	[thread overview]
Message-ID: <51D5F22C.70007@turmel.org> (raw)
In-Reply-To: <1372962602.8716.56.camel@heisenberg.scientia.net>

On 07/04/2013 02:30 PM, Christoph Anton Mitterer wrote:

> 1) I plan to use dmcrypt and LUKS and had the following stacking in
> mind:
> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) ->
> filesystems
> 
> Basically I use LVM for partitioning here ;-)
> 
> Are there any issues with that order? E.g. I've heard rumours that
> dmcrypt on top of MD performs much worse than vice versa...

Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
that rely on barriers for integrity can be scrambled.  As such, where I
mix LVM and dmcrypt, I do it selectively on top of each LV.

I believe dmcrypt is single-threaded, too.

If either or both of those issues have been corrected, I wouldn't expect
the layering order to matter.  I'd be nice if a lurking dmcrypt dev or
enthusiast would chime in here.

> But when looking at potential disaster recovery... I think not having MD
> directly on top of the HDDs (especially having it above dmcrypt) seems
> stupid.

I don't know that layering matters much in that case, but I can think of
many cases where it could complicate things.

> 2) Chunks / Chunk size
> a) How does MD work in that matter... is it that it _always_ reads
> and/or writes FULL chunks?

No.  It does not.  It doesn't go below 4k though.

> Guess it must at least do so on _write_ for the RAID levels with parity
> (5/6)... but what about read?

No, not even for write.  If an isolated 4k block is written to a raid6,
the corresponding 4k blocks from the other data drives in that stripe
are read, both corresponding parity blocks are computed, and the three
blocks are written.

> And what about read/write with the non-parity RAID levels (1, 0, 10,
> linear)... is the chunk size of any real influence here (in terms of
> reading/writing)?

Not really.  At least, I've seen nothing on this list that shows any
influence.

> b) What's the currently suggested chunk size when having a undetermined
> mix of file sizes? Well it's obviously >= filesystem block size...
> dm-crypt blocksize is always 512B so far so this won't matter... but do
> the LVM physical extents somehow play in (I guess not,... and LVM PEs
> are _NOT_ always FULLY read and/or written - why should they? .. right?)
> From our countless big (hardware) RAID systems at the faculty (we run a
> Tier-2 for the LHC Computing Grid)... experience seems that 256K is best
> for an undetermined mixture of small/medium/large files... and the
> biggest possible chunk size for mostly large files.
> But does the 256K apply to MD RAIDs as well?

For parity raid, large chunk sizes are crazy, IMHO.  As I pointed out in
another mail, I use 16k for all of mine.

> 3) Any extra benefit from the parity?
> What I mean is... does that parity give me kinda "integrity check"...
> I.e. when a drive fails completely (burns down or whatever)... then it's
> clear... the parity is used on rebuild to get the lost chunks back.
>
> But when I only have block errors... and do scrubbing... a) will it tell
> me that/which blocks are damaged... it will it be possible to recover
> the right value by the parity? Assuming of course that block
> error/damage doesn't mean the drive really tells me an error code for
> "BLOCK BROKEN"... but just gives me bogus data?

This capability exists as a separate userspace utility "raid6check" that
is in the process of acceptance into the mdadm toolkit.  It is not built
into the kernel, and Neil Brown has a long blog post explaining why it
shouldn't ever be.  Built-in "check" scrubs will report such mismatches,
and the built-in "repair" scrub fixes them by recomputing all parity
from the data blocks.

Phil

next prev parent reply	other threads:[~2013-07-04 22:07 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-04 18:30 some general questions on RAID Christoph Anton Mitterer
2013-07-04 22:07 ` Phil Turmel [this message]
2013-07-04 23:34   ` Christoph Anton Mitterer
2013-07-08  4:48     ` NeilBrown
2013-07-06  1:33   ` Christoph Anton Mitterer
2013-07-06  8:52     ` Stan Hoeppner
2013-07-06 15:15       ` Christoph Anton Mitterer
2013-07-07 16:51         ` Stan Hoeppner
2013-07-07 17:39   ` Milan Broz
2013-07-07 18:01     ` Christoph Anton Mitterer
2013-07-07 18:50       ` Milan Broz
2013-07-07 20:51         ` Christoph Anton Mitterer
2013-07-08  5:40           ` Milan Broz
2013-07-08  4:53     ` NeilBrown
2013-07-08  5:25       ` Milan Broz
2013-07-05  1:13 ` Brad Campbell
2013-07-05  1:39   ` Sam Bingner
2013-07-05  3:06     ` Brad Campbell
2013-07-06  1:23     ` some general questions on RAID (OT) Christoph Anton Mitterer
2013-07-06  6:23       ` Sam Bingner
2013-07-06 15:11         ` Christoph Anton Mitterer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51D5F22C.70007@turmel.org \
    --to=philip@turmel.org \
    --cc=calestyo@scientia.net \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.