* some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-04 18:30 UTC
To: linux-raid

Hi.

Well, for me personally these are follow-up questions to my scenario
presented here: http://thread.gmane.org/gmane.linux.raid/43405

But I think these questions would be generally interesting and I'd like
to add them to the Debian FAQ for mdadm (and haven't found really good
answers in the archives/Google).


1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems

Basically I use LVM for partitioning here ;-)

Are there any issues with that order? E.g. I've heard rumours that
dmcrypt on top of MD performs much worse than vice versa...
But when looking at potential disaster recovery... I think not having MD
directly on top of the HDDs (especially having it above dmcrypt) seems
stupid.


2) Chunks / chunk size
a) How does MD work in that matter... is it that it _always_ reads
and/or writes FULL chunks?
Guess it must at least do so on _write_ for the RAID levels with parity
(5/6)... but what about read?
And what about read/write with the non-parity RAID levels (1, 0, 10,
linear)... is the chunk size of any real influence here (in terms of
reading/writing)?

b) What's the currently suggested chunk size when having an undetermined
mix of file sizes? Well, it's obviously >= the filesystem block size...
the dm-crypt block size is always 512 B so far, so this won't matter...
but do the LVM physical extents somehow play in (I guess not... and LVM
PEs are _NOT_ always FULLY read and/or written - why should they be? ...
right?)
From our countless big (hardware) RAID systems at the faculty (we run a
Tier-2 for the LHC Computing Grid)... experience seems to be that 256K is
best for an undetermined mixture of small/medium/large files... and the
biggest possible chunk size for mostly large files.
But does the 256K apply to MD RAIDs as well?


3) Any extra benefit from the parity?
What I mean is... does that parity give me a kind of "integrity check"?
I.e. when a drive fails completely (burns down or whatever)... then it's
clear... the parity is used on rebuild to get the lost chunks back.

But when I only have block errors... and do scrubbing... a) will it tell
me that/which blocks are damaged, and b) will it be possible to recover
the right value via the parity?
Assuming of course that block error/damage doesn't mean the drive really
tells me an error code for "BLOCK BROKEN"... but just gives me bogus data?

Thanks again,
Chris.
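For reference, a minimal sketch of how such a stack could be assembled
with the standard tools. Device names, the RAID level, the chunk size and
the LV size are placeholders only, not a recommendation:

    # 4 whole disks -> one MD array (chunk size is a placeholder, see below)
    mdadm --create /dev/md0 --level=6 --raid-devices=4 --chunk=256 \
          /dev/sd[abcd]

    # dmcrypt/LUKS on top of the array
    cryptsetup luksFormat /dev/md0
    cryptsetup luksOpen /dev/md0 md0_crypt

    # LVM used purely for "partitioning" the decrypted device
    pvcreate /dev/mapper/md0_crypt
    vgcreate vg0 /dev/mapper/md0_crypt
    lvcreate -L 100G -n data vg0
    mkfs.ext4 /dev/vg0/data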
* Re: some general questions on RAID
From: Phil Turmel @ 2013-07-04 22:07 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 07/04/2013 02:30 PM, Christoph Anton Mitterer wrote:

> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems
>
> Basically I use LVM for partitioning here ;-)
>
> Are there any issues with that order? E.g. I've heard rumours that
> dmcrypt on top of MD performs much worse than vice versa...

Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
that rely on barriers for integrity can be scrambled. As such, where I
mix LVM and dmcrypt, I do it selectively on top of each LV.

I believe dmcrypt is single-threaded, too.

If either or both of those issues have been corrected, I wouldn't expect
the layering order to matter. It'd be nice if a lurking dmcrypt dev or
enthusiast would chime in here.

> But when looking at potential disaster recovery... I think not having MD
> directly on top of the HDDs (especially having it above dmcrypt) seems
> stupid.

I don't know that layering matters much in that case, but I can think of
many cases where it could complicate things.

> 2) Chunks / chunk size
> a) How does MD work in that matter... is it that it _always_ reads
> and/or writes FULL chunks?

No. It does not. It doesn't go below 4k, though.

> Guess it must at least do so on _write_ for the RAID levels with parity
> (5/6)... but what about read?

No, not even for write. If an isolated 4k block is written to a raid6,
the corresponding 4k blocks from the other data drives in that stripe
are read, both corresponding parity blocks are computed, and the three
blocks are written.

> And what about read/write with the non-parity RAID levels (1, 0, 10,
> linear)... is the chunk size of any real influence here (in terms of
> reading/writing)?

Not really. At least, I've seen nothing on this list that shows any
influence.

> b) What's the currently suggested chunk size when having an undetermined
> mix of file sizes? Well, it's obviously >= the filesystem block size...
> the dm-crypt block size is always 512 B so far, so this won't matter...
> but do the LVM physical extents somehow play in (I guess not... and LVM
> PEs are _NOT_ always FULLY read and/or written - why should they be? ...
> right?)
> From our countless big (hardware) RAID systems at the faculty (we run a
> Tier-2 for the LHC Computing Grid)... experience seems to be that 256K is
> best for an undetermined mixture of small/medium/large files... and the
> biggest possible chunk size for mostly large files.
> But does the 256K apply to MD RAIDs as well?

For parity raid, large chunk sizes are crazy, IMHO. As I pointed out in
another mail, I use 16k for all of mine.

> 3) Any extra benefit from the parity?
> What I mean is... does that parity give me a kind of "integrity check"?
> I.e. when a drive fails completely (burns down or whatever)... then it's
> clear... the parity is used on rebuild to get the lost chunks back.
>
> But when I only have block errors... and do scrubbing... a) will it tell
> me that/which blocks are damaged, and b) will it be possible to recover
> the right value via the parity?
> Assuming of course that block error/damage doesn't mean the drive really
> tells me an error code for "BLOCK BROKEN"... but just gives me bogus data?

This capability exists as a separate userspace utility, "raid6check", that
is in the process of acceptance into the mdadm toolkit. It is not built
into the kernel, and Neil Brown has a long blog post explaining why it
shouldn't ever be. Built-in "check" scrubs will report such mismatches,
and the built-in "repair" scrub fixes them by recomputing all parity from
the data blocks.

Phil
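For readers who want to try the built-in scrubs mentioned above, a short
sketch of how they are normally driven through sysfs ("md0" is a
placeholder for the array in question):

    # read-only check: counts mismatches but does not change anything
    echo check > /sys/block/md0/md/sync_action

    # watch progress and the resulting mismatch counter
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt

    # rewrite parity from the data blocks where mismatches were found
    echo repair > /sys/block/md0/md/sync_action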
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-04 23:34 UTC
To: linux-raid

On Thu, 2013-07-04 at 18:07 -0400, Phil Turmel wrote:
> Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
> that rely on barriers for integrity can be scrambled.

Whow... uhmm... that would be awful... (since I already use ext4 on top
of dmcrypt). But wouldn't that be a general problem of dmcrypt, unrelated
to any further stacking of LVM and/or MD?!

> As such, where I mix LVM and dmcrypt, I do it selectively on top of
> each LV.

I don't understand what you exactly mean/do and why it should help
against the barrier thing.

> I believe dmcrypt is single-threaded, too.

I had thought that, too... at least it used to be.
So that would basically mean... if I put dmcrypt on top of MD, then one
single thread would handle the whole encryption for the whole MD and
therefore also for all of my (e.g. 4) devices?
But when I do it the other way round... MD being on top of dmcrypt...
then each physical device would get its own dmcrypt device... and also
its own thread, using potentially more CPUs?
Well, the QNAP in question would have 2 cores with HT, so 4 threads...
anyone here with an idea whether the performance boost would be worth
running dmcrypt below MD (which somehow sounds ugly and wrong)?

> If either or both of those issues have been corrected, I wouldn't expect
> the layering order to matter. It'd be nice if a lurking dmcrypt dev or
> enthusiast would chime in here.

I've mailed Milan Broz a pointer to this thread and hope he finds some
time to have a look at it :)
If so, a question in addition... if it's not already done... are there
plans to make dmcrypt multi-threaded (so I could just wait for it and put
MD below it)?

> > But when looking at potential disaster recovery... I think not having MD
> > directly on top of the HDDs (especially having it above dmcrypt) seems
> > stupid.
> I don't know that layering matters much in that case, but I can think of
> many cases where it could complicate things.

What exactly do you mean?
My idea was that when MD is directly above the physical device... then I
will roughly know which kind of block should be where and how data blocks
should yield parity blocks... i.e. when I do disk forensics or plain dd
access. When dmcrypt is below, though... all physical devices will look
completely like garbage.

> > 2) Chunks / chunk size
> > a) How does MD work in that matter... is it that it _always_ reads
> > and/or writes FULL chunks?
>
> No. It does not. It doesn't go below 4k, though.

So what does that mean exactly? It always reads/writes at least 4k blocks?

> > Guess it must at least do so on _write_ for the RAID levels with parity
> > (5/6)... but what about read?
> No, not even for write.

:-O

> If an isolated 4k block is written to a raid6,
> the corresponding 4k blocks from the other data drives in that stripe
> are read, both corresponding parity blocks are computed, and the three
> blocks are written.

Okay, that's clear... but uhm... why have chunk sizes then? I mean,
what's the difference when having a 128k chunk vs. a 256k one... when
the parity/data blocks seem to be split into 4k blocks... or did I get
that completely wrong?

> > And what about read/write with the non-parity RAID levels (1, 0, 10,
> > linear)... is the chunk size of any real influence here (in terms of
> > reading/writing)?
> Not really. At least, I've seen nothing on this list that shows any
> influence.

So AFAIU now:
a) Regardless of the RAID level and regardless of the chunk size,
   - data blocks are read/written in 4KiB blocks
   - when there IS parity information... then that parity information is
     _ALWAYS_ read/computed/written in 4KiB blocks.
b) The chunks basically just control how much consecutive data is on one
   device, thereby allowing reads/writes of small/large files to be sped
   up or slowed down.
   But that should basically only matter on seeking devices, i.e. not on
   SSDs... thus the chunk size is irrelevant on SSDs...
Is all that right? Phil, Neil? :D

> > b) What's the currently suggested chunk size when having an undetermined
> > [snip]
> For parity raid, large chunk sizes are crazy, IMHO. As I pointed out in
> another mail, I use 16k for all of mine.

Sounds contradictory to the 4 KiB parity blocks idea?! So why? Or do you
by chance have a URL to your other mail? :)

> > 3) Any extra benefit from the parity?
> > [snip]
> This capability exists as a separate userspace utility, "raid6check", that
> is in the process of acceptance into the mdadm toolkit.

Interesting... just looking at it.

> It is not built into the kernel, and Neil Brown has a long blog post
> explaining why it shouldn't ever be.

I'll search for it...

> Built-in "check" scrubs will report such mismatches,
> and the built-in "repair" scrub fixes them by recomputing all parity
> from the data blocks.

So that basically means that parity RAID (i.e. RAID6) *HAS* a resilience
advantage even over the 3-copy RAID10 version that we've discussed over
there[0], right?

Thanks a lot,
Chris.

[0] http://thread.gmane.org/gmane.linux.raid/43405/focus=43407
* Re: some general questions on RAID
From: NeilBrown @ 2013-07-08 4:48 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On Fri, 05 Jul 2013 01:34:42 +0200 Christoph Anton Mitterer
<calestyo@scientia.net> wrote:

> > > 2) Chunks / chunk size
> > > a) How does MD work in that matter... is it that it _always_ reads
> > > and/or writes FULL chunks?
> >
> > No. It does not. It doesn't go below 4k, though.
> So what does that mean exactly? It always reads/writes at least 4k
> blocks?

RAID1 reads or writes whatever block size the filesystem sends.
RAID0,10 read or write whatever block size the filesystem sends up to the
chunk size (and obviously less than the chunk size when not aligned with
the chunk size).
RAID4/5/6 read the same as RAID0 when not degraded.
When degraded or when writing, RAID4/5/6 does all IO in 4K blocks (hoping
that the lower layers will merge as appropriate).

> > > Guess it must at least do so on _write_ for the RAID levels with parity
> > > (5/6)... but what about read?
> > No, not even for write.
> :-O
>
> > If an isolated 4k block is written to a raid6,
> > the corresponding 4k blocks from the other data drives in that stripe
> > are read, both corresponding parity blocks are computed, and the three
> > blocks are written.
> Okay, that's clear... but uhm... why have chunk sizes then? I mean,
> what's the difference when having a 128k chunk vs. a 256k one... when
> the parity/data blocks seem to be split into 4k blocks... or did I get
> that completely wrong?

A sequential read that only hits one chunk will be served faster than one
which hits two chunks. So making the chunk size 1-2 times your typical
block size for random reads can help read performance.
For very large sequential reads it shouldn't really matter, though large
chunk sizes tend to result in larger IO requests to the underlying
devices. For very small random reads it shouldn't really matter either.

For writes, you want the stripe size (chunksize * (drives - parity_drives))
to match the typical size for writes - and you want those writes to be
aligned.

So the ideal load is smallish reads and largish writes with a chunk size
between the two.

> > > And what about read/write with the non-parity RAID levels (1, 0, 10,
> > > linear)... is the chunk size of any real influence here (in terms of
> > > reading/writing)?
> > Not really. At least, I've seen nothing on this list that shows any
> > influence.
> So AFAIU now:
> a) Regardless of the RAID level and regardless of the chunk size,
>    - data blocks are read/written in 4KiB blocks
>    - when there IS parity information... then that parity information is
>      _ALWAYS_ read/computed/written in 4KiB blocks.
> b) The chunks basically just control how much consecutive data is on one
>    device, thereby allowing reads/writes of small/large files to be sped
>    up or slowed down.
>    But that should basically only matter on seeking devices, i.e. not on
>    SSDs... thus the chunk size is irrelevant on SSDs...

Seeks are cheaper on SSDs than on spinning rust, but the cost is not zero.

If you are concerned about the effect of chunk size on performance, you
should measure the performance of your hardware with your workload with
differing chunk sizes and come to your own conclusion. All anyone else can
do is offer generalities.

NeilBrown
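One way to act on that advice, sketched with fio; the file path, block
sizes and runtimes are placeholders and should be matched to the real
workload. The idea is to recreate the array with a different --chunk
value, rebuild the stack on top, and rerun the same jobs:

    # random reads through the whole stack (scratch file on the test fs)
    fio --name=randread --filename=/mnt/test/fio.tmp --size=4G --direct=1 \
        --ioengine=libaio --rw=randread --bs=64k --iodepth=16 \
        --runtime=60 --time_based

    # large sequential writes
    fio --name=seqwrite --filename=/mnt/test/fio.tmp --size=4G --direct=1 \
        --ioengine=libaio --rw=write --bs=1M --iodepth=4 \
        --runtime=60 --time_based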
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-06 1:33 UTC
To: linux-raid

One more on that, which I've just thought about:

On Thu, 2013-07-04 at 18:07 -0400, Phil Turmel wrote:
> I believe dmcrypt is single-threaded, too.

Even if that is still true... and with dmcrypt below MD it would run with
_one_ thread _per_ physical device... while the other way round (dmcrypt
on top of MD) it would run with _only one_ thread over the _whole_ MD
device (and thus also all disks)...

...would the former really give a performance benefit (or wouldn't it
actually be much worse)?

Since even though you'd have one dmcrypt thread per disk now... each of
them would have to en/decrypt the "same" actual data for different
devices... so while you have 4x the threads... you need to do 4x the
en/decryption work. Which wouldn't be the case when having dmcrypt on top
of MD... sure, you'd only have one thread...

Does that sound reasonable?

Cheers,
Chris.
* Re: some general questions on RAID
From: Stan Hoeppner @ 2013-07-06 8:52 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 7/5/2013 8:33 PM, Christoph Anton Mitterer wrote:
> One more on that, which I've just thought about:
>
> On Thu, 2013-07-04 at 18:07 -0400, Phil Turmel wrote:
>> I believe dmcrypt is single-threaded, too.
> Even if that is still true... and with dmcrypt below MD it would run
> with _one_ thread _per_ physical device... while the other way round
> (dmcrypt on top of MD) it would run with _only one_ thread over the
> _whole_ MD device (and thus also all disks)...
>
> ...would the former really give a performance benefit (or wouldn't it
> actually be much worse)?

Yes.

> Since even though you'd have one dmcrypt thread per disk now... each of
> them would have to en/decrypt the "same" actual data for different
> devices... so while you have 4x the threads... you need to do 4x the
> en/decryption work.

This would only be true if using n-way mirroring.

> Which wouldn't be the case when having dmcrypt on top of MD... sure,
> you'd only have one thread...

You misunderstand the way this works. With striped md RAID each chunk
goes to a different disk. All chunks contain dissimilar data. If you use
dmcrypt at the lowest level of the stack you have one dmcrypt thread per
member device, each processing 1/n of the load. There is no duplication
of work.

-- 
Stan
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-06 15:15 UTC
To: linux-raid

On Sat, 2013-07-06 at 03:52 -0500, Stan Hoeppner wrote:
> > Since even though you'd have one dmcrypt thread per disk now... each of
> > them would have to en/decrypt the "same" actual data for different
> > devices... so while you have 4x the threads... you need to do 4x the
> > en/decryption work.
>
> This would only be true if using n-way mirroring.

Ah yes... (as you already assumed) I missed the point with striping... ;)

But also for RAID5/6 you shouldn't get n times the performance (when n
is the number of disks), probably "only" about (n-1) or (n-2) times,
respectively, right?

Cheers,
Chris.
* Re: some general questions on RAID
From: Stan Hoeppner @ 2013-07-07 16:51 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 7/6/2013 10:15 AM, Christoph Anton Mitterer wrote:
> On Sat, 2013-07-06 at 03:52 -0500, Stan Hoeppner wrote:
>>> Since even though you'd have one dmcrypt thread per disk now... each of
>>> them would have to en/decrypt the "same" actual data for different
>>> devices... so while you have 4x the threads... you need to do 4x the
>>> en/decryption work.
>>
>> This would only be true if using n-way mirroring.
> Ah yes... (as you already assumed) I missed the point with striping... ;)
>
> But also for RAID5/6 you shouldn't get n times the performance (when n
> is the number of disks), probably "only" about (n-1) or (n-2) times,
> respectively, right?

One has nothing to do with the other. We're discussing dmcrypt
performance here.

-- 
Stan
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-07 17:39 UTC
To: Phil Turmel; +Cc: Christoph Anton Mitterer, linux-raid

On 07/05/2013 12:07 AM, Phil Turmel wrote:
> On 07/04/2013 02:30 PM, Christoph Anton Mitterer wrote:
>
>> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
>> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems
>>
>> Basically I use LVM for partitioning here ;-)
>>
>> Are there any issues with that order? E.g. I've heard rumours that
>> dmcrypt on top of MD performs much worse than vice versa...
>
> Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
> that rely on barriers for integrity can be scrambled. As such, where I
> mix LVM and dmcrypt, I do it selectively on top of each LV.

Hi,

Barriers (later replaced by flush/FUA) work in dmcrypt; this is the first
commit implementing it:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=647c7db14ef9cacc4ccb3683e206b61f0de6dc2b
(dm core ensures that there is no IO processing before it submits
FLUSH/FUA to dmcrypt, so the implementation is quite simple here.)

All features based on flush/FUA work over dmcrypt now.

> I believe dmcrypt is single-threaded, too.

Since
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/md/dm-crypt.c?id=c029772125594e31eb1a5ad9e0913724ed9891f2
dmcrypt keeps IO running on the CPU core which submitted it.

So if you have multiple IOs submitted in parallel from *different* CPUs,
they are processed in parallel.

If you have MD over dmcrypt, this can cause the problem that MD submits
all IOs from the same CPU context and dmcrypt cannot run them in parallel.

So with a new kernel, do not try to put *multiple* dmcrypt mappings (per
device or so) below MD RAID - it will not improve performance, it will
cause the exact opposite (everything will run on one core).

(Please note, this applies to kernels with the patch above and later;
previously it was different. There were a lot of discussions about it,
some other patches which were never applied to mainline etc.; see the
dmcrypt and dm-devel list archives for more info...)

> If either or both of those issues have been corrected, I wouldn't expect
> the layering order to matter.

The block layer (including transparent mappings like dmcrypt) can reorder
requests. It is the filesystem's responsibility to handle ordering (if it
is important) through flush requests.

Milan
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-07 18:01 UTC
To: linux-raid; +Cc: Milan Broz

Hi Milan.

Thanks for your answers :)

On Sun, 2013-07-07 at 19:39 +0200, Milan Broz wrote:
> dmcrypt keeps IO running on the CPU core which submitted it.
>
> So if you have multiple IOs submitted in parallel from *different* CPUs,
> they are processed in parallel.
>
> If you have MD over dmcrypt, this can cause the problem that MD submits
> all IOs from the same CPU context and dmcrypt cannot run them in parallel.

Interesting to know :)
I will ask Arno over at the dm-crypt list to add this to the FAQ.

I'd guess there are no further black-magic issues one should expect when
mixing MD and/or dmcrypt with LVM (especially when contiguous allocation
is used)... and even fewer to expect when using partitions (it should
just be offsetting)?!

> (Please note, this applies to kernels with the patch above and later;
> previously it was different. There were a lot of discussions about it,
> some other patches which were never applied to mainline etc.; see the
> dmcrypt and dm-devel list archives for more info...)

IIRC, these included discussions about parallelising IO sent from one CPU
context, right?
That's perhaps a bit off-topic now... but given that stacking dmcrypt
with MD seems to be done by many people, I guess it's not totally
off-topic, so...
Are there any plans for that (parallelising IO from one core)? Which
should make it (at least performance-wise) okay again to put dmcrypt
below MD (not that I'd personally consider that very useful).

> The block layer (including transparent mappings like dmcrypt) can reorder
> requests. It is the filesystem's responsibility to handle ordering (if it
> is important) through flush requests.

Interesting... but I guess the main filesystems (ext*, xfs, btrfs, jfs)
do just fine with any combination of MD/LVM/dmcrypt?

Lots of thanks again,
Chris.
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-07 18:50 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 07/07/2013 08:01 PM, Christoph Anton Mitterer wrote:
> Hi Milan.
>
> Thanks for your answers :)
>
> On Sun, 2013-07-07 at 19:39 +0200, Milan Broz wrote:
>> dmcrypt keeps IO running on the CPU core which submitted it.
>>
>> So if you have multiple IOs submitted in parallel from *different* CPUs,
>> they are processed in parallel.
>>
>> If you have MD over dmcrypt, this can cause the problem that MD submits
>> all IOs from the same CPU context and dmcrypt cannot run them in parallel.
> Interesting to know :)
> I will ask Arno over at the dm-crypt list to add this to the FAQ.

Yes, but for the FAQ we need to cover even old kernels' dmcrypt behaviour.
(I had some document describing it somewhere.)

> I'd guess there are no further black-magic issues one should expect when
> mixing MD and/or dmcrypt with LVM (especially when contiguous allocation
> is used)... and even fewer to expect when using partitions (it should
> just be offsetting)?!

The only problem is usually alignment - but all components now support
automatic alignment setting using device topology, so you should not see
this anymore. (For LUKS it is trivial; LVM is more complex but should
align to the MD chunk/stripe properly as well.)

>> (Please note, this applies to kernels with the patch above and later;
>> previously it was different. There were a lot of discussions about it,
>> some other patches which were never applied to mainline etc.; see the
>> dmcrypt and dm-devel list archives for more info...)
> IIRC, these included discussions about parallelising IO sent from one CPU
> context, right?

I spent many hours testing this and it never convinced me that it is
generally better than the existing mode.

> That's perhaps a bit off-topic now... but given that stacking dmcrypt
> with MD seems to be done by many people, I guess it's not totally
> off-topic, so...

Stacking dmcrypt over MD is very common and works well (usually) even for
high-speed arrays (with AES-NI use).
(And if you connect some super-fast SSD or RAID array to an old CPU and
encryption speed is the bottleneck, you can try to use a faster cipher -
"cryptsetup benchmark" should help.)

> Are there any plans for that (parallelising IO from one core)? Which
> should make it (at least performance-wise) okay again to put dmcrypt
> below MD (not that I'd personally consider that very useful).

TBH I have no idea, I no longer work for Red Hat (which maintains DM).
My response to these patches is still the same, see
http://permalink.gmane.org/gmane.linux.kernel/1464414

>> The block layer (including transparent mappings like dmcrypt) can reorder
>> requests. It is the filesystem's responsibility to handle ordering (if it
>> is important) through flush requests.
> Interesting... but I guess the main filesystems (ext*, xfs, btrfs, jfs)
> do just fine with any combination of MD/LVM/dmcrypt?

Yes.

Milan
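As an illustration of that suggestion, a quick way to compare cipher
throughput on the machine in question; the cipher and key size below are
examples only, not a recommendation:

    # measure in-kernel crypto throughput for the common ciphers/modes
    cryptsetup benchmark

    # benchmark one specific cipher/key size
    cryptsetup benchmark --cipher aes-xts-plain64 --key-size 256

    # if a different cipher wins on this CPU, select it at format time
    cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 256 /dev/md0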
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-07 20:51 UTC
To: linux-raid; +Cc: Milan Broz

On Sun, 2013-07-07 at 20:50 +0200, Milan Broz wrote:
> For LUKS it is trivial; LVM is more complex but should align to the MD
> chunk/stripe properly as well.

Shouldn't one get that automatically by simply setting the
--dataalignment of the PVs to a multiple of the chunk size (or even
better, of the stripe size)... and shouldn't the default alignment to the
1MiB boundary (as in LUKS) work just as well?
Is there any other hidden complexity I haven't realised yet (when
aligning LVM on top of MD, optionally with LUKS in between)?

> TBH I have no idea, I no longer work for Red Hat (which maintains DM).

:(

> My response to these patches is still the same, see
> http://permalink.gmane.org/gmane.linux.kernel/1464414

Well, I certainly don't have that detailed an insight... but your
arguments sound quite reasonable ;)

Thanks again for answering here :) I think many users of stacked
MD/dmcrypt setups may benefit from this.

Cheers,
Chris.
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-08 5:40 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 07/07/2013 10:51 PM, Christoph Anton Mitterer wrote:
> On Sun, 2013-07-07 at 20:50 +0200, Milan Broz wrote:
>> For LUKS it is trivial; LVM is more complex but should align to the MD
>> chunk/stripe properly as well.
> Shouldn't one get that automatically by simply setting the
> --dataalignment of the PVs to a multiple of the chunk size (or even
> better, of the stripe size)... and shouldn't the default alignment to the
> 1MiB boundary (as in LUKS) work just as well?

Try not to use --dataalignment explicitly. It should be detected
automatically through the topology ioctls (both for LVM and LUKS).
(You can verify that the topology info propagates correctly through the
stack by using lsblk -t.)

Only override it if you know that something is wrong (and perhaps report
a bug to LVM; it should detect it properly now).

The default 1MiB alignment should work even for LVM, IIRC.

IOW, I am saying that for a newly created stack (e.g. MD -> dmcrypt -> LVM)
you should get optimal alignment for performance without black magic.

(Also, LVM can use MD RAID internally now, but let's not complicate an
already too complex device stack :)

Milan
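A sketch of what such a verification could look like for the stack
discussed here; device names are placeholders, and the thing to check is
that the minimum/optimal IO sizes reported at each layer match the MD
chunk and stripe sizes:

    # show queue/topology limits for every layer stacked on the array
    lsblk -t /dev/md0

    # the raw MD values for comparison (in bytes)
    cat /sys/block/md0/queue/minimum_io_size
    cat /sys/block/md0/queue/optimal_io_size

    # LUKS data offset (in 512-byte sectors); ideally a multiple of the stripe size
    cryptsetup luksDump /dev/md0 | grep -i offset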
* Re: some general questions on RAID
From: NeilBrown @ 2013-07-08 4:53 UTC
To: Milan Broz; +Cc: Phil Turmel, Christoph Anton Mitterer, linux-raid

On Sun, 07 Jul 2013 19:39:27 +0200 Milan Broz <gmazyland@gmail.com> wrote:

> So if you have multiple IOs submitted in parallel from *different* CPUs,
> they are processed in parallel.
>
> If you have MD over dmcrypt, this can cause the problem that MD submits
> all IOs from the same CPU context and dmcrypt cannot run them in parallel.

For RAID1 and RAID10 this isn't true any more.

Commit f54a9d0e59c4bea3db733921ca9147612a6f292c in 3.6 changed this for
RAID1, and a similar commit did for RAID10. RAID4/5/6 still submit from a
single thread, as you say.

NeilBrown
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-08 5:25 UTC
To: NeilBrown; +Cc: Phil Turmel, Christoph Anton Mitterer, linux-raid

On 07/08/2013 06:53 AM, NeilBrown wrote:
> On Sun, 07 Jul 2013 19:39:27 +0200 Milan Broz <gmazyland@gmail.com> wrote:
>
>> So if you have multiple IOs submitted in parallel from *different* CPUs,
>> they are processed in parallel.
>>
>> If you have MD over dmcrypt, this can cause the problem that MD submits
>> all IOs from the same CPU context and dmcrypt cannot run them in parallel.
>
> For RAID1 and RAID10 this isn't true any more.
>
> Commit f54a9d0e59c4bea3db733921ca9147612a6f292c in 3.6 changed this for
> RAID1, and a similar commit did for RAID10. RAID4/5/6 still submit from a
> single thread, as you say.

Ah, sorry, I missed that change, thanks Neil! So then it should perform
much better.

(But IIRC most reports about dmcrypt performance were either over
high-speed SSDs without AES-NI or over RAID5 - cases like a huge FTP
archive where they need redundancy & offline data security. But there the
current design should help...)

Milan
* Re: some general questions on RAID
From: Brad Campbell @ 2013-07-05 1:13 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 05/07/13 02:30, Christoph Anton Mitterer wrote:
> Hi.
>
> Well, for me personally these are follow-up questions to my scenario
> presented here: http://thread.gmane.org/gmane.linux.raid/43405
>
> But I think these questions would be generally interesting and I'd like
> to add them to the Debian FAQ for mdadm (and haven't found really good
> answers in the archives/Google).
>
> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems

I have two arrays with dmcrypt on top of MD.
Array 1 is 4 x Seagate 15k SAS drives in a RAID10 f2.
Array 2 is 6 x 240G SSDs in a RAID10 n2.

Array 2 is partitioned. All run ext4.
The CPU is an AMD FX8350. I can max out all arrays with sequential or
random r/w loads, so dmcrypt is not a limiting factor for me.

When I say max out, I run into bandwidth limits on the hardware before
dmcrypt gets in the way.
* Re: some general questions on RAID
From: Sam Bingner @ 2013-07-05 1:39 UTC
To: Brad Campbell, Christoph Anton Mitterer; +Cc: linux-raid@vger.kernel.org

On 7/4/13 3:13 PM, "Brad Campbell" <lists2009@fnarfbargle.com> wrote:

> On 05/07/13 02:30, Christoph Anton Mitterer wrote:
>> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
>> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems
>
> I have two arrays with dmcrypt on top of MD.
> Array 1 is 4 x Seagate 15k SAS drives in a RAID10 f2.
> Array 2 is 6 x 240G SSDs in a RAID10 n2.
>
> Array 2 is partitioned. All run ext4.
> The CPU is an AMD FX8350. I can max out all arrays with sequential or
> random r/w loads, so dmcrypt is not a limiting factor for me.
>
> When I say max out, I run into bandwidth limits on the hardware before
> dmcrypt gets in the way.

That is because your CPU has encryption features - the QNAP devices
largely do not. I replaced the CPU in mine with one that had encryption
features because otherwise there was nothing that could bring the
performance above about 80MB/sec.

Once I put in a CPU supporting AES-NI I could get up to about 500MB/sec -
and this was still not CPU bound.
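If you want to check whether a given box falls into that category, a
quick sketch (the flag and module names apply to x86 CPUs and Linux):

    # does the CPU advertise the AES instruction set?
    grep -m1 -wo aes /proc/cpuinfo

    # is the accelerated driver loadable? (needs root; fails without AES-NI)
    modprobe aesni_intel && lsmod | grep aesni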
* Re: some general questions on RAID
From: Brad Campbell @ 2013-07-05 3:06 UTC
To: Sam Bingner; +Cc: Christoph Anton Mitterer, linux-raid@vger.kernel.org

On 05/07/13 09:39, Sam Bingner wrote:
> On 7/4/13 3:13 PM, "Brad Campbell" <lists2009@fnarfbargle.com> wrote:
>
>> On 05/07/13 02:30, Christoph Anton Mitterer wrote:
>> When I say max out, I run into bandwidth limits on the hardware before
>> dmcrypt gets in the way.
>
> That is because your CPU has encryption features - the QNAP devices
> largely do not. I replaced the CPU in mine with one that had encryption
> features because otherwise there was nothing that could bring the
> performance above about 80MB/sec.
>
> Once I put in a CPU supporting AES-NI I could get up to about 500MB/sec -
> and this was still not CPU bound.

Yes, of course. I forgot about the dedicated storage device rather than a
general server. I run dmcrypt on a WD MyBook Live and get about 8MB/s on
that (which is enough for what I need).
* Re: some general questions on RAID (OT)
From: Christoph Anton Mitterer @ 2013-07-06 1:23 UTC
To: linux-raid

On Fri, 2013-07-05 at 01:39 +0000, Sam Bingner wrote:
> That is because your CPU has encryption features - the QNAP devices
> largely do not. I replaced the CPU in mine with one that had encryption
> features because otherwise there was nothing that could bring the
> performance above about 80MB/sec.

That's a bit off-topic from my questions, but I guess many other people
using RAIDs on their QNAPs might be interested later as well: which QNAP
do you have exactly?

I mean, they all have either ARM-based CPUs or Atoms... so in my case, a
D2700 with an FCBGA559... which CPU (that has AES-NI) could you find for
that? Since AFAIK there are no[0] Atoms at all with AES-NI?

Cheers,
Chris.

[0] http://ark.intel.com/search/advanced/?s=t&Sockets=FCBGA559&AESTech=true
* Re: some general questions on RAID (OT)
From: Sam Bingner @ 2013-07-06 6:23 UTC
To: Christoph Anton Mitterer, linux-raid@vger.kernel.org

The TS-879 and TS-1079 have Intel i3 processors and use a normal socket.
I replaced it with an E3-1275 Xeon processor. You can use Sandy Bridge,
but they do not have BIOS support for Ivy Bridge processors. I also
replaced my memory with 16GB of ECC memory.

I did a lot of research about this stuff before buying it, which is why I
went with the TS-1079 Pro. Works great with Debian except for some lack
of LED support on a couple of drives, and the LCD always says "Booting…"

Sam

On 7/5/13 3:23 PM, "Christoph Anton Mitterer" <calestyo@scientia.net> wrote:

> On Fri, 2013-07-05 at 01:39 +0000, Sam Bingner wrote:
>> That is because your CPU has encryption features - the QNAP devices
>> largely do not. I replaced the CPU in mine with one that had encryption
>> features because otherwise there was nothing that could bring the
>> performance above about 80MB/sec.
> That's a bit off-topic from my questions, but I guess many other people
> using RAIDs on their QNAPs might be interested later as well: which QNAP
> do you have exactly?
>
> I mean, they all have either ARM-based CPUs or Atoms... so in my case, a
> D2700 with an FCBGA559... which CPU (that has AES-NI) could you find for
> that? Since AFAIK there are no[0] Atoms at all with AES-NI?
>
> Cheers,
> Chris.
>
> [0] http://ark.intel.com/search/advanced/?s=t&Sockets=FCBGA559&AESTech=true
* Re: some general questions on RAID (OT)
From: Christoph Anton Mitterer @ 2013-07-06 15:11 UTC
To: linux-raid

On Sat, 2013-07-06 at 06:23 +0000, Sam Bingner wrote:
> Works great with Debian except for some lack of LED support on a couple
> of drives, and the LCD always says "Booting…"

I tried a lot to get the LEDs/buzzers/buttons working (it doesn't seem to
be easy on the Intel-based devices); you may be interested in:
http://thread.gmane.org/gmane.linux.kernel/1508763
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712283
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712191
https://github.com/groeck/it87/issues/1

Getting the LCD to work is very easy, though (see also the links above).

Cheers,
Chris.