* some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-04 18:30 UTC
To: linux-raid

Hi.

Well, for me personally these are follow-up questions to my scenario
presented here: http://thread.gmane.org/gmane.linux.raid/43405

But I think these questions would be generally interesting and I'd like
to add them to the Debian FAQ for mdadm (and haven't found really good
answers in the archives/Google).


1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems

Basically I use LVM for partitioning here ;-)

Are there any issues with that order? E.g. I've heard rumours that
dmcrypt on top of MD performs much worse than vice versa...
But when looking at potential disaster recovery... I think not having MD
directly on top of the HDDs (especially having it above dmcrypt) seems
stupid.


2) Chunks / chunk size
a) How does MD work in that matter... is it that it _always_ reads
and/or writes FULL chunks?
Guess it must at least do so on _write_ for the RAID levels with parity
(5/6)... but what about read?
And what about read/write with the non-parity RAID levels (1, 0, 10,
linear)... is the chunk size of any real influence here (in terms of
reading/writing)?

b) What's the currently suggested chunk size when having an undetermined
mix of file sizes? Well, it's obviously >= the filesystem block size...
the dm-crypt block size is always 512 B so far, so this won't matter...
but do the LVM physical extents somehow play in (I guess not... and LVM
PEs are _NOT_ always FULLY read and/or written - why should they be? ...
right?)
From our countless big (hardware) RAID systems at the faculty (we run a
Tier-2 for the LHC Computing Grid)... experience seems to be that 256K is
best for an undetermined mixture of small/medium/large files... and the
biggest possible chunk size for mostly large files.
But does the 256K apply to MD RAIDs as well?


3) Any extra benefit from the parity?
What I mean is... does that parity give me a kind of "integrity check"?
I.e. when a drive fails completely (burns down or whatever)... then it's
clear... the parity is used on rebuild to get the lost chunks back.

But when I only have block errors... and do scrubbing... a) will it tell
me that/which blocks are damaged, and b) will it be possible to recover
the right value via the parity?
Assuming of course that block error/damage doesn't mean the drive really
tells me an error code for "BLOCK BROKEN"... but just gives me bogus data?

Thanks again,
Chris.
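For reference, a minimal sketch of how such a stack could be assembled
with the standard tools. Device names, the RAID level, the chunk size and
the LV size are placeholders only, not a recommendation:

    # 4 whole disks -> one MD array (chunk size is a placeholder, see below)
    mdadm --create /dev/md0 --level=6 --raid-devices=4 --chunk=256 \
          /dev/sd[abcd]

    # dmcrypt/LUKS on top of the array
    cryptsetup luksFormat /dev/md0
    cryptsetup luksOpen /dev/md0 md0_crypt

    # LVM used purely for "partitioning" the decrypted device
    pvcreate /dev/mapper/md0_crypt
    vgcreate vg0 /dev/mapper/md0_crypt
    lvcreate -L 100G -n data vg0
    mkfs.ext4 /dev/vg0/data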
* Re: some general questions on RAID
From: Phil Turmel @ 2013-07-04 22:07 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 07/04/2013 02:30 PM, Christoph Anton Mitterer wrote:

> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems
>
> Basically I use LVM for partitioning here ;-)
>
> Are there any issues with that order? E.g. I've heard rumours that
> dmcrypt on top of MD performs much worse than vice versa...

Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
that rely on barriers for integrity can be scrambled. As such, where I
mix LVM and dmcrypt, I do it selectively on top of each LV.

I believe dmcrypt is single-threaded, too.

If either or both of those issues have been corrected, I wouldn't expect
the layering order to matter. It'd be nice if a lurking dmcrypt dev or
enthusiast would chime in here.

> But when looking at potential disaster recovery... I think not having MD
> directly on top of the HDDs (especially having it above dmcrypt) seems
> stupid.

I don't know that layering matters much in that case, but I can think of
many cases where it could complicate things.

> 2) Chunks / chunk size
> a) How does MD work in that matter... is it that it _always_ reads
> and/or writes FULL chunks?

No. It does not. It doesn't go below 4k, though.

> Guess it must at least do so on _write_ for the RAID levels with parity
> (5/6)... but what about read?

No, not even for write. If an isolated 4k block is written to a raid6,
the corresponding 4k blocks from the other data drives in that stripe
are read, both corresponding parity blocks are computed, and the three
blocks are written.

> And what about read/write with the non-parity RAID levels (1, 0, 10,
> linear)... is the chunk size of any real influence here (in terms of
> reading/writing)?

Not really. At least, I've seen nothing on this list that shows any
influence.

> b) What's the currently suggested chunk size when having an undetermined
> mix of file sizes? Well, it's obviously >= the filesystem block size...
> the dm-crypt block size is always 512 B so far, so this won't matter...
> but do the LVM physical extents somehow play in (I guess not... and LVM
> PEs are _NOT_ always FULLY read and/or written - why should they be? ...
> right?)
> From our countless big (hardware) RAID systems at the faculty (we run a
> Tier-2 for the LHC Computing Grid)... experience seems to be that 256K is
> best for an undetermined mixture of small/medium/large files... and the
> biggest possible chunk size for mostly large files.
> But does the 256K apply to MD RAIDs as well?

For parity raid, large chunk sizes are crazy, IMHO. As I pointed out in
another mail, I use 16k for all of mine.

> 3) Any extra benefit from the parity?
> What I mean is... does that parity give me a kind of "integrity check"?
> I.e. when a drive fails completely (burns down or whatever)... then it's
> clear... the parity is used on rebuild to get the lost chunks back.
>
> But when I only have block errors... and do scrubbing... a) will it tell
> me that/which blocks are damaged, and b) will it be possible to recover
> the right value via the parity?
> Assuming of course that block error/damage doesn't mean the drive really
> tells me an error code for "BLOCK BROKEN"... but just gives me bogus data?

This capability exists as a separate userspace utility, "raid6check", that
is in the process of acceptance into the mdadm toolkit. It is not built
into the kernel, and Neil Brown has a long blog post explaining why it
shouldn't ever be. Built-in "check" scrubs will report such mismatches,
and the built-in "repair" scrub fixes them by recomputing all parity from
the data blocks.

Phil
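For readers who want to try the built-in scrubs mentioned above, a short
sketch of how they are normally driven through sysfs ("md0" is a
placeholder for the array in question):

    # read-only check: counts mismatches but does not change anything
    echo check > /sys/block/md0/md/sync_action

    # watch progress and the resulting mismatch counter
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt

    # rewrite parity from the data blocks where mismatches were found
    echo repair > /sys/block/md0/md/sync_action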
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-04 23:34 UTC
To: linux-raid

On Thu, 2013-07-04 at 18:07 -0400, Phil Turmel wrote:
> Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
> that rely on barriers for integrity can be scrambled.

Whow... uhmm... that would be awful... (since I already use ext4 on top
of dmcrypt). But wouldn't that be a general problem of dmcrypt, unrelated
to any further stacking of LVM and/or MD?!

> As such, where I mix LVM and dmcrypt, I do it selectively on top of
> each LV.

I don't understand what you exactly mean/do and why it should help
against the barrier thing.

> I believe dmcrypt is single-threaded, too.

I had thought that, too... at least it used to be.
So that would basically mean... if I put dmcrypt on top of MD, then one
single thread would handle the whole encryption for the whole MD and
therefore also for all of my (e.g. 4) devices?
But when I do it the other way round... MD being on top of dmcrypt...
then each physical device would get its own dmcrypt device... and also
its own thread, using potentially more CPUs?
Well, the QNAP in question would have 2 cores with HT, so 4 threads...
anyone here with an idea whether the performance boost would be worth
running dmcrypt below MD (which somehow sounds ugly and wrong)?

> If either or both of those issues have been corrected, I wouldn't expect
> the layering order to matter. It'd be nice if a lurking dmcrypt dev or
> enthusiast would chime in here.

I've mailed Milan Broz a pointer to this thread and hope he finds some
time to have a look at it :)
If so, a question in addition... if it's not already done... are there
plans to make dmcrypt multi-threaded (so I could just wait for it and put
MD below it)?

> > But when looking at potential disaster recovery... I think not having MD
> > directly on top of the HDDs (especially having it above dmcrypt) seems
> > stupid.
> I don't know that layering matters much in that case, but I can think of
> many cases where it could complicate things.

What exactly do you mean?
My idea was that when MD is directly above the physical device... then I
will roughly know which kind of block should be where and how data blocks
should yield parity blocks... i.e. when I do disk forensics or plain dd
access. When dmcrypt is below, though... all physical devices will look
completely like garbage.

> > 2) Chunks / chunk size
> > a) How does MD work in that matter... is it that it _always_ reads
> > and/or writes FULL chunks?
>
> No. It does not. It doesn't go below 4k, though.

So what does that mean exactly? It always reads/writes at least 4k blocks?

> > Guess it must at least do so on _write_ for the RAID levels with parity
> > (5/6)... but what about read?
> No, not even for write.

:-O

> If an isolated 4k block is written to a raid6,
> the corresponding 4k blocks from the other data drives in that stripe
> are read, both corresponding parity blocks are computed, and the three
> blocks are written.

Okay, that's clear... but uhm... why have chunk sizes then? I mean,
what's the difference when having a 128k chunk vs. a 256k one... when
the parity/data blocks seem to be split into 4k blocks... or did I get
that completely wrong?

> > And what about read/write with the non-parity RAID levels (1, 0, 10,
> > linear)... is the chunk size of any real influence here (in terms of
> > reading/writing)?
> Not really. At least, I've seen nothing on this list that shows any
> influence.

So AFAIU now:
a) Regardless of the RAID level and regardless of the chunk size,
   - data blocks are read/written in 4KiB blocks
   - when there IS parity information... then that parity information is
     _ALWAYS_ read/computed/written in 4KiB blocks.
b) The chunks basically just control how much consecutive data is on one
   device, thereby allowing reads/writes of small/large files to be sped
   up or slowed down.
   But that should basically only matter on seeking devices, i.e. not on
   SSDs... thus the chunk size is irrelevant on SSDs...
Is all that right? Phil, Neil? :D

> > b) What's the currently suggested chunk size when having an undetermined
> > [snip]
> For parity raid, large chunk sizes are crazy, IMHO. As I pointed out in
> another mail, I use 16k for all of mine.

Sounds contradictory to the 4 KiB parity blocks idea?! So why? Or do you
by chance have a URL to your other mail? :)

> > 3) Any extra benefit from the parity?
> > [snip]
> This capability exists as a separate userspace utility, "raid6check", that
> is in the process of acceptance into the mdadm toolkit.

Interesting... just looking at it.

> It is not built into the kernel, and Neil Brown has a long blog post
> explaining why it shouldn't ever be.

I'll search for it...

> Built-in "check" scrubs will report such mismatches,
> and the built-in "repair" scrub fixes them by recomputing all parity
> from the data blocks.

So that basically means that parity RAID (i.e. RAID6) *HAS* a resilience
advantage even over the 3-copy RAID10 version that we've discussed over
there[0], right?

Thanks a lot,
Chris.

[0] http://thread.gmane.org/gmane.linux.raid/43405/focus=43407
* Re: some general questions on RAID
From: NeilBrown @ 2013-07-08 4:48 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On Fri, 05 Jul 2013 01:34:42 +0200 Christoph Anton Mitterer
<calestyo@scientia.net> wrote:

> > > 2) Chunks / chunk size
> > > a) How does MD work in that matter... is it that it _always_ reads
> > > and/or writes FULL chunks?
> >
> > No. It does not. It doesn't go below 4k, though.
> So what does that mean exactly? It always reads/writes at least 4k
> blocks?

RAID1 reads or writes whatever block size the filesystem sends.
RAID0,10 read or write whatever block size the filesystem sends up to the
chunk size (and obviously less than the chunk size when not aligned with
the chunk size).
RAID4/5/6 read the same as RAID0 when not degraded.
When degraded or when writing, RAID4/5/6 does all IO in 4K blocks (hoping
that the lower layers will merge as appropriate).

> > > Guess it must at least do so on _write_ for the RAID levels with parity
> > > (5/6)... but what about read?
> > No, not even for write.
> :-O
>
> > If an isolated 4k block is written to a raid6,
> > the corresponding 4k blocks from the other data drives in that stripe
> > are read, both corresponding parity blocks are computed, and the three
> > blocks are written.
> Okay, that's clear... but uhm... why have chunk sizes then? I mean,
> what's the difference when having a 128k chunk vs. a 256k one... when
> the parity/data blocks seem to be split into 4k blocks... or did I get
> that completely wrong?

A sequential read that only hits one chunk will be served faster than one
which hits two chunks. So making the chunk size 1-2 times your typical
block size for random reads can help read performance.
For very large sequential reads it shouldn't really matter, though large
chunk sizes tend to result in larger IO requests to the underlying
devices. For very small random reads it shouldn't really matter either.

For writes, you want the stripe size (chunksize * (drives - parity_drives))
to match the typical size for writes - and you want those writes to be
aligned.

So the ideal load is smallish reads and largish writes with a chunk size
between the two.

> > > And what about read/write with the non-parity RAID levels (1, 0, 10,
> > > linear)... is the chunk size of any real influence here (in terms of
> > > reading/writing)?
> > Not really. At least, I've seen nothing on this list that shows any
> > influence.
> So AFAIU now:
> a) Regardless of the RAID level and regardless of the chunk size,
>    - data blocks are read/written in 4KiB blocks
>    - when there IS parity information... then that parity information is
>      _ALWAYS_ read/computed/written in 4KiB blocks.
> b) The chunks basically just control how much consecutive data is on one
>    device, thereby allowing reads/writes of small/large files to be sped
>    up or slowed down.
>    But that should basically only matter on seeking devices, i.e. not on
>    SSDs... thus the chunk size is irrelevant on SSDs...

Seeks are cheaper on SSDs than on spinning rust, but the cost is not zero.

If you are concerned about the effect of chunk size on performance, you
should measure the performance of your hardware with your workload with
differing chunk sizes and come to your own conclusion. All anyone else can
do is offer generalities.

NeilBrown
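One way to act on that advice, sketched with fio; the file path, block
sizes and runtimes are placeholders and should be matched to the real
workload. The idea is to recreate the array with a different --chunk
value, rebuild the stack on top, and rerun the same jobs:

    # random reads through the whole stack (scratch file on the test fs)
    fio --name=randread --filename=/mnt/test/fio.tmp --size=4G --direct=1 \
        --ioengine=libaio --rw=randread --bs=64k --iodepth=16 \
        --runtime=60 --time_based

    # large sequential writes
    fio --name=seqwrite --filename=/mnt/test/fio.tmp --size=4G --direct=1 \
        --ioengine=libaio --rw=write --bs=1M --iodepth=4 \
        --runtime=60 --time_based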
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-06 1:33 UTC
To: linux-raid

One more on that, which I've just thought about:

On Thu, 2013-07-04 at 18:07 -0400, Phil Turmel wrote:
> I believe dmcrypt is single-threaded, too.

Even if that is still true... and with dmcrypt below MD it would run with
_one_ thread _per_ physical device... while the other way round (dmcrypt
on top of MD) it would run with _only one_ thread over the _whole_ MD
device (and thus also all disks)...

...would the former really give a performance benefit (or wouldn't it
actually be much worse)?

Since even though you'd have one dmcrypt thread per disk now... each of
them would have to en/decrypt the "same" actual data for different
devices... so while you have 4x the threads... you need to do 4x the
en/decryption work. Which wouldn't be the case when having dmcrypt on top
of MD... sure, you'd only have one thread...

Does that sound reasonable?

Cheers,
Chris.
* Re: some general questions on RAID
From: Stan Hoeppner @ 2013-07-06 8:52 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 7/5/2013 8:33 PM, Christoph Anton Mitterer wrote:
> One more on that, which I've just thought about:
>
> On Thu, 2013-07-04 at 18:07 -0400, Phil Turmel wrote:
>> I believe dmcrypt is single-threaded, too.
> Even if that is still true... and with dmcrypt below MD it would run
> with _one_ thread _per_ physical device... while the other way round
> (dmcrypt on top of MD) it would run with _only one_ thread over the
> _whole_ MD device (and thus also all disks)...
>
> ...would the former really give a performance benefit (or wouldn't it
> actually be much worse)?

Yes.

> Since even though you'd have one dmcrypt thread per disk now... each of
> them would have to en/decrypt the "same" actual data for different
> devices... so while you have 4x the threads... you need to do 4x the
> en/decryption work.

This would only be true if using n-way mirroring.

> Which wouldn't be the case when having dmcrypt on top of MD... sure,
> you'd only have one thread...

You misunderstand the way this works. With striped md RAID each chunk
goes to a different disk. All chunks contain dissimilar data. If you use
dmcrypt at the lowest level of the stack you have one dmcrypt thread per
member device, each processing 1/n of the load. There is no duplication
of work.

-- 
Stan
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-06 15:15 UTC
To: linux-raid

On Sat, 2013-07-06 at 03:52 -0500, Stan Hoeppner wrote:
> > Since even though you'd have one dmcrypt thread per disk now... each of
> > them would have to en/decrypt the "same" actual data for different
> > devices... so while you have 4x the threads... you need to do 4x the
> > en/decryption work.
>
> This would only be true if using n-way mirroring.

Ah yes... (as you already assumed) I missed the point with striping... ;)

But also for RAID5/6 you shouldn't get n times the performance (when n
is the number of disks), probably "only" about (n-1) or (n-2) times,
respectively, right?

Cheers,
Chris.
* Re: some general questions on RAID
From: Stan Hoeppner @ 2013-07-07 16:51 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 7/6/2013 10:15 AM, Christoph Anton Mitterer wrote:
> On Sat, 2013-07-06 at 03:52 -0500, Stan Hoeppner wrote:
>>> Since even though you'd have one dmcrypt thread per disk now... each of
>>> them would have to en/decrypt the "same" actual data for different
>>> devices... so while you have 4x the threads... you need to do 4x the
>>> en/decryption work.
>>
>> This would only be true if using n-way mirroring.
> Ah yes... (as you already assumed) I missed the point with striping... ;)
>
> But also for RAID5/6 you shouldn't get n times the performance (when n
> is the number of disks), probably "only" about (n-1) or (n-2) times,
> respectively, right?

One has nothing to do with the other. We're discussing dmcrypt
performance here.

-- 
Stan
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-07 17:39 UTC
To: Phil Turmel; +Cc: Christoph Anton Mitterer, linux-raid

On 07/05/2013 12:07 AM, Phil Turmel wrote:
> On 07/04/2013 02:30 PM, Christoph Anton Mitterer wrote:
>
>> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
>> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems
>>
>> Basically I use LVM for partitioning here ;-)
>>
>> Are there any issues with that order? E.g. I've heard rumours that
>> dmcrypt on top of MD performs much worse than vice versa...
>
> Last time I checked, dmcrypt treated barriers as no-ops, so filesystems
> that rely on barriers for integrity can be scrambled. As such, where I
> mix LVM and dmcrypt, I do it selectively on top of each LV.

Hi,

Barriers (later replaced by flush/FUA) work in dmcrypt; this is the first
commit implementing it:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=647c7db14ef9cacc4ccb3683e206b61f0de6dc2b
(dm core ensures that there is no IO processing before it submits
FLUSH/FUA to dmcrypt, so the implementation is quite simple here.)

All features based on flush/FUA work over dmcrypt now.

> I believe dmcrypt is single-threaded, too.

Since
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/md/dm-crypt.c?id=c029772125594e31eb1a5ad9e0913724ed9891f2
dmcrypt keeps IO running on the CPU core which submitted it.

So if you have multiple IOs submitted in parallel from *different* CPUs,
they are processed in parallel.

If you have MD over dmcrypt, this can cause the problem that MD submits
all IOs from the same CPU context and dmcrypt cannot run them in parallel.

So with a new kernel, do not try to put *multiple* dmcrypt mappings (per
device or so) below MD RAID - it will not improve performance, it will
cause the exact opposite (everything will run on one core).

(Please note, this applies to kernels with the patch above and later;
previously it was different. There were a lot of discussions about it,
some other patches which were never applied to mainline etc.; see the
dmcrypt and dm-devel list archives for more info...)

> If either or both of those issues have been corrected, I wouldn't expect
> the layering order to matter.

The block layer (including transparent mappings like dmcrypt) can reorder
requests. It is the filesystem's responsibility to handle ordering (if it
is important) through flush requests.

Milan
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-07 18:01 UTC
To: linux-raid; +Cc: Milan Broz

Hi Milan.

Thanks for your answers :)

On Sun, 2013-07-07 at 19:39 +0200, Milan Broz wrote:
> dmcrypt keeps IO running on the CPU core which submitted it.
>
> So if you have multiple IOs submitted in parallel from *different* CPUs,
> they are processed in parallel.
>
> If you have MD over dmcrypt, this can cause the problem that MD submits
> all IOs from the same CPU context and dmcrypt cannot run them in parallel.

Interesting to know :)
I will ask Arno over at the dm-crypt list to add this to the FAQ.

I'd guess there are no further black-magic issues one should expect when
mixing MD and/or dmcrypt with LVM (especially when contiguous allocation
is used)... and even fewer to expect when using partitions (it should
just be offsetting)?!

> (Please note, this applies to kernels with the patch above and later;
> previously it was different. There were a lot of discussions about it,
> some other patches which were never applied to mainline etc.; see the
> dmcrypt and dm-devel list archives for more info...)

IIRC, these included discussions about parallelising IO sent from one CPU
context, right?
That's perhaps a bit off-topic now... but given that stacking dmcrypt
with MD seems to be done by many people, I guess it's not totally
off-topic, so...
Are there any plans for that (parallelising IO from one core)? Which
should make it (at least performance-wise) okay again to put dmcrypt
below MD (not that I'd personally consider that very useful).

> The block layer (including transparent mappings like dmcrypt) can reorder
> requests. It is the filesystem's responsibility to handle ordering (if it
> is important) through flush requests.

Interesting... but I guess the main filesystems (ext*, xfs, btrfs, jfs)
do just fine with any combination of MD/LVM/dmcrypt?

Lots of thanks again,
Chris.
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-07 18:50 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 07/07/2013 08:01 PM, Christoph Anton Mitterer wrote:
> Hi Milan.
>
> Thanks for your answers :)
>
> On Sun, 2013-07-07 at 19:39 +0200, Milan Broz wrote:
>> dmcrypt keeps IO running on the CPU core which submitted it.
>>
>> So if you have multiple IOs submitted in parallel from *different* CPUs,
>> they are processed in parallel.
>>
>> If you have MD over dmcrypt, this can cause the problem that MD submits
>> all IOs from the same CPU context and dmcrypt cannot run them in parallel.
> Interesting to know :)
> I will ask Arno over at the dm-crypt list to add this to the FAQ.

Yes, but for the FAQ we need to cover even old kernels' dmcrypt behaviour.
(I had some document describing it somewhere.)

> I'd guess there are no further black-magic issues one should expect when
> mixing MD and/or dmcrypt with LVM (especially when contiguous allocation
> is used)... and even fewer to expect when using partitions (it should
> just be offsetting)?!

The only problem is usually alignment - but all components now support
automatic alignment setting using device topology, so you should not see
this anymore. (For LUKS it is trivial; LVM is more complex but should
align to the MD chunk/stripe properly as well.)

>> (Please note, this applies to kernels with the patch above and later;
>> previously it was different. There were a lot of discussions about it,
>> some other patches which were never applied to mainline etc.; see the
>> dmcrypt and dm-devel list archives for more info...)
> IIRC, these included discussions about parallelising IO sent from one CPU
> context, right?

I spent many hours testing this and it never convinced me that it is
generally better than the existing mode.

> That's perhaps a bit off-topic now... but given that stacking dmcrypt
> with MD seems to be done by many people, I guess it's not totally
> off-topic, so...

Stacking dmcrypt over MD is very common and works well (usually) even for
high-speed arrays (with AES-NI use).
(And if you connect some super-fast SSD or RAID array to an old CPU and
encryption speed is the bottleneck, you can try to use a faster cipher -
"cryptsetup benchmark" should help.)

> Are there any plans for that (parallelising IO from one core)? Which
> should make it (at least performance-wise) okay again to put dmcrypt
> below MD (not that I'd personally consider that very useful).

TBH I have no idea, I no longer work for Red Hat (which maintains DM).
My response to these patches is still the same, see
http://permalink.gmane.org/gmane.linux.kernel/1464414

>> The block layer (including transparent mappings like dmcrypt) can reorder
>> requests. It is the filesystem's responsibility to handle ordering (if it
>> is important) through flush requests.
> Interesting... but I guess the main filesystems (ext*, xfs, btrfs, jfs)
> do just fine with any combination of MD/LVM/dmcrypt?

Yes.

Milan
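As an illustration of that suggestion, a quick way to compare cipher
throughput on the machine in question; the cipher and key size below are
examples only, not a recommendation:

    # measure in-kernel crypto throughput for the common ciphers/modes
    cryptsetup benchmark

    # benchmark one specific cipher/key size
    cryptsetup benchmark --cipher aes-xts-plain64 --key-size 256

    # if a different cipher wins on this CPU, select it at format time
    cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 256 /dev/md0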
* Re: some general questions on RAID
From: Christoph Anton Mitterer @ 2013-07-07 20:51 UTC
To: linux-raid; +Cc: Milan Broz

On Sun, 2013-07-07 at 20:50 +0200, Milan Broz wrote:
> For LUKS it is trivial; LVM is more complex but should align to the MD
> chunk/stripe properly as well.

Shouldn't one get that automatically by simply setting the
--dataalignment of the PVs to a multiple of the chunk size (or even
better, of the stripe size)... and shouldn't the default alignment to the
1MiB boundary (as in LUKS) work just as well?
Is there any other hidden complexity I haven't realised yet (when
aligning LVM on top of MD, optionally with LUKS in between)?

> TBH I have no idea, I no longer work for Red Hat (which maintains DM).

:(

> My response to these patches is still the same, see
> http://permalink.gmane.org/gmane.linux.kernel/1464414

Well, I certainly don't have that detailed an insight... but your
arguments sound quite reasonable ;)

Thanks again for answering here :) I think many users of stacked
MD/dmcrypt setups may benefit from this.

Cheers,
Chris.
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-08 5:40 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 07/07/2013 10:51 PM, Christoph Anton Mitterer wrote:
> On Sun, 2013-07-07 at 20:50 +0200, Milan Broz wrote:
>> For LUKS it is trivial; LVM is more complex but should align to the MD
>> chunk/stripe properly as well.
> Shouldn't one get that automatically by simply setting the
> --dataalignment of the PVs to a multiple of the chunk size (or even
> better, of the stripe size)... and shouldn't the default alignment to the
> 1MiB boundary (as in LUKS) work just as well?

Try not to use --dataalignment explicitly. It should be detected
automatically through the topology ioctls (both for LVM and LUKS).
(You can verify that the topology info propagates correctly through the
stack by using lsblk -t.)

Only override it if you know that something is wrong (and perhaps report
a bug to LVM; it should detect it properly now).

The default 1MiB alignment should work even for LVM, IIRC.

IOW, I am saying that for a newly created stack (e.g. MD -> dmcrypt -> LVM)
you should get optimal alignment for performance without black magic.

(Also, LVM can use MD RAID internally now, but let's not complicate an
already too complex device stack :)

Milan
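A sketch of what such a verification could look like for the stack
discussed here; device names are placeholders, and the thing to check is
that the minimum/optimal IO sizes reported at each layer match the MD
chunk and stripe sizes:

    # show queue/topology limits for every layer stacked on the array
    lsblk -t /dev/md0

    # the raw MD values for comparison (in bytes)
    cat /sys/block/md0/queue/minimum_io_size
    cat /sys/block/md0/queue/optimal_io_size

    # LUKS data offset (in 512-byte sectors); ideally a multiple of the stripe size
    cryptsetup luksDump /dev/md0 | grep -i offset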
* Re: some general questions on RAID
From: NeilBrown @ 2013-07-08 4:53 UTC
To: Milan Broz; +Cc: Phil Turmel, Christoph Anton Mitterer, linux-raid

On Sun, 07 Jul 2013 19:39:27 +0200 Milan Broz <gmazyland@gmail.com> wrote:

> So if you have multiple IOs submitted in parallel from *different* CPUs,
> they are processed in parallel.
>
> If you have MD over dmcrypt, this can cause the problem that MD submits
> all IOs from the same CPU context and dmcrypt cannot run them in parallel.

For RAID1 and RAID10 this isn't true any more.

Commit f54a9d0e59c4bea3db733921ca9147612a6f292c in 3.6 changed this for
RAID1, and a similar commit did for RAID10. RAID4/5/6 still submit from a
single thread, as you say.

NeilBrown
* Re: some general questions on RAID
From: Milan Broz @ 2013-07-08 5:25 UTC
To: NeilBrown; +Cc: Phil Turmel, Christoph Anton Mitterer, linux-raid

On 07/08/2013 06:53 AM, NeilBrown wrote:
> On Sun, 07 Jul 2013 19:39:27 +0200 Milan Broz <gmazyland@gmail.com> wrote:
>
>> So if you have multiple IOs submitted in parallel from *different* CPUs,
>> they are processed in parallel.
>>
>> If you have MD over dmcrypt, this can cause the problem that MD submits
>> all IOs from the same CPU context and dmcrypt cannot run them in parallel.
>
> For RAID1 and RAID10 this isn't true any more.
>
> Commit f54a9d0e59c4bea3db733921ca9147612a6f292c in 3.6 changed this for
> RAID1, and a similar commit did for RAID10. RAID4/5/6 still submit from a
> single thread, as you say.

Ah, sorry, I missed that change, thanks Neil! So then it should perform
much better.

(But IIRC most reports about dmcrypt performance were either over
high-speed SSDs without AES-NI or over RAID5 - cases like a huge FTP
archive where they need redundancy & offline data security. But there the
current design should help...)

Milan
* Re: some general questions on RAID
From: Brad Campbell @ 2013-07-05 1:13 UTC
To: Christoph Anton Mitterer; +Cc: linux-raid

On 05/07/13 02:30, Christoph Anton Mitterer wrote:
> Hi.
>
> Well, for me personally these are follow-up questions to my scenario
> presented here: http://thread.gmane.org/gmane.linux.raid/43405
>
> But I think these questions would be generally interesting and I'd like
> to add them to the Debian FAQ for mdadm (and haven't found really good
> answers in the archives/Google).
>
> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems

I have two arrays with dmcrypt on top of MD.
Array 1 is 4 x Seagate 15k SAS drives in a RAID10 f2.
Array 2 is 6 x 240G SSDs in a RAID10 n2.

Array 2 is partitioned. All run ext4.
The CPU is an AMD FX8350. I can max out all arrays with sequential or
random r/w loads, so dmcrypt is not a limiting factor for me.

When I say max out, I run into bandwidth limits on the hardware before
dmcrypt gets in the way.
* Re: some general questions on RAID
From: Sam Bingner @ 2013-07-05 1:39 UTC
To: Brad Campbell, Christoph Anton Mitterer; +Cc: linux-raid@vger.kernel.org

On 7/4/13 3:13 PM, "Brad Campbell" <lists2009@fnarfbargle.com> wrote:

> On 05/07/13 02:30, Christoph Anton Mitterer wrote:
>> 1) I plan to use dmcrypt and LUKS and had the following stacking in mind:
>> physical devices -> MD -> dmcrypt -> LVM (with multiple LVs) -> filesystems
>
> I have two arrays with dmcrypt on top of MD.
> Array 1 is 4 x Seagate 15k SAS drives in a RAID10 f2.
> Array 2 is 6 x 240G SSDs in a RAID10 n2.
>
> Array 2 is partitioned. All run ext4.
> The CPU is an AMD FX8350. I can max out all arrays with sequential or
> random r/w loads, so dmcrypt is not a limiting factor for me.
>
> When I say max out, I run into bandwidth limits on the hardware before
> dmcrypt gets in the way.

That is because your CPU has encryption features - the QNAP devices
largely do not. I replaced the CPU in mine with one that had encryption
features because otherwise there was nothing that could bring the
performance above about 80MB/sec.

Once I put in a CPU supporting AES-NI I could get up to about 500MB/sec -
and this was still not CPU bound.
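If you want to check whether a given box falls into that category, a
quick sketch (the flag and module names apply to x86 CPUs and Linux):

    # does the CPU advertise the AES instruction set?
    grep -m1 -wo aes /proc/cpuinfo

    # is the accelerated driver loadable? (needs root; fails without AES-NI)
    modprobe aesni_intel && lsmod | grep aesni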
* Re: some general questions on RAID
From: Brad Campbell @ 2013-07-05 3:06 UTC
To: Sam Bingner; +Cc: Christoph Anton Mitterer, linux-raid@vger.kernel.org

On 05/07/13 09:39, Sam Bingner wrote:
> On 7/4/13 3:13 PM, "Brad Campbell" <lists2009@fnarfbargle.com> wrote:
>
>> On 05/07/13 02:30, Christoph Anton Mitterer wrote:
>> When I say max out, I run into bandwidth limits on the hardware before
>> dmcrypt gets in the way.
>
> That is because your CPU has encryption features - the QNAP devices
> largely do not. I replaced the CPU in mine with one that had encryption
> features because otherwise there was nothing that could bring the
> performance above about 80MB/sec.
>
> Once I put in a CPU supporting AES-NI I could get up to about 500MB/sec -
> and this was still not CPU bound.

Yes, of course. I forgot about the dedicated storage device rather than a
general server. I run dmcrypt on a WD MyBook Live and get about 8MB/s on
that (which is enough for what I need).
* Re: some general questions on RAID (OT)
From: Christoph Anton Mitterer @ 2013-07-06 1:23 UTC
To: linux-raid

On Fri, 2013-07-05 at 01:39 +0000, Sam Bingner wrote:
> That is because your CPU has encryption features - the QNAP devices
> largely do not. I replaced the CPU in mine with one that had encryption
> features because otherwise there was nothing that could bring the
> performance above about 80MB/sec.

That's a bit off-topic from my questions, but I guess many other people
using RAIDs on their QNAPs might be interested later as well: which QNAP
do you have exactly?

I mean, they all have either ARM-based CPUs or Atoms... so in my case, a
D2700 with an FCBGA559... which CPU (that has AES-NI) could you find for
that? Since AFAIK there are no[0] Atoms at all with AES-NI?

Cheers,
Chris.

[0] http://ark.intel.com/search/advanced/?s=t&Sockets=FCBGA559&AESTech=true
* Re: some general questions on RAID (OT)
From: Sam Bingner @ 2013-07-06 6:23 UTC
To: Christoph Anton Mitterer, linux-raid@vger.kernel.org

The TS-879 and TS-1079 have Intel i3 processors and use a normal socket.
I replaced it with an E3-1275 Xeon processor. You can use Sandy Bridge,
but they do not have BIOS support for Ivy Bridge processors. I also
replaced my memory with 16GB of ECC memory.

I did a lot of research about this stuff before buying it, which is why I
went with the TS-1079 Pro. Works great with Debian except for some lack
of LED support on a couple of drives, and the LCD always says "Booting…"

Sam

On 7/5/13 3:23 PM, "Christoph Anton Mitterer" <calestyo@scientia.net> wrote:

> On Fri, 2013-07-05 at 01:39 +0000, Sam Bingner wrote:
>> That is because your CPU has encryption features - the QNAP devices
>> largely do not. I replaced the CPU in mine with one that had encryption
>> features because otherwise there was nothing that could bring the
>> performance above about 80MB/sec.
> That's a bit off-topic from my questions, but I guess many other people
> using RAIDs on their QNAPs might be interested later as well: which QNAP
> do you have exactly?
>
> I mean, they all have either ARM-based CPUs or Atoms... so in my case, a
> D2700 with an FCBGA559... which CPU (that has AES-NI) could you find for
> that? Since AFAIK there are no[0] Atoms at all with AES-NI?
>
> Cheers,
> Chris.
>
> [0] http://ark.intel.com/search/advanced/?s=t&Sockets=FCBGA559&AESTech=true
* Re: some general questions on RAID (OT)
From: Christoph Anton Mitterer @ 2013-07-06 15:11 UTC
To: linux-raid

On Sat, 2013-07-06 at 06:23 +0000, Sam Bingner wrote:
> Works great with Debian except for some lack of LED support on a couple
> of drives, and the LCD always says "Booting…"

I tried a lot to get the LEDs/buzzers/buttons working (it doesn't seem to
be easy on the Intel-based devices); you may be interested in:
http://thread.gmane.org/gmane.linux.kernel/1508763
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712283
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=712191
https://github.com/groeck/it87/issues/1

Getting the LCD to work is very easy, though (see also the links above).

Cheers,
Chris.