* filesystem stripe parameters
From: Wil Reichert @ 2009-06-18 19:08 UTC (permalink / raw)
To: linux raid

When using LVM on top of RAID 5, is it still worthwhile to pass RAID
stripe information to the filesystem on creation?  Or do the PEs in
LVM blur the specific stripe sizes, so I'd want to use some multiple
of those instead?

Wil

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: filesystem stripe parameters
From: Michael Tokarev @ 2009-06-19 9:15 UTC (permalink / raw)
To: Wil Reichert; +Cc: linux raid

Wil Reichert wrote:
> When using LVM on top of RAID 5, is it still worthwhile to pass RAID
> stripe information to the filesystem on creation?  Or do the PEs in
> LVM blur the specific stripe sizes, so I'd want to use some multiple
> of those instead?

It's a very good question, especially in the context of RAID5.

Yes, it is still a good idea to pass that info, because it is still
RAID5, which requires proper treatment wrt unaligned writes and
keeping redundancy.

But the thing is that RAID5 and LVM do not play well together UNLESS
the RAID5 consists of 3, 5 or 9 (or 17 etc) drives -- i.e. 2^N+1 total
drives, so that there are 2^N data drives.

This is because LVM can only have a block (extent) size that is a
power of two, and in order to be useful that block size should be a
multiple of the RAID5 data row size (stripe width).

This is only possible when the RAID5 has 2^N data drives, or 2^N+1
total drives.  The same goes for RAID4, and for RAID6 it's 2^N+2 total
drives, since RAID6 has 2 parity drives.

But if you can't match the LVM block size and the RAID stripe size,
there's *almost* no point in telling the raid parameters to the
filesystem: no matter how hard you try, LVM will make the whole thing
non-optimal.

Ok, depending on the number of drives, *some* logical volumes will be
properly aligned, but definitely not all of them.  For example, on a
4-drive RAID5 array only every 3rd volume will be ok -- provided the
volumes are allocated in full, one after another, without holes or
fragmentation.

When everything is properly aligned, it's still worth the effort IMHO
to tell the filesystem about the true raid properties.

/mjt
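[Editor's note: the power-of-two argument above can be checked with a
little shell arithmetic.  The 128 KiB chunk and 4 MiB extent below are
just illustrative values; substitute your own.]

```shell
# A power-of-two LVM extent (here the default 4 MiB) can only be an
# exact multiple of the raid5 data row when the number of data disks
# is itself a power of two -- i.e. 2^N+1 total drives, as above.
chunk_kib=128        # md chunk size
pe_kib=4096          # LVM physical extent, always a power of two
for data_disks in 2 3 4 5; do
    row_kib=$((chunk_kib * data_disks))   # one full data row
    if [ $((pe_kib % row_kib)) -eq 0 ]; then
        echo "$((data_disks + 1))-drive raid5: row ${row_kib}k divides the extent -> aligned"
    else
        echo "$((data_disks + 1))-drive raid5: row ${row_kib}k does not divide the extent -> misaligned"
    fi
done
```

Only the 3- and 5-drive cases (2 and 4 data disks) come out aligned;
the 4- and 6-drive cases leave a remainder on every extent boundary.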
* Re: filesystem stripe parameters
From: Robin Hill @ 2009-06-19 9:36 UTC (permalink / raw)
To: linux raid

On Fri Jun 19, 2009 at 01:15:41PM +0400, Michael Tokarev wrote:
[]
> Ok, depending on the number of drives, *some* logical volumes will
> be properly aligned, but definitely not all of them.  For example,
> on a 4-drive RAID5 array only every 3rd volume will be ok --
> provided the volumes are allocated in full, one after another,
> without holes or fragmentation.
>
> When everything is properly aligned, it's still worth the effort
> IMHO to tell the filesystem about the true raid properties.
>
You'll also need to get the LVM header padding right, otherwise the
filesystem won't start on a stripe boundary and the alignment will all
go wrong.  I've no idea of the syntax, but I recall seeing it
discussed on here several times.  A quick search throws up:
http://www.issociate.de/board/goto/1859627/stride_/_stripe_alignment_on_LVM_?.html

Cheers,
    Robin
-- 
Robin Hill <robin@robinhill.me.uk>
| Little Jim says ....
|   "He fallen in de water !!"
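[Editor's note: the knob Robin is referring to is the start of the
PV's data area.  A hypothetical sketch of the arithmetic follows --
the ~192 KiB default offset is an assumption about the lvm2 of that
era, not something stated in this thread; check your own system with
`pvs -o +pe_start` before relying on it.]

```shell
# Round the PV metadata area up so that physical extent 0 begins on a
# full raid stripe boundary.  Illustrative values: 3-drive raid5 with
# 128 KiB chunks (2 data disks), and an ASSUMED ~192 KiB default
# data-area offset (verify with: pvs -o +pe_start).
stripe_kib=256                 # 2 data disks * 128 KiB chunk
default_mda_kib=192            # assumed lvm2 default -- check pe_start
# round up to the next multiple of the stripe width
aligned_kib=$(( (default_mda_kib + stripe_kib - 1) / stripe_kib * stripe_kib ))
echo "pvcreate --metadatasize ${aligned_kib}k /dev/md0"
```

lvm rounds --metadatasize up internally, so recipes posted at the time
often passed a slightly smaller value (e.g. 250k) and let the rounding
land on 256 KiB; either way, confirm the resulting pe_start afterwards.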
* Re: filesystem stripe parameters
From: Justin Perreault @ 2009-06-19 20:59 UTC (permalink / raw)
To: Michael Tokarev; +Cc: Wil Reichert, linux raid

Still learning, please be gentle.

On Fri, 2009-06-19 at 13:15 +0400, Michael Tokarev wrote:
[]
> But if you can't match the LVM block size and the RAID stripe size,
> there's *almost* no point in telling the raid parameters to the
> filesystem: no matter how hard you try, LVM will make the whole
> thing non-optimal.

2.5 questions:

1) Will this same issue affect a 5+0 raid array?

2) It is implied that one can choose not to tell the filesystem the
raid parameters.  What negative effect does not doing so have?
Conversely, what positive effect does doing so have?

Thanks,
Justin
* Re: filesystem stripe parameters
From: Michael Tokarev @ 2009-06-20 6:35 UTC (permalink / raw)
To: Justin Perreault; +Cc: Wil Reichert, linux raid

Justin Perreault wrote:
> Still learning, please be gentle.
[]
> 2.5 questions:
>
> 1) Will this same issue affect a 5+0 raid array?

Yes, definitely.  But with 5+0 it's a bit more complicated.  In that
case each raid5 should have 3, 5, 9 etc (2^N+1) drives, and by
combining the two into raid0 you'll have a "combined stripe size" of
2*2^N chunks, which is still a power of two and hence can be used with
lvm.  You still need to tell the fs about the raid5 properties, not
the raid0 ones, though how much that helps here is questionable.

> 2) It is implied that one can choose not to tell the filesystem the
> raid parameters.  What negative effect does not doing so have?
> Conversely, what positive effect does doing so have?

It's covered by the mkfs.ext3 and mkfs.xfs manpages.  Telling the fs
about your raid properties serves two purposes: the filesystem tries
to avoid the read-modify-write cycle on raid5 (the most expensive
thing, unavoidable if partitions/volumes are not aligned to the raid
stripe width), and it tries to place various data on different disks.

The most expensive thing is read-modify-write for writes on raid[456].
Basically, if you write only a "small" amount of data, raid5 needs to
re-calculate and re-write the parity block, which is a function of
your new data and the content of all the other data in the stripe.  So
it has to read either all the other data blocks from this raid row, or
at least the previous content of the blocks you're writing AND the
previous parity block, in order to calculate the new parity.  On the
other hand, if you write a whole stripe (or more), there's no need to
read anything: all the data needed to calculate the new parity is
already there.

So basically read-modify-write (for small/unaligned writes) is 3x more
operations (plus seeks!) than a direct write (for large, aligned
writes).

But note that by telling the filesystem about the raid properties we
don't affect the file data itself, or rather, how our applications
will access it.  The filesystem can change metadata location and file
placement, but not the way userspace writes.  Ok, the fs can also
perform smarter buffering, so that buffered writes are sent to raid5
in multiples of the raid stripe width.

Note also that for reads, especially "large enough" reads, all this
alignment has little effect.

/mjt
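[Editor's note: the "3x more operations" figure can be counted out
explicitly.  This is a simplified model only -- real md batches,
caches, and sometimes reconstruct-writes instead.]

```shell
# Simplified I/O counts for a 4-drive raid5 (3 data + 1 parity).
# Read-modify-write of ONE chunk:
#   read old data + read old parity + write new data + write new parity
rmw_ios=4
# An aligned full-stripe write of all 3 data chunks:
#   write 3 data chunks + write 1 parity chunk, no reads at all
ndisks=4
data_chunks=$((ndisks - 1))
full_stripe_ios=$ndisks
# Cost per data chunk: rmw is 4 ios/chunk; full-stripe is 4 ios for
# 3 chunks.  Ratio of the two:
ratio=$(( rmw_ios * data_chunks / full_stripe_ios ))
echo "rmw costs ${ratio}x the ios of an aligned full-stripe write"
```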
* Re: filesystem stripe parameters
From: Wil Reichert @ 2009-06-20 0:26 UTC (permalink / raw)
To: Michael Tokarev; +Cc: linux raid

On Fri, Jun 19, 2009 at 2:15 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
[]
> When everything is properly aligned, it's still worth the effort
> IMHO to tell the filesystem about the true raid properties.

Several questions answered, more questions arise =)

I'm using 3 1T discs, so it seems I'm in luck.  My chunk size is 128k,
and my PE size is the default 4M.  Using mkfs.ext4 as an example, it
takes stride (chunk) and stripe-width (chunk * (N-1)) parameters.  So
which would be optimal when creating the filesystem -- the RAID values
(i.e. 128k, 256k) or the LVM values (i.e. 4M, 8M)?  Or is there no
right answer, and it just depends on the usage pattern?

Wil
* Re: filesystem stripe parameters
From: Michael Tokarev @ 2009-06-20 6:19 UTC (permalink / raw)
To: Wil Reichert; +Cc: linux raid

Wil Reichert wrote:
> On Fri, Jun 19, 2009 at 2:15 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
[]
>> When everything is properly aligned, it's still worth the effort
>> IMHO to tell the filesystem about the true raid properties.
>
> Several questions answered, more questions arise =)
>
> I'm using 3 1T discs, so it seems I'm in luck.  My chunk size is
> 128k, and my PE size is the default 4M.  Using mkfs.ext4 as an
> example, it takes stride (chunk) and stripe-width (chunk * (N-1))
> parameters.  So which would be optimal -- the RAID values (i.e.
> 128k, 256k) or the LVM values (i.e. 4M, 8M)?

No -- see the last statement of my initial email, quoted above.  Tell
the fs about your raid.  Even if the raid stripes are combined in some
other way, it's still raid, and it's still the stripe size that
matters most.  After all, you need two parameters for the fs (chunk +
stripe-width), not one (lvm block size) -- that fact alone is
telling :)

/mjt
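[Editor's note: for Wil's concrete numbers, the raid-derived
mkfs.ext4 values work out as below.  Note that stride and stripe-width
are counted in filesystem blocks, not bytes; the 4 KiB block size and
the /dev/vg0/somelv path are illustrative assumptions.]

```shell
# 3-drive raid5, 128 KiB chunk, 4 KiB ext4 blocks.
chunk_kib=128
fs_block_kib=4
data_disks=2                            # 3 drives minus 1 parity
stride=$((chunk_kib / fs_block_kib))    # fs blocks per md chunk
stripe_width=$((stride * data_disks))   # fs blocks per full data row
echo "mkfs.ext4 -E stride=${stride},stripe-width=${stripe_width} /dev/vg0/somelv"
```

This prints stride=32 and stripe-width=64, i.e. the RAID values from
the thread (128k per chunk, 256k per data row) expressed in 4 KiB
blocks, which is the spelling mkfs.ext4's -E option expects.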