* xfs hardware RAID alignment over linear lvm
From: Stewart Webb @ 2013-09-25 12:56 UTC
To: xfs

Hi All,

I am trying to do the following:
3 x hardware RAID cards, each with a RAID 6 volume of 12 disks presented to the OS.
All RAID units have a "stripe size" of 512 KB.

So given the info on the xfs.org wiki, I should give each filesystem a
sunit of 512 KB and a swidth of 10 (because RAID 6 has 2 parity disks).

All well and good.

But I would like to use linear LVM to bring all 3 cards into 1 logical
volume - here is where my question crops up:
does this affect how I need to align the filesystem?

Regards
-- 
Stewart Webb

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs hardware RAID alignment over linear lvm
From: Stan Hoeppner @ 2013-09-25 21:18 UTC
To: Stewart Webb; +Cc: xfs

On 9/25/2013 7:56 AM, Stewart Webb wrote:
> Hi All,

Hi Stewart,

> I am trying to do the following:
> 3 x Hardware RAID Cards each with a raid 6 volume of 12 disks presented to
> the OS
> all raid units have a "stripe size" of 512 KB

Just for future reference, so you're using correct terminology: a value
of 512KB is surely your XFS su value, also called a "strip" in LSI
terminology, or a "chunk" in Linux software md/RAID terminology. This
is the amount of data written to each data spindle (excluding parity) in
the array.

"Stripe size" is a synonym of XFS sw, which is su * #disks. This is the
amount of data written across the full RAID stripe (excluding parity).

> so given the info on the xfs.org wiki - I should give each filesystem a
> sunit of 512 KB and a swidth of 10 (because RAID 6 has 2 parity disks)

Partially correct. If you format each /dev/[device] presented by the
RAID controller with an XFS filesystem, 3 filesystems total, then your
values above are correct. EXCEPT you must use the su/sw parameters in
mkfs.xfs if using BYTE values. See mkfs.xfs(8).

> all well and good
>
> But - I would like to use Linear LVM to bring all 3 cards into 1 logical
> volume -
> here is where my question crops up:
> Does this affect how I need to align the filesystem?

In the case of a concatenation, which is what LVM linear is, you should
use an XFS alignment identical to that for a single array as above.

-- 
Stan
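A quick sketch of the geometry arithmetic described above, assuming the 512 KB
strip and 12-disk RAID 6 from this thread. Per mkfs.xfs(8), su/sw take a byte
value and a spindle count, while the equivalent sunit/swidth options take
counts of 512-byte sectors; the device name in the comments is illustrative.

```python
# One 12-disk RAID 6 array: 10 data spindles, 2 parity.
su_bytes = 512 * 1024                 # "strip"/"chunk" written per data spindle
data_disks = 12 - 2                   # sw: spindle count excluding parity
stripe_bytes = su_bytes * data_disks  # full stripe width (excluding parity)

# Byte-based form:   mkfs.xfs -d su=512k,sw=10 <device>
# Sector-based form: mkfs.xfs -d sunit=1024,swidth=10240 <device>
sunit = su_bytes // 512               # 512 KiB = 1024 sectors
swidth = sunit * data_disks           # stripe width in sectors

print(sunit, swidth, stripe_bytes)    # 1024 10240 5242880
```

Note that the two forms describe the same geometry; su/sw is just less
error-prone to write by hand.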
* Re: xfs hardware RAID alignment over linear lvm
From: Chris Murphy @ 2013-09-25 21:34 UTC
To: xfs@oss.sgi.com

On Sep 25, 2013, at 3:18 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:

> On 9/25/2013 7:56 AM, Stewart Webb wrote:
> [...]
>> But - I would like to use Linear LVM to bring all 3 cards into 1 logical
>> volume -
>> here is where my question crops up:
>> Does this affect how I need to align the filesystem?
>
> In the case of a concatenation, which is what LVM linear is, you should
> use an XFS alignment identical to that for a single array as above.

So keeping the example, 3 arrays x 10 data disks, would this be su=512k
and sw=30?

Chris Murphy
* Re: xfs hardware RAID alignment over linear lvm
From: Stan Hoeppner @ 2013-09-25 21:48 UTC
To: Chris Murphy; +Cc: xfs@oss.sgi.com

On 9/25/2013 4:34 PM, Chris Murphy wrote:
> On Sep 25, 2013, at 3:18 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> [...]
>> Partially correct. If you format each /dev/[device] presented by the
>> RAID controller with an XFS filesystem, 3 filesystems total, then your
>> values above are correct. EXCEPT you must use the su/sw parameters in
>> mkfs.xfs if using BYTE values. See mkfs.xfs(8).

Small correction: su is a byte value. sw is an integer representing the
number of data spindles.

>> In the case of a concatenation, which is what LVM linear is, you should
>> use an XFS alignment identical to that for a single array as above.
>
> So keeping the example, 3 arrays x 10 data disks, would this be su=512k
> and sw=30?

No. In this configuration, as far as XFS is concerned LVM doesn't exist
in the stack because it doesn't change the RAID geometry, so you ignore it.

-- 
Stan
* Re: xfs hardware RAID alignment over linear lvm
From: Chris Murphy @ 2013-09-25 21:53 UTC
To: xfs@oss.sgi.com

On Sep 25, 2013, at 3:48 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> So keeping the example, 3 arrays x 10 data disks, would this be su=512k
>> and sw=30?
>
> No. In this configuration, as far as XFS is concerned LVM doesn't exist
> in the stack because it doesn't change the RAID geometry, so you ignore it.

OK, and if this were md linear, where the file system definitely would
be created across all disks?

Chris Murphy
* Re: xfs hardware RAID alignment over linear lvm
From: Dave Chinner @ 2013-09-25 21:57 UTC
To: Chris Murphy; +Cc: xfs@oss.sgi.com

On Wed, Sep 25, 2013 at 03:34:01PM -0600, Chris Murphy wrote:
> On Sep 25, 2013, at 3:18 PM, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>> [...]
>> In the case of a concatenation, which is what LVM linear is, you should
>> use an XFS alignment identical to that for a single array as above.
>
> So keeping the example, 3 arrays x 10 data disks, would this be su=512k
> and sw=30?

No, the alignment should match that of a *single* 10 disk array,
so su=512k,sw=10.

Linear concatenation looks like this:

  offset  volume                  array
  0       +-D1-+-D2-+.....+-Dn-+    0    # first sw
          .....
  X-sw    +-D1-+-D2-+.....+-Dn-+    0
  X       +-E1-+-E2-+.....+-En-+    1    # first sw
          .....
  2X-sw   +-E1-+-E2-+.....+-En-+    1
  2X      +-F1-+-F2-+.....+-Fn-+    2    # first sw
          .....
  3X-sw   +-F1-+-F2-+.....+-Fn-+    2

Where:
  D1...Dn are the disks in the first array
  E1...En are the disks in the second array
  F1...Fn are the disks in the third array
  X is the size of each array
  sw = su * number of data disks in the array

As you can see, all the volumes are arranged in a single column -
identical to a larger single array of the same size. Hence the
exposed alignment of a single array is what the filesystem should be
aligned to, as that is how the linear concat behaves.

You also might note here that if you want the second and subsequent
arrays to be correctly aligned to the initial array in the linear
concat (and you do want that), the arrays must be sized to be an
exact multiple of the stripe width.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
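Dave's last point, that each member array must be an exact multiple of the
stripe width so every subsequent array starts stripe-aligned, can be sketched
numerically. The array sizes below are hypothetical, chosen only to show the
check:

```python
su = 512 * 1024                # bytes per data spindle
sw = 10                        # data spindles per array
stripe_width = su * sw         # 5 MiB full stripe

def concat_stays_aligned(array_sizes):
    """True if every array in the concat begins on a stripe-width boundary."""
    offset = 0
    for size in array_sizes:
        if offset % stripe_width:   # this array starts mid-stripe
            return False
        offset += size
    return offset % stripe_width == 0

aligned = [stripe_width * 1000] * 3          # exact multiples of 5 MiB
unaligned = [stripe_width * 1000 + 4096] * 3 # each array 4 KiB too long

print(concat_stays_aligned(aligned))    # True
print(concat_stays_aligned(unaligned))  # False - arrays 2 and 3 start misaligned
```

In the unaligned case the filesystem's idea of stripe boundaries drifts by
4 KiB per member array, defeating the alignment on everything past the first
array.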
* Re: xfs hardware RAID alignment over linear lvm
From: Stan Hoeppner @ 2013-09-26 8:44 UTC
To: Dave Chinner; +Cc: Chris Murphy, xfs@oss.sgi.com

On 9/25/2013 4:57 PM, Dave Chinner wrote:
> [...]
> You also might note here that if you want the second and subsequent
> arrays to be correctly aligned to the initial array in the linear
> concat (and you do want that), the arrays must be sized to be an
> exact multiple of the stripe width.

On a similar note, if I do a concat like this I specify agsize/agcount
during mkfs.xfs so no AGs straddle array boundaries. I do this to keep
per-AG throughput consistent, among other concerns. This may or may not
be of benefit to the OP.

mkfs.xfs using defaults is not aware of the array boundaries within the
concat, so it may well create AGs across array boundaries.

-- 
Stan
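A sketch of the AG-sizing idea above, with illustrative numbers rather than a
recommendation: pick an agsize that divides each member array exactly and is
itself a multiple of the stripe width, so no allocation group crosses an array
boundary.

```python
su = 512 * 1024
sw = 10
stripe_width = su * sw                  # 5 MiB

# Hypothetical member array: 4000 stripes, ~19.5 GiB.
array_size = stripe_width * 4000

# Choose AGs per member array so agsize divides the array exactly.
ags_per_array = 4
agsize = array_size // ags_per_array

assert array_size % ags_per_array == 0  # AGs tile the array exactly
assert agsize % stripe_width == 0       # each AG starts stripe-aligned
assert agsize <= 1 << 40                # XFS caps an AG at 1 TiB

# mkfs would then be given this agsize (or agcount = 3 * ags_per_array
# for a 3-array concat), e.g.: mkfs.xfs -d su=512k,sw=10,agsize=<agsize>
print(agsize)
```

With 3 equal arrays this yields 12 AGs, each wholly inside one array, so
allocator activity in any one AG hits only one controller.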
* Re: xfs hardware RAID alignment over linear lvm
From: Stewart Webb @ 2013-09-26 8:55 UTC
To: Dave Chinner; +Cc: Chris Murphy, xfs@oss.sgi.com

Thanks for all this info, Stan and Dave.

> "Stripe size" is a synonym of XFS sw, which is su * #disks. This is the
> amount of data written across the full RAID stripe (excluding parity).

The reason I stated stripe size is because in this instance I have 3ware
RAID controllers, which refer to this value as "Stripe" in their tw_cli
software (god bless manufacturers renaming everything).

I do, however, have a follow-on question. On other systems, I have
similar hardware:
3x RAID controllers
1 of them has 10 disks as RAID 6 that I would like to add to a logical volume
2 of them have 12 disks as RAID 6 that I would like to add to the same logical volume

All have the same "Stripe" or "Strip Size" of 512 KB.

So if I were going to make 3 separate XFS filesystems, I would do the
following:

mkfs.xfs -d su=512k,sw=8 /dev/sda
mkfs.xfs -d su=512k,sw=10 /dev/sdb
mkfs.xfs -d su=512k,sw=10 /dev/sdc

I assume, if I were going to bring them all into 1 logical volume, it
would be best placed to have the sw value set to a value that is
divisible by both 8 and 10 - in this case 2?

Obviously this is not an ideal situation, and I will most likely modify
the hardware to better suit. But I'd really like to fully understand this.

Thanks for any insight you are able to give.

Regards
-- 
Stewart Webb
* Re: xfs hardware RAID alignment over linear lvm
From: Stan Hoeppner @ 2013-09-26 9:22 UTC
To: Stewart Webb; +Cc: Chris Murphy, xfs@oss.sgi.com

On 9/26/2013 3:55 AM, Stewart Webb wrote:
> [...]
> So if I were going to make 3 separate XFS filesystems, I would do the
> following:
> mkfs.xfs -d su=512k,sw=8 /dev/sda
> mkfs.xfs -d su=512k,sw=10 /dev/sdb
> mkfs.xfs -d su=512k,sw=10 /dev/sdc
>
> I assume, if I were going to bring them all into 1 logical volume, it
> would be best placed to have the sw value set to a value that is
> divisible by both 8 and 10 - in this case 2?

No. In this case you do NOT stripe align XFS to the storage, because
it's impossible--the RAID stripes are dissimilar. In this case you use
the default 4KB write out, as if this is a single disk drive.

As Dave stated, if you format a concatenated device with XFS and you
desire to align XFS, then all constituent arrays must have the same
geometry.

Three things to be aware of here:

1. With a decent hardware write-caching RAID controller, having XFS
aligned to the RAID geometry is a small optimization WRT overall write
performance, because the controller is going to be doing the optimizing
of final writeback to the drives.

2. Alignment does not affect read performance.

3. XFS only performs aligned writes during allocation. I.e. this only
occurs when creating a new file, new inode, etc. For append and
modify-in-place operations, there is no write alignment.

So again, stripe alignment to the hardware geometry is merely an
optimization, and only affects some types of writes.

What really makes a difference as to whether alignment will be of
benefit to you, and how often, is your workload. So at this point, you
need to describe the primary workload(s) of the systems we're discussing.

-- 
Stan
* Re: xfs hardware RAID alignment over linear lvm
From: Stewart Webb @ 2013-09-26 9:28 UTC
To: stan; +Cc: Chris Murphy, xfs@oss.sgi.com

Understood.

My workload is primarily reads (about 80%+ read operations), so defaults
will most likely be best suited on this occasion.

I was simply trying to follow the guidelines on the XFS wiki to the best
of my ability, and felt I didn't understand the impact of using this via
LVM. Now I feel I understand enough to continue with what I need to do.

Thanks again
-- 
Stewart Webb
* Re: xfs hardware RAID alignment over linear lvm
From: Dave Chinner @ 2013-09-26 21:58 UTC
To: Stan Hoeppner; +Cc: Stewart Webb, Chris Murphy, xfs@oss.sgi.com

On Thu, Sep 26, 2013 at 04:22:30AM -0500, Stan Hoeppner wrote:
> [...]
> 1. With a decent hardware write-caching RAID controller, having XFS
> aligned to the RAID geometry is a small optimization WRT overall write
> performance, because the controller is going to be doing the optimizing
> of final writeback to the drives.
>
> 2. Alignment does not affect read performance.

Ah, but it does...

> 3. XFS only performs aligned writes during allocation.

Right, and it does so not only to improve write performance, but to
also maximise sequential read performance of the data that is
written, especially when multiple files are being read
simultaneously and IO latency is important to keep low (e.g.
realtime video ingest and playout).

> What really makes a difference as to whether alignment will be of
> benefit to you, and how often, is your workload. So at this point, you
> need to describe the primary workload(s) of the systems we're discussing.

Yup, my thoughts exactly...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: xfs hardware RAID alignment over linear lvm
From: Stan Hoeppner @ 2013-09-27 1:10 UTC
To: Dave Chinner; +Cc: Stewart Webb, Chris Murphy, xfs@oss.sgi.com

On 9/26/2013 4:58 PM, Dave Chinner wrote:
> [...]
>> 2. Alignment does not affect read performance.
>
> Ah, but it does...
>
>> 3. XFS only performs aligned writes during allocation.
>
> Right, and it does so not only to improve write performance, but to
> also maximise sequential read performance of the data that is
> written, especially when multiple files are being read
> simultaneously and IO latency is important to keep low (e.g.
> realtime video ingest and playout).

Absolutely correct, as Dave always is. As my workloads are mostly
random, as are those of others I consult in other fora, I sometimes
forget the [multi]streaming case. Which is not good, as many folks
choose XFS specifically for [multi]streaming workloads. My remarks to
this audience should always reflect that. Apologies for my oversight on
this occasion.

-- 
Stan
* Re: xfs hardware RAID alignment over linear lvm
  2013-09-27  1:10               ` Stan Hoeppner
@ 2013-09-27 12:23                 ` Stewart Webb
  2013-09-27 13:09                   ` Stan Hoeppner
  0 siblings, 1 reply; 17+ messages in thread
From: Stewart Webb @ 2013-09-27 12:23 UTC (permalink / raw)
  To: stan; +Cc: Chris Murphy, xfs@oss.sgi.com

> Right, and it does so not only to improve write performance, but to
> also maximise sequential read performance of the data that is
> written, especially when multiple files are being read
> simultaneously and IO latency is important to keep low (e.g.
> realtime video ingest and playout).

So does this mean that I should avoid having RAID devices with a
differing number of spindles (or non-parity disks) if I would like to
use linear LVM concatenation? Or is there a best practice if this
situation is not avoidable?

Regards

-- 
Stewart Webb

^ permalink raw reply [flat|nested] 17+ messages in thread
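Stewart's question — whether arrays with differing spindle counts can share one alignment — reduces to a uniformity check: if any constituent array's geometry differs, no single su/sw is correct for the concatenation and the mkfs.xfs defaults apply. A minimal sketch, using the sw values from the 10-disk/12-disk example earlier in the thread:

```shell
#!/bin/sh
# sw (non-parity disk count) of each array in the proposed linear concat:
# one 10-disk RAID6 (sw=8) and two 12-disk RAID6 (sw=10), all with su=512k.
sw_list="8 10 10"
first=""
uniform=yes
for sw in $sw_list; do
    if [ -z "$first" ]; then first=$sw; fi
    if [ "$sw" != "$first" ]; then uniform=no; fi
done
if [ "$uniform" = "yes" ]; then
    echo "uniform geometry: mkfs.xfs -d su=512k,sw=$first <device>"
else
    echo "dissimilar geometry: take mkfs.xfs defaults (unaligned 4 KiB I/O)"
fi
```

For the values above this prints the "dissimilar geometry" branch, matching Stan's earlier answer: don't stripe-align, take the defaults.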
* Re: xfs hardware RAID alignment over linear lvm
  2013-09-27 12:23                 ` Stewart Webb
@ 2013-09-27 13:09                   ` Stan Hoeppner
  2013-09-27 13:29                     ` Stewart Webb
  0 siblings, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2013-09-27 13:09 UTC (permalink / raw)
  To: Stewart Webb; +Cc: Chris Murphy, xfs@oss.sgi.com

On 9/27/2013 7:23 AM, Stewart Webb wrote:
>> Right, and it does so not only to improve write performance, but to
>> also maximise sequential read performance of the data that is
>> written, especially when multiple files are being read
>> simultaneously and IO latency is important to keep low (e.g.
>> realtime video ingest and playout).
>
> So does this mean that I should avoid having RAID devices with a
> differing number of spindles (or non-parity disks) if I would like to
> use linear LVM concatenation? Or is there a best practice if this
> situation is not avoidable?

Above, Dave was correcting my oversight, not necessarily informing you,
per se. It seems clear from your follow-up question that you didn't
really grasp what he was saying. Let's back up a little bit.

What you need to concentrate on right now is the following, which we
stated previously in the thread but which you did not reply to:

>>>> What really makes a difference as to whether alignment will be of
>>>> benefit to you, and how often, is your workload. So at this point, you
>>>> need to describe the primary workload(s) of your systems we're
>>>> discussing.
>>>
>>> Yup, my thoughts exactly...

This means you need to describe in detail how you are writing your
files, and how you are reading them back. I.e. what application are you
using, what does it do, etc. You stated IIRC that your workload is 80%
read. What types of files is it reading? Small, large? Is it reading
multiple files in parallel? How are these files originally written
before being read? Etc, etc.

You may not understand why this is relevant, but it is the only thing
that is relevant at this point. Spindles, RAID level, alignment, no
alignment...none of this matters if it doesn't match up with how your
application(s) do their IO.

Rule #1 of storage architecture: Always build your storage stack (i.e.
disks, controller, driver, filesystem, etc) to fit the workload(s), not
the other way around.

-- 
Stan

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: xfs hardware RAID alignment over linear lvm
  2013-09-27 13:09                   ` Stan Hoeppner
@ 2013-09-27 13:29                     ` Stewart Webb
  2013-09-28 14:54                       ` Stan Hoeppner
  0 siblings, 1 reply; 17+ messages in thread
From: Stewart Webb @ 2013-09-27 13:29 UTC (permalink / raw)
  To: stan; +Cc: Chris Murphy, xfs@oss.sgi.com

Hi Stan,

Apologies for not directly answering - I was aiming at filling gaps in
my knowledge that I could not find in the xfs.org wiki.

My workload for the storage is mainly reads of single large files
(ranging from 20GB to 100GB each). These reads are mainly linear (video
playback, although not always, as the end user may jump to different
points in the video). There are concurrent reads required, estimated at
2 to 8; any more would be a bonus.

The challenge is that the reads need to be "real-time" operations, as
they are interacted with by a person, and each read operation has to
consistently have low latency and sustain speeds of over 50 Mb/s.

Disk write speeds are not *as* important for me, as these files are
copied into place before they are required (in this case using rsync or
scp) and those operations do not require as much "real-time"
interaction.

-- 
Stewart Webb

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: xfs hardware RAID alignment over linear lvm
  2013-09-27 13:29                     ` Stewart Webb
@ 2013-09-28 14:54                       ` Stan Hoeppner
  2013-09-30  8:48                         ` Stewart Webb
  0 siblings, 1 reply; 17+ messages in thread
From: Stan Hoeppner @ 2013-09-28 14:54 UTC (permalink / raw)
  To: Stewart Webb; +Cc: Chris Murphy, xfs@oss.sgi.com

On 9/27/2013 8:29 AM, Stewart Webb wrote:
> Hi Stan,
>
> Apologies for not directly answering -

No problem, sorry for the late reply.

> I was aiming at filling gaps in my knowledge that I could not find in
> the xfs.org wiki.

Hopefully this is occurring. :)

> My workload for the storage is mainly reads of single large files
> (ranging from 20GB to 100GB each). These reads are mainly linear
> (video playback, although not always, as the end user may jump to
> different points in the video). There are concurrent reads required,
> estimated at 2 to 8; any more would be a bonus.

This is the type of workload Dave described previously that should
exhibit an increase in read performance if the files are written with
alignment, especially with concurrent readers, which you describe as
2-8, maybe more. The number of "maybe more" is dictated by whether
you're aligned. I.e. with alignment your odds of successfully serving
more readers are much greater.

Thus, if you need to stitch arrays together with LVM concatenation,
you'd definitely benefit from making the geometry of all arrays
identical, and aligning the filesystem to that geometry. I.e. same
number of disks, same RAID level, same RAID stripe unit (data per
non-parity disk), and same stripe width (number of non-parity disks).

-- 
Stan

^ permalink raw reply [flat|nested] 17+ messages in thread
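Pulling Stan's advice together: with three identical 12-disk RAID6 arrays, the whole concatenation can carry a single alignment. The sequence below is a sketch only, not a tested recipe — the device paths, volume-group and LV names, and mount point are all hypothetical, and the commands are destructive to the named devices:

```shell
#!/bin/sh
# Sketch: linear LVM concat of three identical RAID6 volumes, aligned XFS.
# /dev/sda, /dev/sdb, /dev/sdc stand in for the three controller volumes.
pvcreate /dev/sda /dev/sdb /dev/sdc
vgcreate vg_media /dev/sda /dev/sdb /dev/sdc
# Linear (concatenated) allocation is the LVM default; no --stripes flag.
lvcreate -n lv_media -l 100%FREE vg_media
# Identical geometry on every leg: su = 512 KiB strip, sw = 12 - 2 = 10.
mkfs.xfs -d su=512k,sw=10 /dev/vg_media/lv_media
# Sanity check after mounting: xfs_info reports the sunit/swidth the
# filesystem was created with.
mount /dev/vg_media/lv_media /mnt/media
xfs_info /mnt/media
```

If any leg had a different disk count or strip size, the mkfs line would be dropped in favour of the defaults, per the earlier discussion.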
* Re: xfs hardware RAID alignment over linear lvm 2013-09-28 14:54 ` Stan Hoeppner @ 2013-09-30 8:48 ` Stewart Webb 0 siblings, 0 replies; 17+ messages in thread From: Stewart Webb @ 2013-09-30 8:48 UTC (permalink / raw) To: stan; +Cc: Chris Murphy, xfs@oss.sgi.com [-- Attachment #1.1: Type: text/plain, Size: 7990 bytes --] Ok, Thanks Stan Much appreciated On 28 September 2013 15:54, Stan Hoeppner <stan@hardwarefreak.com> wrote: > On 9/27/2013 8:29 AM, Stewart Webb wrote: > > Hi Stan, > > > > Apologies for not directly answering - > > No problem, sorry for the late reply. > > > I was aiming at filling gaps in my knowledge that I could not find in the > > xfs.org wiki. > > Hopefully this is occurring. :) > > > My workload for the storage is mainly reads of single large files > (ranging > > for 20GB to 100GB each) > > These reads are mainly linear (video playback, although not always as the > > end user may be jumping to different points in the video) > > There are concurrent reads required, estimated at 2 to 8, any more would > be > > a bonus. > > This is the type of workload Dave described previously that should > exhibit an increase in read performance if the files are written with > alignment, especially with concurrent readers, which you describe as > 2-8, maybe more. The number of "maybe more" is dictated by whether > you're aligned. I.e. with alignment your odds of successfully serving > more readers is much greater. > > Thus, if you need to stitch arrays together with LVM concatenation, > you'd definitely benefit from making the geometry of all arrays > identical, and aligning the filesystem to that geometry. I.e. same > number of disks, same RAID level, same RAID stripe unit (data per non > parity disk), and stripe width (#non parity disks). 
> > > The challenge of this would be that the reads need to be "real-time" > > operations as they are interacted with by a person, and each > > read operation would have to consistently have a low latency and obtain > > speeds of over 50Mb/s > > > > Disk write speeds are not *as* important for me - as they these files are > > copied to location before they are required (in this case > > using rsync or scp) and these operations do not require as much > "real-time" > > interaction. > > > > > > On 27 September 2013 14:09, Stan Hoeppner <stan@hardwarefreak.com> > wrote: > > > >> On 9/27/2013 7:23 AM, Stewart Webb wrote: > >>>> Right, and it does so not only to improve write performance, but to > >>>> also maximise sequential read performance of the data that is > >>>> written, especially when multiple files are being read > >>>> simultaneously and IO latency is important to keep low (e.g. > >>>> realtime video ingest and playout). > >>> > >>> So does this mean that I should avoid having devices in RAID with a > >>> differing amount of spindles (or non-parity disks) > >>> If I would like to use Linear concatenation LVM? Or is there a best > >>> practice if this instance is not > >>> avoidable? > >> > >> Above, Dave was correcting my oversight, not necessarily informing you, > >> per se. It seems clear from your follow up question that you didn't > >> really grasp what he was saying. Let's back up a little bit. > >> > >> What you need to concentrate on right now is the following which we > >> stated previously in the thread, but which you did not reply to: > >> > >>>>>> What really makes a difference as to whether alignment will be of > >>>>>> benefit to you, and how often, is your workload. So at this point, > >> you > >>>>>> need to describe the primary workload(s) of your systems we're > >>>> discussing. > >>>>> > >>>>> Yup, my thoughts exactly... 
> >> > >> This means you need to describe in detail how you are writing your > >> files, and how you are reading them back. I.e. what application are you > >> using, what does it do, etc. You stated IIRC that your workload is 80% > >> read. What types of files is it reading? Small, large? Is it reading > >> multiple files in parallel? How are these files originally written > >> before being read? Etc, etc. > >> > >> You may not understand why this is relevant, but it is the only thing > >> that is relevant, at this point. Spindles, RAID level, alignment, no > >> alignment...none of this matters if it doesn't match up with how your > >> application(s) do their IO. > >> > >> Rule #1 of storage architecture: Always build your storage stack (i.e. > >> disks, controller, driver, filesystem, etc) to fit the workload(s), not > >> the other way around. > >> > >>> > >>> On 27 September 2013 02:10, Stan Hoeppner <stan@hardwarefreak.com> > >> wrote: > >>> > >>>> On 9/26/2013 4:58 PM, Dave Chinner wrote: > >>>>> On Thu, Sep 26, 2013 at 04:22:30AM -0500, Stan Hoeppner wrote: > >>>>>> On 9/26/2013 3:55 AM, Stewart Webb wrote: > >>>>>>> Thanks for all this info Stan and Dave, > >>>>>>> > >>>>>>>> "Stripe size" is a synonym of XFS sw, which is su * #disks. This > is > >>>> the > >>>>>>>> amount of data written across the full RAID stripe (excluding > >> parity). 
> >>>>>>>
> >>>>>>> The reason I stated Stripe size is because in this instance, I have
> >>>>>>> 3ware RAID controllers, which refer to this value as "Stripe" in
> >>>>>>> their tw_cli software (god bless manufacturers renaming everything)
> >>>>>>>
> >>>>>>> I do, however, have a follow-on question:
> >>>>>>> On other systems, I have similar hardware:
> >>>>>>> 3x RAID controllers
> >>>>>>> 1 of them has 10 disks as RAID 6 that I would like to add to a
> >>>>>>> logical volume
> >>>>>>> 2 of them have 12 disks as a RAID 6 that I would like to add to the
> >>>>>>> same logical volume
> >>>>>>>
> >>>>>>> All have the same "Stripe" or "Strip Size" of 512 KB
> >>>>>>>
> >>>>>>> So if I were going to make 3 separate xfs volumes, I would do the
> >>>>>>> following:
> >>>>>>> mkfs.xfs -d su=512k,sw=8 /dev/sda
> >>>>>>> mkfs.xfs -d su=512k,sw=10 /dev/sdb
> >>>>>>> mkfs.xfs -d su=512k,sw=10 /dev/sdc
> >>>>>>>
> >>>>>>> I assume, if I were going to bring them all into 1 logical volume,
> >>>>>>> it would be best placed to have the sw value set to a value that
> >>>>>>> divides both 8 and 10 evenly - in this case 2?
> >>>>>>
> >>>>>> No. In this case you do NOT stripe align XFS to the storage, because
> >>>>>> it's impossible--the RAID stripes are dissimilar. In this case you
> >>>>>> use the default 4KB write out, as if this is a single disk drive.
> >>>>>>
> >>>>>> As Dave stated, if you format a concatenated device with XFS and you
> >>>>>> desire to align XFS, then all constituent arrays must have the same
> >>>>>> geometry.
> >>>>>>
> >>>>>> Three things to be aware of here:
> >>>>>>
> >>>>>> 1. With a decent hardware write caching RAID controller, having XFS
> >>>>>> aligned to the RAID geometry is a small optimization WRT overall
> >>>>>> write performance, because the controller is going to be doing the
> >>>>>> optimizing of final writeback to the drives.
> >>>>>>
> >>>>>> 2. Alignment does not affect read performance.
> >>>>>
> >>>>> Ah, but it does...
> >>>>>
> >>>>>> 3. XFS only performs aligned writes during allocation.
> >>>>>
> >>>>> Right, and it does so not only to improve write performance, but to
> >>>>> also maximise sequential read performance of the data that is
> >>>>> written, especially when multiple files are being read
> >>>>> simultaneously and IO latency is important to keep low (e.g.
> >>>>> realtime video ingest and playout).
> >>>>
> >>>> Absolutely correct, as Dave always is. As my workloads are mostly
> >>>> random, as are those of others I consult in other fora, I sometimes
> >>>> forget the [multi]streaming case. Which is not good, as many folks
> >>>> choose XFS specifically for [multi]streaming workloads. My remarks to
> >>>> this audience should always reflect that. Apologies for my oversight
> >>>> on this occasion.
> >>>>
> >>>>>> What really makes a difference as to whether alignment will be of
> >>>>>> benefit to you, and how often, is your workload. So at this point,
> >>>>>> you need to describe the primary workload(s) of your systems we're
> >>>>>> discussing.
> >>>>>
> >>>>> Yup, my thoughts exactly...
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Dave.
> >>>>
> >>>> --
> >>>> Stan

--
Stewart Webb
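[Editorial note: the su/sw arithmetic in the exchange above can be sketched as a small shell snippet. This is not from the thread itself; the array sizes and the 512 KiB strip follow Stewart's description, and device names like /dev/sda are illustrative only.]

```shell
#!/bin/sh
# Sketch of the RAID 6 alignment arithmetic discussed above.

su_kib=512                            # per-disk strip ("chunk") size in KiB

data_disks() {                        # RAID 6 dedicates 2 spindles to parity
    echo $(( $1 - 2 ))
}

sw_small=$(data_disks 10)             # 10-disk RAID 6 -> sw=8
sw_large=$(data_disks 12)             # 12-disk RAID 6 -> sw=10

# Full stripe width = su * sw, i.e. data written per full stripe:
stripe_small=$(( su_kib * sw_small )) # 4096 KiB
stripe_large=$(( su_kib * sw_large )) # 5120 KiB

echo "10-disk array: su=${su_kib}k sw=${sw_small} (full stripe ${stripe_small} KiB)"
echo "12-disk array: su=${su_kib}k sw=${sw_large} (full stripe ${stripe_large} KiB)"

# The full stripes differ (4096 vs 5120 KiB), so a linear concatenation of
# these arrays has no single correct alignment -- hence the advice above to
# format the combined LV without su/sw, e.g.:  mkfs.xfs /dev/VG/LV
```

If the three arrays instead shared one geometry, the same su/sw pair would carry over to the concatenated LV unchanged, per Stan's earlier reply in this thread.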
end of thread, other threads:[~2013-09-30  8:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-25 12:56 xfs hardware RAID alignment over linear lvm Stewart Webb
2013-09-25 21:18 ` Stan Hoeppner
2013-09-25 21:34 ` Chris Murphy
2013-09-25 21:48 ` Stan Hoeppner
2013-09-25 21:53 ` Chris Murphy
2013-09-25 21:57 ` Dave Chinner
2013-09-26  8:44 ` Stan Hoeppner
2013-09-26  8:55 ` Stewart Webb
2013-09-26  9:22 ` Stan Hoeppner
2013-09-26  9:28 ` Stewart Webb
2013-09-26 21:58 ` Dave Chinner
2013-09-27  1:10 ` Stan Hoeppner
2013-09-27 12:23 ` Stewart Webb
2013-09-27 13:09 ` Stan Hoeppner
2013-09-27 13:29 ` Stewart Webb
2013-09-28 14:54 ` Stan Hoeppner
2013-09-30  8:48 ` Stewart Webb