* Optimal XFS formatting options?
From: MikeJeezy @ 2012-01-14 17:44 UTC
To: xfs

Hi, I have a 4.9 TB iSCSI LUN on a RAID 6 array with twelve 2 TB SATA disks
(4.9T is only one of the logical volumes). It will contain several million
files of various sizes, but 80% of them will be less than 50 MB. I'm a
novice at best and I usually just use the default #mkfs.xfs /dev/sdx1

This server will be write heavy for about 8 hours a night, but every
morning there are many reads to the disk. There is rarely a time where it
will be write heavy and read heavy at the same time. Are there other XFS
format options that I could use to optimize performance? Any input is
greatly appreciated. Thank you.

--
View this message in context: http://old.nabble.com/Optimal-XFS-formatting-options--tp33140169p33140169.html
Sent from the Xfs - General mailing list archive at Nabble.com.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Optimal XFS formatting options?
From: Stan Hoeppner @ 2012-01-14 22:23 UTC
To: xfs

On 1/14/2012 11:44 AM, MikeJeezy wrote:
>
> Hi, I have a 4.9 TB iSCSI LUN on a RAID 6 array with twelve 2 TB SATA disks
> (4.9T is only one of the logical volumes). It will contain several million
> files of various sizes, but 80% of them will be less than 50 MB. I'm a
> novice at best and I usually just use the default #mkfs.xfs /dev/sdx1
>
> This is server will be write heavy for about 8 hours a night, but every
> morning there are many reads to the disk. There is rarely a time where it
> will be write heavy and read heavy at the same time. Are there other XFS
> format options that I could use to optimize performance?

sunit=value
    This is used to specify the stripe unit for a RAID device or a
    logical volume. The value has to be specified in 512-byte block
    units. Use the su suboption to specify the stripe unit size in
    bytes. This suboption ensures that data allocations will be stripe
    unit aligned when the current end of file is being extended and the
    file size is larger than 512KiB. Also inode allocations and the
    internal log will be stripe unit aligned.

su=value
    This is an alternative to using sunit. The su suboption is used to
    specify the stripe unit for a RAID device or a striped logical
    volume. The value has to be specified in bytes, (usually using the
    m or g suffixes). This value must be a multiple of the filesystem
    block size.

swidth=value
    This is used to specify the stripe width for a RAID device or a
    striped logical volume. The value has to be specified in 512-byte
    block units. Use the sw suboption to specify the stripe width size
    in bytes. This suboption is required if -d sunit has been specified
    and it has to be a multiple of the -d sunit suboption.

sw=value
    This suboption is an alternative to using swidth. The sw suboption
    is used to specify the stripe width for a RAID device or striped
    logical volume. The value is expressed as a multiplier of the stripe
    unit, usually the same as the number of stripe members in the
    logical volume configuration, or data disks in a RAID device.

Using su and sw is often easier because fewer conversions are needed.
With a 12 drive RAID6 array your stripe width, or sw, is 10. You will
need to consult the array controller admin interface and documentation to
discover the su value if you don't already know it. Different vendors
call this parameter by different names. It could be "chunk size" or
"strip size" or something else.

Some/many vendors don't specify this value at all, giving you only static
pre-defined total stripe size options for the array, such as 64KB, 128KB,
1MB, etc, only in power of 2 values. In this case if you have a 64KB
stripe size and divide by 10 drives in the stripe you end up with a value
that is not a multiple of the filesystem block size: 6553.6 bytes. This
presents serious problems for alignment. In this case you must dig deep
to find out exactly how your vendor controller handles this situation
when your effective RAID spindle count is not a power of 2.

So let's assume your vendor does the smart thing and allows you
flexibility in specifying per drive strip size. Assume for example the
stripe unit (strip, chunk) of the array is 64KB, there are 10 stripe
spindles (12-2=10), and the local device name of the LUN is /dev/sdb.
To create an aligned XFS filesystem on this you would use something like:

    $ mkfs.xfs -d su=64k,sw=10 /dev/sdb

When using vendor array hardware that only allows one to define what XFS
calls swidth, it is best to use a power of 2 stripe spindle count to get
proper alignment. If you use a non power of 2 stripe spindle count the
vendor firmware will either round down or round up to create the stripe
unit size, and this formula is often not documented. With such vendor
hardware, for a RAID6 array you would want to have 6, 10, or 18 total
drives in the array, giving you 4, 8, or 16 stripe spindles.
Alternatively, you need to know exactly how the firmware rounds up or
down to arrive at the strip block size (sunit).

If you find yourself in such a situation, and are unable to determine the
strip size the array firmware is using, you may be better off using the
mkfs.xfs defaults, vs guessing and ending up with unaligned writes.

--
Stan
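
The geometry arithmetic in the message above can be sketched in shell. This is only an illustration under the thread's assumptions (12-drive RAID6, 64KB chunk, 4096-byte filesystem blocks); the function names are made up for the example, and the mkfs.xfs command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: derive mkfs.xfs stripe parameters from RAID geometry.
# All helper names are hypothetical; values come from the thread.

SECTOR=512   # sunit/swidth are expressed in 512-byte units

# data_disks TOTAL PARITY -> number of data spindles (RAID6: PARITY=2)
data_disks() { echo $(( $1 - $2 )); }

# sunit_sectors SU_BYTES -> stripe unit in 512-byte sectors
sunit_sectors() { echo $(( $1 / $SECTOR )); }

# swidth_sectors SU_BYTES DATA_DISKS -> stripe width in 512-byte sectors
swidth_sectors() { echo $(( $1 / $SECTOR * $2 )); }

# strip_is_block_multiple STRIPE_BYTES DATA_DISKS BLOCK_BYTES
# Succeeds only if a fixed total stripe size splits into a whole number
# of filesystem blocks per data disk.
strip_is_block_multiple() { [ $(( $1 % ($2 * $3) )) -eq 0 ]; }

SU=65536
SW=$(data_disks 12 2)

echo "su/sw form:        mkfs.xfs -d su=64k,sw=$SW /dev/sdb"
echo "sunit/swidth form: sunit=$(sunit_sectors $SU),swidth=$(swidth_sectors $SU $SW)"

# The fixed-64KB-total-stripe case from the message: 10 data disks do
# not divide evenly, 8 data disks do.
for n in 10 8; do
    if strip_is_block_multiple 65536 "$n" 4096; then
        echo "64KB total stripe over $n data disks: whole blocks per disk"
    else
        echo "64KB total stripe over $n data disks: NOT a block multiple"
    fi
done
```

The two forms describe the same geometry; su/sw simply avoids the conversion to 512-byte units.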

* Re: Optimal XFS formatting options?
From: MikeJeezy @ 2012-01-16 0:27 UTC
To: xfs

>So let's assume your vendor does the smart thing and allows you
>flexibility in specifying per drive strip size. Assume for example the
>stripe unit (strip, chunk) of the array is 64KB, there are 10 stripe
>spindles (12-2=10), and the local device name of the LUN is /dev/sdb.
>To create an aligned XFS filesystem on this you would use something like:

>$ mkfs.xfs -d su=64k,sw=10 /dev/sdb

Great explanations! (some of it I am still trying to understand :-) In this
case on my HP P2000 G3, I do have a 64k chunk size so I will do:

    $ mkfs.xfs -d su=64k,sw=10 /dev/sdd

Question: Does the above command assume I do not already have a partition
created? I was reading here
http://www.fhgfs.com/wiki/wikka.php?wakka=PartitionAlignment
that the easiest way to achieve partition alignment is to create the file
system directly on the storage device without any partitions, such as
$ mkfs.xfs /dev/sdd (and your example above also hints at this).

When I created my current partition, I used the following commands:

    $ parted -a optimal /dev/sdd
    $ mklabel gpt
    $ mkpart primary 0 -0
    $ q

I would like to align the partition as well, but I am not sure how to
achieve this using parted. This will be the only partition on the LUN, so
I am not sure if I even need to create one (although I do like to stay
consistent with my other volumes).

When printing the partition info with parted I see:

    # (parted) p
    Model: HP P2000 G3 iSCSI (scsi)
    Disk /dev/sdd: 4900GB
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt

    Number  Start   End     Size    File system  Name     Flags
     1      1049kB  4900GB  4900GB  xfs          primary

but from reading, I suspect the Sector size should be more like:
(logical/physical): 512B/65536B. Any thoughts on partition alignment or
other thoughts in general? Thank you.

* Re: Optimal XFS formatting options?
From: Stan Hoeppner @ 2012-01-16 4:56 UTC
To: xfs

On 1/15/2012 6:27 PM, MikeJeezy wrote:
>
>> So let's assume your vendor does the smart thing and allows you
>> flexibility in specifying per drive strip size. Assume for example the
>> stripe unit (strip, chunk) of the array is 64KB, there are 10 stripe
>> spindles (12-2=10), and the local device name of the LUN is /dev/sdb.
>> To create an aligned XFS filesystem on this you would use something like:
>
>> $ mkfs.xfs -d su=64k,sw=10 /dev/sdb
>
> Great explanations! (some of it I am still trying to understand :-) In this
> case on my HP P2000 G3, I do have a 64k chunk size so I will do:
>
> $ mkfs.xfs -d su=64k,sw=10 /dev/sdd

That should be fine.

> Question: Does the above command assume I do not already have a partition
> created? I was reading here
> http://www.fhgfs.com/wiki/wikka.php?wakka=PartitionAlignment
> that the easiest way to achieve partition alignment is to create the file
> system directly on the storage device without any partitions, such as
> $ mkfs.xfs /dev/sdd (and your example above also hints at this)

That example and command assume you're not using partitions.

> When I created my current partition, I used the following commands:
>
> $ parted -a optimal /dev/sdd
> $ mklabel gpt
> $ mkpart primary 0 -0
> $ q
>
> I would like to align the partition as well, but I am not sure how to achieve
> this using parted. This will be the only partition on the LUN, so not sure
> if I even need to create one (although I do like to stay consistent with my
> other volumes).

If your drives have 512 byte physical sectors (not advanced format
drives with 4096 byte sectors) then there is no need to worry about
partition alignment. And in fact, if you plan to put a single filesystem
on this entire 4.9TB virtual drive, you don't need to partition the disk
device at all. Recall the dictionary definition of "partition". You're
not dividing the whole into smaller pieces here.

> When printing the partition info with parted I see:
>
> # (parted) p
> Model: HP P2000 G3 iSCSI (scsi)
> Disk /dev/sdd: 4900GB
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
>
> Number  Start   End     Size    File system  Name     Flags
>  1      1049kB  4900GB  4900GB  xfs          primary
>
> but from reading, I suspect the Sector size should be more like:
> (logical/physical): 512B/65536B.

No, that 65536 figure is wrong. There are only two possibilities for
sector size (logical/physical): 512/512 and 512/4096. These are the only
two disk sector formats currently used on disk drives. Partitioning
utils look strictly at disk parameters, not RAID parameters.

Sectors deal with how many books (bytes) fit on each shelf (sector) in
the library, and which shelf (sector) we're going to store a given set of
books (bytes) on. RAID parameters, such as stripe unit, deal with how
many shelves (sectors) worth of books (bytes) we can carry most
efficiently down the aisle and place on the shelves at one time.

In short, sectors are a destination where we store bytes, much like books
on a shelf. A stripe unit acts as a book cart in which we carry a fixed
number of books, allowing us to fill a fixed number of shelves most
efficiently per cart transported down the aisle.

> Any thoughts on partition alignment or
> other thoughts in general? Thank you.

Yes, don't use partitions if you don't need to divide your disk device
(LUN/virtual disk) into multiple pieces. Now, if you need to make use
of snapshots or other volume management features, you may want to create
an LVM device on top of the disk device (LUN) and then make your XFS on
top of the LVM device. If you have no need for LVM features, I'd say
directly format the LUN with XFS, no partition table necessary.

--
Stan

* Re: Optimal XFS formatting options?
From: Dave Chinner @ 2012-01-16 23:11 UTC
To: Stan Hoeppner; +Cc: xfs

On Sun, Jan 15, 2012 at 10:56:22PM -0600, Stan Hoeppner wrote:
> On 1/15/2012 6:27 PM, MikeJeezy wrote:
> > I would like to align the partition as well, but I am not sure how to achieve
> > this using parted. This will be the only partition on the LUN, so not sure
> > if I even need to create one (although I do like to stay consistent with my
> > other volumes).
>
> If your drives have 512 byte physical sectors (not advanced format
> drives with 4096 byte sectors) then there is no need to worry about
> partition alignment.

That is incorrect. Partitions need to be aligned to the underlying
stripe configuration, regardless of the sector size of the drives
that make up the stripe. If you do not align the partition to the
stripe, then the filesystem will be unaligned no matter how you
configure it. Every layer of the storage stack under the filesystem
needs to be correctly aligned and sized for filesystem alignment to
make any difference to performance.

> > Any thoughts on partition alignment or
> > other thoughts in general? Thank you.
>
> Yes, don't use partitions if you don't need to divide your disk device
> (LUN/virtual disk) into multiple pieces. Now, if you need to make use
> of snapshots or other volume management features, you may want to create
> an LVM device on top of the disk device (LUN) and then make your XFS on
> top of the LVM device. If you have no need for LVM features, I'd say
> directly format the LUN with XFS, no partition table necessary.

If you use LVM, then you need to ensure that it is slicing up the
device in a manner that is aligned correctly to the underlying
stripe, just like if you are using partitions to provide the same
functionality. Different technologies, same problem.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
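
As a rough illustration of the alignment requirement described above, the following shell sketch checks a partition start offset against both the stripe unit and the full stripe. It assumes that the "1049kB" start parted reported earlier in the thread is exactly 1 MiB (1048576 bytes), which the thread does not confirm; `parted /dev/sdd unit B print` should show the exact figure. The helper names are invented for the example.

```shell
#!/bin/sh
# Sketch: is a partition start offset a multiple of the stripe geometry?

# is_aligned OFFSET_BYTES UNIT_BYTES -> exit status 0 when OFFSET is a
# whole multiple of UNIT.
is_aligned() { [ $(( $1 % $2 )) -eq 0 ]; }

# report OFFSET_BYTES UNIT_BYTES LABEL -> one human-readable line
report() {
    if is_aligned "$1" "$2"; then
        echo "$3: aligned"
    else
        echo "$3: NOT aligned (offset $(( $1 % $2 )) bytes)"
    fi
}

START=1048576            # ASSUMPTION: exact value behind parted's "1049kB"
SU=$(( 64 * 1024 ))      # stripe unit (chunk) from the thread
SW=$(( $SU * 10 ))       # full stripe across 10 data disks

report "$START" "$SU" "stripe unit"
report "$START" "$SW" "full stripe"
```

Under that assumption a 1 MiB start lands on a chunk boundary but not on a full-stripe boundary, because 10 data disks give a stripe width that is not a power of 2.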

* Re: Optimal XFS formatting options?
From: Stan Hoeppner @ 2012-01-17 3:31 UTC
To: Dave Chinner; +Cc: xfs

On 1/16/2012 5:11 PM, Dave Chinner wrote:
> On Sun, Jan 15, 2012 at 10:56:22PM -0600, Stan Hoeppner wrote:
>> On 1/15/2012 6:27 PM, MikeJeezy wrote:
>>> I would like to align the partition as well, but I am not sure how to achieve
>>> this using parted. This will be the only partition on the LUN, so not sure
>>> if I even need to create one (although I do like to stay consistent with my
>>> other volumes).
>>
>> If your drives have 512 byte physical sectors (not advanced format
>> drives with 4096 byte sectors) then there is no need to worry about
>> partition alignment.
>
> That is incorrect. Partitions need to be aligned to the underlying
> stripe configuration, regardless of the sector size of the drives
> that make up the stripe. If you do not align the partition to the
> stripe, then the filesystem will be unaligned no matter how you
> configure it. Every layer of the storage stack under the filesystem
> needs to be correctly aligned and sized for filesystem alignment to
> make any difference to performance.

Thanks for the correction/reminder Dave. So in this case the first
sector of the first partition would need to reside at LBA 1280 in this
array (655360 byte stripe width, 1280 sectors/stripe), as the partition
table itself is going to occupy some sectors at the beginning of the
first stripe. By creating the partition at LBA 1280 we make sure the
first sector of the XFS filesystem is aligned with the first sector of
the 2nd stripe.

This exercise demonstrates why it's often preferable to directly format
the LUN. If you don't have a _need_ for a partition table, such as
cloning/backup software that works at the partition level, or something
of that nature, avoid partitions.

>>> Any thoughts on partition alignment or
>>> other thoughts in general? Thank you.
>>
>> Yes, don't use partitions if you don't need to divide your disk device
>> (LUN/virtual disk) into multiple pieces. Now, if you need to make use
>> of snapshots or other volume management features, you may want to create
>> an LVM device on top of the disk device (LUN) and then make your XFS on
>> top of the LVM device. If you have no need for LVM features, I'd say
>> directly format the LUN with XFS, no partition table necessary.
>
> If you use LVM, then you need to ensure that it is slicing up the
> device in a manner that is aligned correctly to the underlying
> stripe, just like if you are using partitions to provide the same
> functionality. Different technologies, same problem.

If he's doing a single LVM volume then alignment should be automatic
during mkfs.xfs, shouldn't it?

--
Stan
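
The LBA 1280 figure in the message above follows from rounding the first usable sector up to the next full-stripe boundary. A minimal sketch, assuming a first usable LBA of 34 (typical for GPT on 512-byte sectors, but not stated in the thread) and printing the parted commands instead of running them:

```shell
#!/bin/sh
# Sketch: round a partition start up to the next full-stripe boundary.
# next_aligned_lba is a hypothetical helper name.

# next_aligned_lba FIRST_USABLE_LBA STRIPE_SECTORS
next_aligned_lba() { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

STRIPE_SECTORS=$(( 10 * 65536 / 512 ))  # 655360-byte stripe in 512-byte sectors
FIRST_USABLE=34                         # ASSUMPTION: typical GPT first usable LBA

START=$(next_aligned_lba "$FIRST_USABLE" "$STRIPE_SECTORS")

# Printed only; review before running against a real device.
echo "parted -s /dev/sdd mklabel gpt"
echo "parted -s /dev/sdd mkpart primary ${START}s 100%"
```

Giving parted the start in sectors (the `s` suffix) avoids the rounding that happens when offsets are given in kB or MB.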

* Re: Optimal XFS formatting options?
From: Michael Monnerie @ 2012-01-17 9:19 UTC
To: xfs, stan

On Tuesday, 17 January 2012 Stan Hoeppner wrote:
> Thanks for the correction/reminder Dave. So in this case the first
> sector of the first partition would need to reside at LBA1280 in this
> array (655360 byte stripe width, 1280 sectors/stripe), as the
> partition table itself is going to occupy some sectors at the
> beginning of the first stripe. By creating the partition at LBA1280
> we make sure the first sector of the XFS filesystem is aligned with
> the first sector of the 2nd stripe.

There's one big problem with that: Many people will sooner or later
expand an existing array. If you add one drive, all your nice stripe
width alignment becomes bogus, and suddenly your performance will drop.

There's no real way out of that, but three solutions come to my mind:
- backup before the expansion, restore afterwards with the new alignment
- leave existing data, just change mount options so after expansion at
  least new files are going to be aligned to the new stripe width.
- expand array by factors of two. So if you have 10 data drives, add 10
  data drives. But that creates other problems (probability of single
  drive failure + time to recover a single broken disk)

--
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531
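
The second option in the list above (changing mount options after an expansion) can be sketched as follows. XFS accepts `sunit` and `swidth` as mount options in 512-byte units; the example assumes the array keeps its 64KB chunk size when it grows from 10 to 11 data disks, and `/data` is a hypothetical mount point. The mount command is printed, not executed.

```shell
#!/bin/sh
# Sketch: recompute XFS sunit/swidth mount options after adding a disk.
# mount_opts is a hypothetical helper name.

# mount_opts SU_BYTES DATA_DISKS -> "sunit=N,swidth=M" in 512-byte units
mount_opts() {
    su_sectors=$(( $1 / 512 ))
    echo "sunit=${su_sectors},swidth=$(( su_sectors * $2 ))"
}

echo "before expansion: $(mount_opts 65536 10)"
echo "after expansion:  $(mount_opts 65536 11)"

# Printed only. Existing files keep their old layout; only new
# allocations follow the new geometry.
echo "mount -o $(mount_opts 65536 11) /dev/sdd /data"
```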

* Re: Optimal XFS formatting options?
From: Emmanuel Florac @ 2012-01-17 11:17 UTC
To: xfs

On Tue, 17 Jan 2012 10:19:55 +0100
Michael Monnerie <michael.monnerie@is.it-management.at> wrote:

> - expand array by factors of two. So if you have 10 data drives, add
> 10 data drives. But that creates other problems (probability of
> single drive failure + time to recover a single broken disk)

From my experience 20 drives is OK for RAID-6. And rebuild time doesn't
change much with array size, anyway. Misaligned partitions, on the other
hand, can easily halve array throughput, according to my own measurements.

--
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

* Re: Optimal XFS formatting options?
From: Stan Hoeppner @ 2012-01-17 11:34 UTC
To: Michael Monnerie; +Cc: xfs

On 1/17/2012 3:19 AM, Michael Monnerie wrote:
> On Tuesday, 17 January 2012 Stan Hoeppner wrote:
>> Thanks for the correction/reminder Dave. So in this case the first
>> sector of the first partition would need to reside at LBA1280 in this
>> array (655360 byte stripe width, 1280 sectors/stripe), as the
>> partition table itself is going to occupy some sectors at the
>> beginning of the first stripe. By creating the partition at LBA1280
>> we make sure the first sector of the XFS filesystem is aligned with
>> the first sector of the 2nd stripe.
>
> There's one big problem with that: Many people will sooner or later
> expand an existing array. If you add one drive, all your nice stripe
> width alignment becomes bogus, and suddenly your performance will drop.

So to be clear, your issue with the above isn't with my partition
alignment math WRT the OP's P2000 array, but is with using XFS stripe
alignment in general, correct?

> There's no real way out of that, but three solutions come to my mind:
> - backup before expand/restore after expand with new alignment
> - leave existing data, just change mount options so after expansion at
> least new files are going to be aligned to the new stripe width.
> - expand array by factors of two. So if you have 10 data drives, add 10
> data drives. But that creates other problems (probability of single
> drive failure + time to recover a single broken disk)

There is one really simple way around this issue you describe: don't
add drives to an existing array. Simply create another array with
new disks, create a new aligned XFS on the array, and mount the
filesystem in an appropriate location. There is no 11th Commandment
stating one must have a single massive XFS atop all of one's disks. ;)

There is little to no application software today that can't be
configured to store its data files across multiple directories. So
there's no need to box oneself into the corner you describe above.

--
Stan

* Re: Optimal XFS formatting options?
From: Michael Monnerie @ 2012-01-20 15:52 UTC
To: xfs, stan

On Tuesday, 17 January 2012 Stan Hoeppner wrote:
> So to be clear, your issue with the above isn't with my partition
> alignment math WRT the OP's P2000 array, but is with using XFS stripe
> alignment in general, correct?

Yes. I just wanted to document this as people often expand RAIDs and
forget to apply the changes to stripe width.

> There is one really simple way around this issue you describe: don't
> add drives to an existing array. Simply create another array with
> new disks, create a new aligned XFS on the array, and mount the
> filesystem in an appropriate location. There is no 11th Commandment
> stating one must have a single massive XFS atop all of one's disks.
> ;)
>
> There is little to no application software today that can't be
> configured to store its data files across multiple directories. So
> there's no need to box oneself into the corner you describe above.

It's a management burden to do that. I've learned that systems usually
are strictly structured in their configuration, so it's often better to
extend a RAID and to keep the config, as this is cheaper in the end. At
least for the salaries of good admins here in Europe ;-)

--
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531

* Re: Optimal XFS formatting options?
From: Stan Hoeppner @ 2012-01-20 22:44 UTC
To: xfs

On 1/20/2012 9:52 AM, Michael Monnerie wrote:
> On Dienstag, 17. Januar 2012 Stan Hoeppner wrote:
>> So to be clear, your issue with the above isn't with my partition
>> alignment math WRT the OP's P2000 array, but is with using XFS stripe
>> alignment in general, correct?
>
> Yes. I just wanted to document this as people often expand RAIDs and
> forget to apply the changes to stripe width.
>
>> There is one really simple way around this issue you describe: don't
>> add drives to an existing array. Simply create another array with
>> new disks, create a new aligned XFS on the array, and mount the
>> filesystem in an appropriate location. There is no 11th Commandment
>> stating one must have a single massive XFS atop all of one's disks.
>> ;)
>>
>> There is little to no application software today that can't be
>> configured to store its data files across multiple directories. So
>> there's no need to box oneself into the corner you describe above.
>
> It's a management burden to do that. I've learned that systems usually
> are strictly structured in their configuration, so it's often better to
> extend a RAID and to keep the config, as this is cheaper in the end. At
> least for the salaries of good admins here in Europe ;-)

If ease (or cost) of filesystem administration is of that much greater
priority than performance, then why are you using XFS in the first place
instead of EXT?

--
Stan

* Re: Optimal XFS formatting options?
From: Michael Monnerie @ 2012-01-24 10:31 UTC
To: xfs, stan

On Friday, 20 January 2012 Stan Hoeppner wrote:
> If ease (or cost) of filesystem administration is of that much
> greater priority than performance, then why are you using XFS in the
> first place instead of EXT?

Great experience in recovering from disastrous filesystem problems on
XFS. A switch to another FS costs a lot of time, and why switch if it
works great? And administration comes down to mkfs, mount, maybe xfs_fsr,
xfs_repair in a disaster, and sometimes xfs_growfs. Basically nothing.

Also, this list has been of great help over the years; whenever there
were problems they got fixed. That's ease of administration :-)

--
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531
* Re: Optimal XFS formatting options? 2012-01-14 17:44 Optimal XFS formatting options? MikeJeezy 2012-01-14 22:23 ` Stan Hoeppner @ 2012-01-15 1:14 ` Peter Grandi 2012-01-20 9:03 ` Linda Walsh 1 sibling, 1 reply; 17+ messages in thread From: Peter Grandi @ 2012-01-15 1:14 UTC (permalink / raw) To: Linux fs XFS [ ... ] > Hi, I have a 4.9 TB iSCSI LUN on a RAID 6 array with twelve 2 > TB SATA disks (4.9T is only one of the logical volumes). It > will contain several million files of various sizes, but 80% > of them will be less than 50 MB. I'm a novice at best and I > usually just use the default #mkfs.xfs /dev/sdx1 The default :-) advice in this list and in the XFS FAQ is that in any recent edition of the XFS tools and XFS code in the kernel the defaults are usually best, unless you have a special situation, for example if the kernel cannot get storage geometry from the storage layer. Also, "several million" in a about 5,000,000MB filesystem indicates an average file size of 1MB. That's not too small, fortunately. Anyhow consider how long it will take to 'fsck' all that if it gets damaged, or the extra load to backup the whole filetree if backups scan the tree (e.g. RYNC based). > This is server will be write heavy for about 8 hours a night, > but every morning there are many reads to the disk. There is > rarely a time where it will be write heavy and read heavy at > the same time. Are there other XFS format options that I > could use to optimize performance? Any input is greatly > appreciated. Thank you. As usual, the first note is that in general RAID6 is a bad idea, with RMW and reliability (especially during rebuild) issues, but salesmen and management usually love it because it embodies a promise of something for nothing (let's say that the parity RAID industry is the Wall Street of storage system :->). 
To mitigate problems In general if you are doing a lot of writing it is very important that the filesystem try to align to address/length of the full RAID stripe, but this should be automatic if the relevant geometry is reported to the Linux kernel. Otherwise thee are many previous messages in this list about that, and the FAQ etc. Things that you might want to double check in case they matter for you, as to not-'mkfs' options: * XFS has several limitations on 32b kernels. Just make sure you have a 64b kernel. * Make really sure your partitions (or LUNs if unpartitioned) are aligned, certainly to a multiple of stripe size, ideally to something larg, at least like 1MiB. * Recent (let's say at least 2.6.32 or EL57) kernels and editions of XFS tools and partitioning tools (if you use any) are very improved. The newer usually the better. * Usually just in case explicitly specify at 'mount' (not 'mkfs') time the 'inode64' option; and the 'barrier' option unless you really know better (and pray hard that your storage layer supports it). The 'delaylog' option or its opposite are also something to look carefully into. * Check carefully whether your app is compatible with the 'noatime' and 'nodiratime' options and enable them if possible, "just in case" :-). * Look very attentively at the kernel page cache flusher parameters to make it run more often (tom prevent the accumulation of very large gulps of unwritten data) but not too often (to give a chance to the delayed allocator). As to proper 'mkfs' you may want to look into: * Explicitly set the sector size because most storage layers lie. In general if possible you should set it to 4096, just in case :-). This also allegedly extends the range where inodes can be stored if you cannot specify 'inode64' at mount time. 
* If you have a critically high rate of metadata work (like file creation/deletion, which seems to be your case overnight) you may want to ensure that your log is not only aligned, but perhaps on a separate device, and/or that you have a host adapter with a large battery backed cache. Logs are small, so it should be easy either way.

* Depending on the degree of multithreading of your application you may want more/fewer AGs, but usually on a 4.9TB filetree there will be plenty.

* You may want larger inodes than the default if you have lots of ACLs or your files are written slowly and thus have many extents. They are recommended also for small files but I cannot remember whether XFS really stores small files or directories in the inode (I remember that directories of less than 8 entries are stored in the inode, but I don't know whether that depends on its size).

First run 'mkfs.xfs -N ....' so it will print out which parameters it will use without actually doing anything.
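The alignment advice above can be sketched as a small calculation. This is only an illustration: the 64KiB chunk size is an assumption (the thread does not state the array's chunk size), the whole twelve-disk RAID 6 set is assumed to form one stripe, and the helper names are hypothetical.

```python
from math import gcd

SECTOR = 512  # bytes; the sunit/swidth values are given in 512-byte units


def stripe_geometry(total_disks, parity_disks, chunk_bytes):
    """Return (data_disks, full_stripe_bytes, sunit, swidth) for a parity RAID set."""
    data_disks = total_disks - parity_disks
    full_stripe = data_disks * chunk_bytes
    return data_disks, full_stripe, chunk_bytes // SECTOR, full_stripe // SECTOR


def is_aligned(offset_bytes, boundary_bytes):
    """True if a partition or LUN start offset falls on the given boundary."""
    return offset_bytes % boundary_bytes == 0


def common_boundary(a_bytes, b_bytes):
    """Smallest offset that is a multiple of both boundaries (least common multiple)."""
    return a_bytes * b_bytes // gcd(a_bytes, b_bytes)


# ASSUMPTION: 64 KiB chunk; twelve disks in RAID 6 leave ten data disks.
CHUNK = 64 * 1024
MIB = 1024 * 1024
data_disks, full_stripe, sunit, swidth = stripe_geometry(12, 2, CHUNK)

print("data disks:", data_disks)
print("full stripe bytes:", full_stripe)
print("sunit / swidth (512-byte units):", sunit, swidth)
print("1 MiB start aligned to chunk:", is_aligned(MIB, CHUNK))
print("1 MiB start aligned to full stripe:", is_aligned(MIB, full_stripe))
print("offset aligned to both:", common_boundary(MIB, full_stripe) // MIB, "MiB")
```

Under these assumed numbers a 1MiB start offset lines up with the chunk but not with the full stripe, which is why it is worth checking the offset against the actual geometry rather than relying on a round number.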
* Re: Optimal XFS formatting options?
From: Linda Walsh @ 2012-01-20 9:03 UTC
To: Linux fs XFS

Peter Grandi wrote:
> * XFS has several limitations on 32b kernels. Just make sure
> you have a 64b kernel.
----
I was unaware that the block size was larger on 64b kernels. Is that what you are referring to? (would be nice)...

One thing I have a question on: you (OP) said this was an 'iscsi' box? That means hookup over a network, right?

You are planning on using a 10Gbit or faster network fabric, right? 1Gb Ethernet will only get you 125MB/s max... it doesn't take much tuning to hit that speed.
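The 125MB/s figure follows from dividing the line rate by eight bits per byte. A minimal sketch of that arithmetic, ignoring Ethernet, TCP and iSCSI overhead (so real throughput would be somewhat lower); the function names are hypothetical:

```python
def line_rate_mb_per_s(gbit_per_s):
    """Raw line rate in decimal MB/s, ignoring all protocol overhead."""
    return gbit_per_s * 1000 / 8


def hours_to_transfer(size_tb, mb_per_s):
    """Hours needed to move size_tb (decimal TB) at a sustained mb_per_s."""
    return size_tb * 1_000_000 / mb_per_s / 3600


for link in (1, 10):
    rate = line_rate_mb_per_s(link)
    hours = hours_to_transfer(4.9, rate)
    print(f"{link} Gbit/s: {rate:.0f} MB/s, {hours:.1f} h to move 4.9 TB")
```

Comparing the result for the 1Gbit case with the 8-hour nightly write window mentioned by the original poster shows how much of the volume could be rewritten in one night at best.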
* Re: Optimal XFS formatting options?
From: Peter Grandi @ 2012-01-20 12:06 UTC
To: Linux fs XFS

[ ... ]

>> * XFS has several limitations on 32b kernels. Just make sure
>> you have a 64b kernel.

[ ... ]

> I was unaware that the block size was larger on 64b kernels.
> Is that what you are referring to ? (would be nice)...

Not as such. The maximum block size is limited by the Linux page cache, that is by the hw page size, which for the IA32 and AMD64 architectures is the same at 4KiB. However other architectures which are natively 64b allow bigger page sizes (notably IA64 [aka Itanium]), so the page cache and thus XFS can do larger block sizes.

The limitations of XFS on 32b kernels come from limitations of XFS itself in 32b mode, limitations of Linux in 32b mode, and combined limitations. For example:

* There be 32b inode numbers, which limit inodes to the first 1TB of a filetree if sector size is 512B.

* The 32b block IO subsystem limits partition sizes to 16TiB.

* XFS tools scanning a large filesystem, usually for repair, can run out of the available 32b address space (by default around 2GiB).

Pages 5 and 6 here list some limits:

http://oss.sgi.com/projects/xfs/training/xfs_slides_02_overview.pdf
* Re: Optimal XFS formatting options?
From: Michael Monnerie @ 2012-01-20 15:55 UTC
To: xfs

On Friday, 20 January 2012 Peter Grandi wrote:
> * There be 32b inode numbers, which limit inodes to the first
> 1TB of a filetree if sector size is 512B.
>
> * The 32b block IO subsystems limits partition sizes to 16TiB.

I thought those two have been removed by some updates? I think I remember having read that. Not that it's too interesting, I've been running 64b Linux everywhere since AMD put it in their processors. Should be 10+ years or so.

--
With kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger
http://proteger.at [pronounced: Prot-e-schee]
Tel: +43 660 / 415 6531
* Re: Optimal XFS formatting options?
From: Dave Chinner @ 2012-01-23 4:21 UTC
To: Peter Grandi; +Cc: Linux fs XFS

On Fri, Jan 20, 2012 at 12:06:31PM +0000, Peter Grandi wrote:
> [ ... ]
> >> * XFS has several limitations on 32b kernels. Just make sure
> >> you have a 64b kernel.
> [ ... ]
> > I was unaware that the block size was larger on 64b kernels.
> > Is that what you are referring to ? (would be nice)...
>
> Not as such, the maximum block size is limited by the Linux page
> cache, that is hw page size, which is for IA32 and AMD64
> architectures the same at 4KiB. However other architectures
> which are natively 64b allow bigger page sizes (notably IA64
> [aka Itanium]), so the page cache and thus XFS can do larger
> blocks sizes.
>
> The limitations of XFS on 32b kernels come from limitations of
> XFS itself in 32b mode, limitations of Linux in 32b mode, and
> combined limitations. For example:
>
> * There be 32b inode numbers, which limit inodes to the first
> 1TB of a filetree if sector size is 512B.

Internally XFS still uses 64 bit inode numbers - the on-disk format does not change just because the CPU arch has changed. If you use the stat64() style interfaces, even on 32 bit machines you can access the full 64 bit inode numbers.

> * The 32b block IO subsystems limits partition sizes to 16TiB.

The sector_t is a 64 bit number even on 32 bit systems. The problem is that the page cache cannot index past offsets of 16TB. Given that XFS no longer uses the page cache for its metadata indexing, we could remove this limit in the kernel code if we wanted to. And given that the userspace tools use direct IO, the page cache limitation doesn't cause problems there, either, because we bypass it.
So in theory we could lift this limit, but there really isn't much demand for >16TB filesystems on 32 bit, because....

> * XFS tools scanning a large filesystem, usually for repair,
> can run out of the available 32b address space (by default
> around 2GiB).

.... you need 64 bit systems to handle the userspace memory that tools like xfs_check and xfs_repair require to run. If the filesystem is large enough that you can't run repair because it needs more than 2GB of RAM, then you shouldn't be using a 32 bit system.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
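The 16TB page cache figure discussed above can be reproduced with a short calculation. This sketch assumes that the page cache index is a 32 bit count of pages on a 32 bit kernel and that pages are 4KiB, as stated in the thread for IA32 and AMD64; the function name is hypothetical.

```python
TIB = 1 << 40


def page_cache_limit_bytes(index_bits, page_size_bytes):
    """Largest offset range addressable by an index_bits-wide page index."""
    return (1 << index_bits) * page_size_bytes


# ASSUMPTION: 32 bit page index, 4 KiB pages.
limit = page_cache_limit_bytes(32, 4096)
print("page cache limit:", limit // TIB, "TiB")
```

The same function shows why the limit scales with page size on architectures that allow larger pages, and why it is not a practical concern with a 64 bit index.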