* Help with XFS in VMs on VMFS
From: Jan Perci @ 2013-03-28 13:21 UTC
To: xfs

Hello.

I would like to use XFS in VMs with VMFS datastores on top of RAID-6. The RAID is a FC 14+2 x 4TB with 64K stripe. There are 6 of these arrays. Each contains one aligned VMFS partition, and this VMFS partition is shared by 4 ESXi hosts. Each host runs 2-3 compute nodes, and some of these nodes have multiple partitions consuming 20-50 TB.

The data consists of files ranging from 100KB to 500KB, with a few outliers reaching many MB. The directory hierarchy is such that no single directory contains more than 2,000 or so of these files. The data is almost exclusively append-only, i.e. written once when added and read many times afterwards, but it arrives in spikes of 1-20GB at a time. As the partitions fill up, new ones are added, but sometimes the existing partitions must be grown.

Normally I would use raw mappings and XFS directly on the volumes. But there is a hard requirement to support VM snapshots, so all the data must reside within VMDK files on the VMFS datastores. ESXi has a VMDK size limit of 2TB. So, I am forced to create many 2TB virtual disks and attach them to the host, then use Linux LVM to group them into a single LV, then create XFS on the LV.

This setup is not optimal and has risks, but I must work within some constraints. There are a few things I can do to increase I/O performance, such as distributing the VMDK files used by each LV across the 6 VMFS datastores. But can XFS be tuned as well? Do stripe unit and stripe width help? Thanks for your help.

Jan.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Help with XFS in VMs on VMFS
From: Stefan Ring @ 2013-03-28 14:59 UTC
To: Jan Perci; +Cc: xfs

> This setup is not optimal and has risks, but I must work within some
> constraints. There are a few things I can do to increase I/O performance,
> such as distributing the VMDK files used by each LV across the 6 VMFS
> datastores. But can XFS be tuned as well? Do stripe unit and stripe width
> help? Thanks for your help.

I guess you should make the number of allocation groups equal to or a multiple of the number of concatenated VMDK files (assuming they are equally sized). Any more fiddling is probably not worth the effort. But I'm sure you'll get lengthy answers from other people on the list ;).
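The allocation group suggestion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the device path, and the roughly 1 TiB ceiling on allocation group size are not from the thread and should be checked against the mkfs.xfs documentation. It also assumes the LV is a plain linear concatenation of equally sized VMDKs, so LVM metadata overhead may shift the real boundaries slightly.

```python
TIB = 1024 ** 4

# Assumption: XFS caps allocation group size at about 1 TiB.
ASSUMED_MAX_AG_BYTES = TIB


def agcount_for_concat(num_vmdks: int, vmdk_bytes: int) -> int:
    """Return an agcount that is a multiple of the number of VMDKs.

    Starts at one allocation group per VMDK and adds groups per VMDK
    until each group fits under the assumed size ceiling.
    """
    if num_vmdks < 1 or vmdk_bytes < 1:
        raise ValueError("num_vmdks and vmdk_bytes must be positive")
    ags_per_vmdk = 1
    while vmdk_bytes // ags_per_vmdk >= ASSUMED_MAX_AG_BYTES:
        ags_per_vmdk += 1
    return num_vmdks * ags_per_vmdk


if __name__ == "__main__":
    # Hypothetical example: ten 2 TiB VMDKs concatenated into one LV.
    agcount = agcount_for_concat(10, 2 * TIB)
    # /dev/vg_data/lv_data is a placeholder device path.
    print(f"mkfs.xfs -d agcount={agcount} /dev/vg_data/lv_data")
```

Keeping the count a multiple of the number of VMDKs means each allocation group falls within a single virtual disk, so work spread across allocation groups is also spread across the VMDKs.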
* Re: Help with XFS in VMs on VMFS
From: Stan Hoeppner @ 2013-03-28 19:50 UTC
To: Jan Perci; +Cc: xfs

On 3/28/2013 8:21 AM, Jan Perci wrote:

> Normally I would use raw mappings and XFS directly on the volumes. But
> there is a hard requirement to support VM snapshots, so all the data must
> reside within VMDK files on the VMFS datastores.

Since when? ESX has had LUN snapshot capability back to 3.0, 6 years or so. It may have required the VCB add-on back then.

Is this simply a limitation of the freebie version? If so, pony up and pay for what you need, or switch to a FOSS solution which has no such limitations.

VMFS volumes are not intended for high performance IO. Unless things have changed recently, VMware has always recommended housing only OS images and the like in VMDKs, not user data. They've always recommended using RDMs for everything else. IIRC VMDKs have a huge block (sector) size, something like 1MB. That's going to make XFS alignment difficult, if not impossible.

I cannot stress emphatically enough that you should not stitch 2TB VMDKs together and use them in the manner you described. This is a recipe for disaster. Find another solution.

-- 
Stan
* Re: Help with XFS in VMs on VMFS
From: Ralf Gross @ 2013-03-28 21:45 UTC
To: xfs

Stan Hoeppner wrote:
> On 3/28/2013 8:21 AM, Jan Perci wrote:
>
> > Normally I would use raw mappings and XFS directly on the volumes. But
> > there is a hard requirement to support VM snapshots, so all the data must
> > reside within VMDK files on the VMFS datastores.
>
> Since when? ESX has had LUN snapshot capability back to 3.0, 6 years or
> so. It may have required the VCB add on back then.

Snapshots are possible with RDM in virtual compatibility mode, not physical mode (> 2 TB).

http://pubs.vmware.com/vsphere-51/topic/com.vmware.vsphere.storage.doc/GUID-0114693D-94BF-4D0E-9BA4-416D4A51A5A1.html

> Is this simply a limitation of the freebie version? If so, pony up and
> pay for what you need, or switch to a FOSS solution which has no such
> limitations.

No, that's the limit for all versions.

> VMFS volumes are not intended for high performance IO. Unless things
> have changed recently, VMware has always recommended housing only OS
> images and the like in VMDKs, not user data. They've always recommended
> using RDMs for everything else. IIRC VMDKs have a huge block (sector)
> size, something like 1MB. That's going to make XFS alignment difficult,
> if not impossible.

I can't remember ever finding this recommendation on a VMware page.

http://blogs.vmware.com/vsphere/2013/01/vsphere-5-1-vmdk-versus-rdm.html

> I cannot stress emphatically enough that you should not stitch 2TB VMDKs
> together and use them in the manner you described. This is a recipe for
> disaster. Find another solution.

I'm seeing more and more requests for VMs with large disks lately in my env. Right now the max. is ~2 TB. I'm also thinking about where to go; > 2 TB is only possible with pRDMs, which can't be snapshotted. You have to use the snapshot features of your storage array.

Ralf
* Re: Help with XFS in VMs on VMFS
From: Emmanuel Florac @ 2013-03-28 22:13 UTC
To: Ralf Gross; +Cc: xfs

On Thu, 28 Mar 2013 22:45:50 +0100 you wrote:

> I'm seeing more and more requests for VMs with large disks lately in
> my env. Right now the max. is ~2 TB. I'm also thinking about where to
> go, > 2 TB is only possible with pRDMs which can't be snapshotted. You
> have to use the snapshot features of your storage array.

Maybe you could give LVM snapshots a new try. They got better recently.

-- 
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
                | Intellique
                | <eflorac@intellique.com>
                | +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: Help with XFS in VMs on VMFS
From: Ralf Gross @ 2013-03-29 14:23 UTC
To: xfs

Emmanuel Florac wrote:
> On Thu, 28 Mar 2013 22:45:50 +0100 you wrote:
>
> > I'm seeing more and more requests for VMs with large disks lately in
> > my env. Right now the max. is ~2 TB. I'm also thinking about where to
> > go, > 2 TB is only possible with pRDMs which can't be snapshotted. You
> > have to use the snapshot features of your storage array.
>
> Maybe you could give LVM snapshot a new try. They got better recently.

I need the bigger disks for Windows VMs ;)

Ralf
* Re: Help with XFS in VMs on VMFS
From: Stan Hoeppner @ 2013-03-29 0:56 UTC
To: xfs

On 3/28/2013 4:45 PM, Ralf Gross wrote:
> Stan Hoeppner wrote:

> Snapshots are possible with RDM in virtual compatibility mode, not
> physical mode (> 2 TB).

So 2TB is the kicker here. I haven't used ESX since 3.x, and none of our RDMs back then were close to 2TB. IIRC our largest was 500GB.

>> VMFS volumes are not intended for high performance IO. Unless things
>> have changed recently, VMware has always recommended housing only OS
>> images and the like in VMDKs, not user data. They've always recommended
>> using RDMs for everything else. IIRC VMDKs have a huge block (sector)
>> size, something like 1MB. That's going to make XFS alignment difficult,
>> if not impossible.
>
> I can't remember ever finding this recommendation on a VMware
> page.
>
> http://blogs.vmware.com/vsphere/2013/01/vsphere-5-1-vmdk-versus-rdm.html

If you drill down through that you find this:
http://www.vmware.com/files/pdf/performance_char_vmfs_rdm.pdf

RDMs have better large sequential performance, and lower CPU burn than VMDKs. The OP mentioned "compute node" in his post, which suggests an HPC application workload, which suggests large sequential IO.

Also note that VMware is Microsoft centric so they always run their tests using an MS Server guest. Also note they always test with tiny volumes, in this case 20GB. NTFS isn't going to have any trouble at this size, but at say 20TB it probably will and these published results would likely be quite different at that scale. XFS performance characteristics on a 2TB or 20TB or ?? TB volume will likely be substantially different than NTFS. Their tests show 5-8% lower CPU burn for RDM vs VMDK. Not a huge difference, but again they're testing only 20GB.

>> I cannot stress emphatically enough that you should not stitch 2TB VMDKs
>> together and use them in the manner you described. This is a recipe for
>> disaster. Find another solution.
>
> I'm seeing more and more requests for VMs with large disks lately in my
> env. Right now the max. is ~2 TB. I'm also thinking about where to go,
> > 2 TB is only possible with pRDMs which can't be snapshotted. You
> have to use the snapshot features of your storage array.

And more and more folks are using midrange FC/iSCSI arrays that don't have snapshot features, others are using DAS with RAID HBAs, in both cases forcing them to rely on ESX snapshots. Sounds like VMware needs to bump this artificial 2TB limit quite a bit higher.

-- 
Stan
* Re: Help with XFS in VMs on VMFS
From: Jan Perci @ 2013-03-29 3:30 UTC
To: xfs

Thank you for your responses. Since this list is for XFS, I do not wish to go off topic too far into VMs. But I will provide more context.

A key factor is the need for >2TB file systems that can be snapshotted and reverted quickly. We have other FC arrays attached to compute nodes without this requirement, and they have XFS directly on the FC logical volumes made accessible to native nodes and VM nodes via RDM.

Our FC arrays do not have native snapshot features, so we must use a software layer whether that is Linux LVM, ESXi, or something else. And because of our unique usage patterns and constraints, we have settled on VMware over other virtualization technologies. We are using ESXi (free version) but can upgrade to ESX if necessary. However, the upgrade wouldn't fix the 2TB snapshot limit.

We are certainly not in the true HPC realm, but we do have about 20 physical compute nodes that do both random and sequential I/O. An example query might identify a 10-500GB data set made up of 100-500KB files. Some work sets are processor bound with disk I/O accounting for less than 5%. However, others are spending about 50% on disk I/O, so improving performance would be helpful - again in the context of the snapshot requirement.

Point well understood about the risks of striping multiple 2TB VMDK files together. But because of the constraints, it's either 2TB VMDKs or 2TB RDMs in virtual compatibility mode, and they both seem about equally risky. Do you have better suggestions?

Back to XFS, in this context, is there any benefit in tuning some parameters to get better performance, or will it all just be overshadowed by poor performance of the VMDKs that tuning isn't worthwhile?

Jan.
* Re: Help with XFS in VMs on VMFS
From: Ben Myers @ 2013-03-29 20:27 UTC
To: Jan Perci; +Cc: xfs

Hi Jan,

On Thu, Mar 28, 2013 at 11:30:01PM -0400, Jan Perci wrote:
> Back to XFS, in this context, is there any benefit in tuning some
> parameters to get better performance, or will it all just be overshadowed
> by poor performance of the VMDKs that tuning isn't worthwhile?

At least get your stripe unit and width correct.
http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance

Beyond that I suggest you stick with the defaults unless you have a specific need, e.g. heavy usage of extended attributes might prompt you to use a larger inode size to keep them inline.

Regards,
Ben
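For the array described at the start of the thread (RAID-6, 14+2, 64K stripe), the stripe unit and width arithmetic might look like the sketch below. It assumes that "64K stripe" means the per-disk chunk size rather than the full stripe, which the thread does not confirm, and the function names are purely illustrative. The resulting su and sw values describe the physical array only; whether they are meaningful through the VMFS and LVM layers is a separate question.

```python
def raid_stripe_geometry(data_disks: int, chunk_kib: int) -> dict:
    """Derive XFS stripe geometry from a RAID layout.

    su is the per-disk chunk size and sw is the number of data disks
    (parity disks excluded), as in the usual su/sw convention.
    """
    if data_disks < 1 or chunk_kib < 1:
        raise ValueError("data_disks and chunk_kib must be positive")
    return {
        "su_kib": chunk_kib,
        "sw": data_disks,
        "full_stripe_kib": chunk_kib * data_disks,
    }


def mkfs_data_options(data_disks: int, chunk_kib: int) -> str:
    """Format the geometry as a mkfs.xfs -d option string."""
    geometry = raid_stripe_geometry(data_disks, chunk_kib)
    return f"-d su={geometry['su_kib']}k,sw={geometry['sw']}"


if __name__ == "__main__":
    # 14+2 RAID-6: 14 data disks, 2 parity disks, assumed 64 KiB chunk.
    print(raid_stripe_geometry(14, 64))
    print("mkfs.xfs", mkfs_data_options(14, 64), "<device>")
```

With these inputs a full stripe is 14 x 64 KiB = 896 KiB, which is not a power of two.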
* Re: Help with XFS in VMs on VMFS
From: Stan Hoeppner @ 2013-03-30 19:12 UTC
To: Ben Myers; +Cc: Jan Perci, xfs

On 3/29/2013 3:27 PM, Ben Myers wrote:
> Hi Jan,
>
> On Thu, Mar 28, 2013 at 11:30:01PM -0400, Jan Perci wrote:
>> Back to XFS, in this context, is there any benefit in tuning some
>> parameters to get better performance, or will it all just be overshadowed
>> by poor performance of the VMDKs that tuning isn't worthwhile?
>
> At least get your stripe unit and width correct.
> http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance

Is this really a good idea given that XFS sits atop a virtual disk which consists of multiple concatenated 2TB sparse files sitting on the VMFS filesystem, which, IIRC, has a 1MB sector size? Thus can one rely on XFS being able to properly align to the physical RAID stripe, even if the math is done 'properly' (if that's even possible here)?

In a complex stack like this I'd recommend defaults across the board. Misalignment hurts performance far more than proper alignment increases it. No alignment is agnostic, 4KB IOs only, so you neither gain nor lose.

> Beyond that I suggest you stick with the defaults unless you have a specific
> need. e.g. heavy usage of extended attributes might prompt you to use a larger
> inode size to keep them inline.

-- 
Stan
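The alignment concern can be made concrete with a little arithmetic. The sketch below assumes a 1 MiB VMFS block size (the figure recalled from memory in the message above, not a verified value) and an 896 KiB full stripe (14 data disks with an assumed 64 KiB chunk). It further assumes block 0 starts on a stripe boundary and blocks are laid out contiguously, which the thread gives no reason to expect. The function names are illustrative only.

```python
from math import gcd


def blocks_between_aligned_starts(block_kib: int, full_stripe_kib: int) -> int:
    """How often a block start coincides with a stripe boundary.

    Returns N such that every Nth block begins on a full-stripe
    boundary, given that block 0 is aligned and blocks are contiguous.
    """
    period_kib = block_kib * full_stripe_kib // gcd(block_kib, full_stripe_kib)
    return period_kib // block_kib


def is_full_stripe_write(offset_kib: int, length_kib: int, full_stripe_kib: int) -> bool:
    """True if a write covers whole stripes only (no partial stripe)."""
    return (
        length_kib > 0
        and offset_kib % full_stripe_kib == 0
        and length_kib % full_stripe_kib == 0
    )


if __name__ == "__main__":
    n = blocks_between_aligned_starts(1024, 896)
    print(f"Only every {n}th 1 MiB block starts on an 896 KiB stripe boundary")
    # A stripe-sized write at the start of the second 1 MiB block is misaligned.
    print(is_full_stripe_write(1024, 896, 896))
```

Even in this best case, six of every seven block starts fall mid-stripe, so a stripe-sized write issued by the guest would usually straddle two physical stripes.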
* Re: Help with XFS in VMs on VMFS
From: Dave Chinner @ 2013-03-31 2:04 UTC
To: Stan Hoeppner; +Cc: Ben Myers, Jan Perci, xfs

On Sat, Mar 30, 2013 at 02:12:36PM -0500, Stan Hoeppner wrote:
> On 3/29/2013 3:27 PM, Ben Myers wrote:
> > Hi Jan,
> >
> > On Thu, Mar 28, 2013 at 11:30:01PM -0400, Jan Perci wrote:
> >> Back to XFS, in this context, is there any benefit in tuning some
> >> parameters to get better performance, or will it all just be overshadowed
> >> by poor performance of the VMDKs that tuning isn't worthwhile?
> >
> > At least get your stripe unit and width correct.
> > http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance
>
> Is this really a good idea given that XFS sits atop a virtual disk which
> consists of multiple concatenated 2TB sparse files sitting on the VMFS
> filesystem, which, IIRC, has a 1MB sector size? Thus can one rely on
> XFS being able to properly align to the physical RAID stripe, even if
> the math is done 'properly' (if that's even possible here)?

No, because VMFS doesn't do any specific alignment to the underlying storage geometry.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com