* XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Dave Hall @ 2016-11-24  1:23 UTC
To: linux-xfs

Hello,

I'm planning a storage installation on new hardware and I'd like to configure it for best performance. I will have 24 to 48 drives in a SAS-attached RAID box with dual 12Gb/s controllers (Dell MD3420 with 10K 1.8TB drives). The server is dual socket with 28 cores, 256GB RAM, dual 12Gb/s HBAs, and multiple 10GbE NICs.

My workload is NFS for user home directories - highly random access patterns with frequent bursts of random writes.

In order to maximize performance I'm planning to make multiple small RAID volumes (i.e. RAID5 4+1, or RAID6 8+2) that would be either striped or concatenated together.

I'm looking for information on:

- Are there any cautions or recommendations about XFS stability/performance on a thin volume with thin snapshots?

- I've read that there are tricks and calculations for aligning XFS to the RAID stripes. Can you suggest any guidelines or tools for calculating the right configuration?

- I've also read about tuning the number of allocation groups to reflect the CPU configuration of the server. Any suggestions on this?

Thanks.

-Dave
* Re: XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Carlos Maiolino @ 2016-11-24  9:43 UTC
To: Dave Hall; +Cc: linux-xfs

Hi,

On Wed, Nov 23, 2016 at 08:23:42PM -0500, Dave Hall wrote:
> Hello,
>
> I'm planning a storage installation on new hardware and I'd like to configure it for best performance. I will have 24 to 48 drives in a SAS-attached RAID box with dual 12Gb/s controllers (Dell MD3420 with 10K 1.8TB drives). The server is dual socket with 28 cores, 256GB RAM, dual 12Gb/s HBAs, and multiple 10GbE NICs.
>
> My workload is NFS for user home directories - highly random access patterns with frequent bursts of random writes.
>
> In order to maximize performance I'm planning to make multiple small RAID volumes (i.e. RAID5 4+1, or RAID6 8+2) that would be either striped or concatenated together.
>
> I'm looking for information on:
>
> - Are there any cautions or recommendations about XFS stability/performance on a thin volume with thin snapshots?
>
> - I've read that there are tricks and calculations for aligning XFS to the RAID stripes. Can you suggest any guidelines or tools for calculating the right configuration?

There is no magical trick :), you need to configure the stripe unit and stripe width according to your RAID configuration. You should set the stripe unit (su option) to the per-disk chunk size of your RAID, and set the stripe width (sw option) to the number of data disks in your array (for a 4+1 RAID5 it should be 4; for an 8+2 RAID6 it should be 8).

> - I've also read about tuning the number of allocation groups to reflect the CPU configuration of the server. Any suggestions on this?

Allocation groups can't be bigger than 1TB. Assuming the count should reflect your CPU configuration is wrong: having too few or too many allocation groups can kill your performance, and you might also face other allocation problems in the future, as the filesystem ages, if it runs with very small allocation groups.

Determining the size of the allocation groups is a case-by-case approach, and it might need some experimenting.

Since you are dealing with thin-provisioned devices, I'd be even more careful. If you start with a small filesystem and use the default mkfs configuration, it will give you a number of AGs based on your current block device size, which can be a problem in the future when you decide to extend the filesystem: AG size can't be changed after you make the filesystem. Search the xfs list and you will see reports of performance problems that turned out to be caused by very small filesystems that were extended later, leaving them with lots of AGs.

So: what initial size do you expect these filesystems to have? How much do you expect to grow them? Those questions will help you get some idea of the right AG size.

Regarding thin provisioning, there are a couple of things that you should keep in mind.
- AGs segment the metadata across the whole disk and increase parallelism in the filesystem, but thin provisioning will make such allocations sequential regardless of where in the block device the filesystem tries to write; this is the nature of thin-provisioned devices. So I believe you should be more careful planning your dm-thin structure than the filesystem itself.

- There is a bug I'm working on with XFS on thin-provisioned devices where, if you overcommit the filesystem size (i.e. it's bigger than the amount of space the dm-thin device really has), you might hit problems when you try to write to the filesystem but there is no more space available in the dm-thin device. This thread contains part of the story: http://www.spinics.net/lists/linux-xfs/msg01248.html Which reminds me I need to come back to this bug ASAP.

Just my 0.02, some other folks might remember something else.

Cheers

> Thanks.
>
> -Dave

-- 
Carlos
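[As a concrete sketch of the su/sw settings described above - the 128 KiB chunk size, the 4+1 RAID5 layout and the device path are assumptions for illustration, not values taken from this thread - the mkfs invocation could look like this:

  # Hypothetical 4+1 RAID5 with a 128 KiB per-disk chunk; substitute the
  # geometry your array actually uses.  su = per-disk chunk size,
  # sw = number of data disks (parity disks excluded).
  mkfs.xfs -d su=128k,sw=4 /dev/vg_home/lv_home

  # Equivalent form in 512-byte sectors:
  #   sunit  = 128k / 512        = 256
  #   swidth = sunit * 4 data disks = 1024
  mkfs.xfs -d sunit=256,swidth=1024 /dev/vg_home/lv_home
]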
* Re: XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Dave Chinner @ 2016-11-24 19:44 UTC
To: Dave Hall, linux-xfs

On Thu, Nov 24, 2016 at 10:43:32AM +0100, Carlos Maiolino wrote:
> Hi,
>
> On Wed, Nov 23, 2016 at 08:23:42PM -0500, Dave Hall wrote:
> > Hello,
> >
> > I'm planning a storage installation on new hardware and I'd like to configure it for best performance. I will have 24 to 48 drives in a SAS-attached RAID box with dual 12Gb/s controllers (Dell MD3420 with 10K 1.8TB drives). The server is dual socket with 28 cores, 256GB RAM, dual 12Gb/s HBAs, and multiple 10GbE NICs.
> >
> > My workload is NFS for user home directories - highly random access patterns with frequent bursts of random writes.
> >
> > In order to maximize performance I'm planning to make multiple small RAID volumes (i.e. RAID5 4+1, or RAID6 8+2) that would be either striped or concatenated together.
> >
> > I'm looking for information on:
> >
> > - Are there any cautions or recommendations about XFS stability/performance on a thin volume with thin snapshots?
> >
> > - I've read that there are tricks and calculations for aligning XFS to the RAID stripes. Can you suggest any guidelines or tools for calculating the right configuration?
>
> There is no magical trick :), you need to configure the stripe unit and stripe width according to your RAID configuration. You should set the stripe unit (su option) to the per-disk chunk size of your RAID, and set the stripe width (sw option) to the number of data disks in your array (for a 4+1 RAID5 it should be 4; for an 8+2 RAID6 it should be 8).

mkfs.xfs will do this setup automatically on software RAID and on any block device that exports the necessary information to set it up. In general, it's only older/cheaper hardware RAID that you have to worry about anymore.

> > - I've also read about tuning the number of allocation groups to reflect the CPU configuration of the server. Any suggestions on this?
>
> Allocation groups can't be bigger than 1TB. Assuming the count should reflect your CPU configuration is wrong: having too few or too many allocation groups can kill your performance, and you might also face other allocation problems in the future, as the filesystem ages, if it runs with very small allocation groups.

It also depends on your storage, mostly. SSDs can handle agcount=NCPUS*2 easily, but for spinning storage this will cause additional seek loading and slow things down. In this case, the defaults are best.

> Determining the size of the allocation groups is a case-by-case approach, and it might need some experimenting.
>
> Since you are dealing with thin-provisioned devices, I'd be even more careful. If you start with a small filesystem and use the default mkfs configuration, it will give you a number of AGs based on your current block device size, which can be a problem in the future when you decide to extend the filesystem: AG size can't be changed after you make the filesystem. Search the xfs list and you will see reports of performance problems that turned out to be caused by very small filesystems that were extended later, leaving them with lots of AGs.
Yup, the rule of thumb is that growing the fs size by an order of magnitude is fine; growing it by two orders of magnitude will cause problems.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
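[To see what the defaults actually gave you, xfs_info reports the AG count and size after mkfs; the mount point, device path and agcount value below are assumptions for illustration only:

  # Report agcount= and agsize= for an existing, mounted filesystem:
  xfs_info /srv/home

  # Only if there is a concrete reason to deviate from the defaults
  # (e.g. all-SSD storage, per the NCPUS*2 rule of thumb above):
  mkfs.xfs -d agcount=56 /dev/vg_home/lv_home
]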
* Re: XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Dave Hall @ 2016-11-25 14:50 UTC
To: Dave Chinner, linux-xfs

On 11/24/16 2:44 PM, Dave Chinner wrote:
>>> - I've read that there are tricks and calculations for aligning XFS to the RAID stripes. Can you suggest any guidelines or tools for calculating the right configuration?
>>
>> There is no magical trick :), you need to configure the stripe unit and stripe width according to your RAID configuration. You should set the stripe unit (su option) to the per-disk chunk size of your RAID, and set the stripe width (sw option) to the number of data disks in your array (for a 4+1 RAID5 it should be 4; for an 8+2 RAID6 it should be 8).
>
> mkfs.xfs will do this setup automatically on software RAID and on any block device that exports the necessary information to set it up. In general, it's only older/cheaper hardware RAID that you have to worry about anymore.

So how do we know for sure? Is there a way that we can be sure that the hardware RAID has exported this information? In lieu of this, is there a solid way to deduce or test for correct alignment?
* Re: XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Eric Sandeen @ 2016-11-26 17:52 UTC
To: Dave Hall, Dave Chinner, linux-xfs

On 11/25/16 8:50 AM, Dave Hall wrote:
>> mkfs.xfs will do this setup automatically on software RAID and on any block device that exports the necessary information to set it up. In general, it's only older/cheaper hardware RAID that you have to worry about anymore.
>
> So how do we know for sure? Is there a way that we can be sure that the hardware RAID has exported this information?

If you run mkfs.xfs and it shows stripe geometry in the stdout info, then it detected stripe geometry.

Otherwise you can use lsblk -t to print the advertised topology:

# lsblk -t /dev/md121
NAME  ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
md121         0    512      0     512     512    1         128  128    0B

Min IO size is the stripe unit; optimal IO size is the stripe width. (In the above case there is no stripe geometry; 512-byte min IO and 0 optimal IO is uninteresting.)

> In lieu of this, is there a solid way to deduce or test for correct alignment?

If the device itself doesn't advertise a stripe geometry and you think it has one, you'll need to look at the device settings, BIOS, documentation, configuration, or whatever else to work it out on your own, and then specify that manually on the mkfs.xfs command line.

-Eric
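[The same topology fields lsblk prints can also be read directly from sysfs, which is handy for scripting; sdX is a placeholder device name, and the manual su/sw values simply repeat the hypothetical 4+1 / 128 KiB example used earlier:

  # minimum_io_size maps to the stripe unit, optimal_io_size to the stripe
  # width (both in bytes; 0 means "not advertised").
  cat /sys/block/sdX/queue/minimum_io_size
  cat /sys/block/sdX/queue/optimal_io_size

  # If nothing is advertised but the geometry is known, pass it by hand:
  mkfs.xfs -d su=128k,sw=4 /dev/sdX
]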
* Re: XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Dave Hall @ 2016-11-25 15:20 UTC
To: linux-xfs; +Cc: Carlos Maiolino

On 11/25/16 6:18 AM, Carlos Maiolino wrote:
>>> Regarding thin provisioning, there are a couple of things that you should keep in mind.
>>>
>>> - AGs segment the metadata across the whole disk and increase parallelism in the filesystem, but thin provisioning will make such allocations sequential regardless of where in the block device the filesystem tries to write; this is the nature of thin-provisioned devices. So I believe you should be more careful planning your dm-thin structure than the filesystem itself.
>>
>> So it sounds like I should use striping for my logical volume to assure that data is distributed across the whole physical array?
>
> I'm not sure I understand your question here, or what kind of architecture you have in mind. All thin-provisioning allocations are sequential: block requested, next available block served (although with recent dm-thin versions it will serve blocks in bundles rather than on a block-by-block granularity, it is still a sequential alignment).
>
> I am really not sure what you have in mind to 'force' the distribution across the whole physical array. The only thing I could think of was to have 2 dm-thin devices, on different pools, and use them to build a striped LV. I don't know if that is possible tbh, I never tried such a configuration, but it's a setup bound to have problems IMHO.

Currently I have 4 LVM PVs that are mapped to explicit groups of physical disks (RAID 5) in my array. I would either stripe or concatenate them together, create a single large DM-Thin LV, and format it for XFS.

If the PVs are concatenated, it sounds like DM-Thin would fill up the first PV before moving to the next. It seems that DM-Thin on striped PVs would assure that disk activity is spread across all of the PVs and thus across all of the physical disks. Without DM-Thin, an XFS on concatenated PVs would probably tend to organize its AGs into single PVs, which would spread disk activity across all of the physical disks, just in a different way.
* Re: XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Dave Hall @ 2016-11-25 17:09 UTC
To: linux-xfs; +Cc: Carlos Maiolino

-- 
Dave Hall
Binghamton University
kdhall@binghamton.edu
607-760-2328 (Cell)
607-777-4641 (Office)

On 11/25/16 10:20 AM, Dave Hall wrote:
> On 11/25/16 6:18 AM, Carlos Maiolino wrote:
>>>> Regarding thin provisioning, there are a couple of things that you should keep in mind.
>>>>
>>>> - AGs segment the metadata across the whole disk and increase parallelism in the filesystem, but thin provisioning will make such allocations sequential regardless of where in the block device the filesystem tries to write; this is the nature of thin-provisioned devices. So I believe you should be more careful planning your dm-thin structure than the filesystem itself.
>>>
>>> So it sounds like I should use striping for my logical volume to assure that data is distributed across the whole physical array?
>>
>> I'm not sure I understand your question here, or what kind of architecture you have in mind. All thin-provisioning allocations are sequential: block requested, next available block served (although with recent dm-thin versions it will serve blocks in bundles rather than on a block-by-block granularity, it is still a sequential alignment).
>>
>> I am really not sure what you have in mind to 'force' the distribution across the whole physical array. The only thing I could think of was to have 2 dm-thin devices, on different pools, and use them to build a striped LV. I don't know if that is possible tbh, I never tried such a configuration, but it's a setup bound to have problems IMHO.
>
> Currently I have 4 LVM PVs that are mapped to explicit groups of physical disks (RAID 5) in my array. I would either stripe or concatenate them together, create a single large DM-Thin LV, and format it for XFS.
>
> If the PVs are concatenated, it sounds like DM-Thin would fill up the first PV before moving to the next. It seems that DM-Thin on striped PVs would assure that disk activity is spread across all of the PVs and thus across all of the physical disks. Without DM-Thin, an XFS on concatenated PVs would probably tend to organize its AGs into single PVs, which would spread disk activity across all of the physical disks, just in a different way.

I'd like to add some clarification just to be sure...

The configuration strategy I've been using for my physical storage array is to map specific disks into a small RAID group and define a single LUN per RAID group. Thus, each LUN presented to the server is currently mapped to a group of 5 disks in RAID 5.

If I understand correctly, an LVM Logical Volume presents a single linear storage space to the file system (XFS) regardless of the underlying storage organization. XFS divides this space into a number of Allocation Groups that it perceives to be contiguous sub-volumes within the Logical Volume.

With a concatenated LV most AGs would be mapped to a single PV, but XFS would still disperse disk activity across all AGs and thus across all PVs. With a striped LV each AG would be striped across multiple PVs, which would change the distribution of disk activity across the PVs but still lead to all PVs being fairly active.
With DM-Thin, things would change. XFS would perceive that its AGs were fully allocated, but in reality new chunks of storage would be allocated as needed. If DM-Thin uses a linear allocation algorithm on a concatenated LV, it would seem that certain kinds of disk activity would tend to be concentrated in a single PV at a time. On the other hand, DM-Thin on a striped LV would tend to spread things around more evenly regardless of allocation patterns.

Please let me know if this perception is accurate.

Thanks.
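[A minimal sketch of the striped variant under discussion - the VG name, LUN device names, pool/volume sizes and stripe size are all assumptions, and lvcreate(8) on your lvm2 version should be checked for -i/-I support on thin pools - stripes the thin pool's data LV across the four PVs so that dm-thin's sequential chunk allocation still fans out over every RAID group:

  # Assumed names and sizes throughout; adjust to the real VG and LUNs.
  vgcreate vg_home /dev/mapper/lun0 /dev/mapper/lun1 /dev/mapper/lun2 /dev/mapper/lun3

  # Thin pool whose data LV is striped across all 4 PVs (-i), 256 KiB stripe (-I):
  lvcreate --type thin-pool -L 6T -i 4 -I 256k -n tp0 vg_home

  # Overcommitted thin volume carved out of that pool:
  lvcreate -V 20T -T vg_home/tp0 -n home

  mkfs.xfs /dev/vg_home/home
]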
* Re: XFS + LVM + DM-Thin + Multi-Volume External RAID

From: Dave Chinner @ 2016-11-27 21:56 UTC
To: Dave Hall; +Cc: linux-xfs, Carlos Maiolino

On Fri, Nov 25, 2016 at 12:09:10PM -0500, Dave Hall wrote:
> With a concatenated LV most AGs would be mapped to a single PV, but XFS would still disperse disk activity across all AGs and thus across all PVs.

Like all things, this is only partially true. For inode64 (the default) the allocation load is spread based on directory structure. If all your work hits a single directory, then it won't get spread across multiple devices. The log will land on a single device, so it will always be limited by the throughput of that device. And read/overwrite workloads will only hit single devices, too. So unless you have a largely concurrent, widely distributed set of access patterns, XFS won't distribute the IO load.

Now inode32, OTOH, distributes the data to different AGs at allocation time, meaning that data in a single directory is spread across multiple devices. However, all the metadata will be on the first device and that guarantees a device loading imbalance will occur.

> With a striped LV each AG would be striped across multiple PVs, which would change the distribution of disk activity across the PVs but still lead to all PVs being fairly active.

Striped devices can be thought of as the same as a single spindle - the characteristics from the filesystem perspective are the same, just with some added alignment constraints to optimise placement...

> With DM-Thin, things would change. XFS would perceive that its AGs were fully allocated, but in reality new chunks of storage would be allocated as needed. If DM-Thin uses a linear allocation algorithm on a concatenated LV, it would seem that certain kinds of disk activity would tend to be concentrated in a single PV at a time. On the other hand, DM-Thin on a striped LV would tend to spread things around more evenly regardless of allocation patterns.

Yup, exactly the same as for a filesystem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
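[The two allocator behaviours contrasted above are selected with mount options; inode64 has long been the default, so only inode32 normally needs to be spelled out. The device and mount point here are assumptions:

  # Default allocator: inodes placed near their parent directory, with the
  # load spread across AGs by directory structure (inode64).
  mount -o inode64 /dev/vg_home/home /srv/home

  # Legacy allocator: inodes kept in the low AGs, data rotored across AGs
  # (inode32).
  mount -o inode32 /dev/vg_home/home /srv/home
]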