* Question on migrating data between PVs in xfs
From: Wei Lin @ 2016-08-09 14:50 UTC
To: xfs

Hi there,

I am working on an XFS-based project and want to modify the allocation
algorithm, which is quite involved. I am wondering if anyone could help
with this.

The high-level goal is to create an XFS filesystem against multiple
physical volumes, allow the user to specify a target PV for each file,
and migrate files automatically.

I plan to implement the user interface with extended attributes, but am
now stuck on the allocation/migration part. Is there a way to make XFS
respect the attribute, i.e. only allocate blocks/extents from the target
PV specified by the user?
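The interface I have in mind would look roughly like this (a sketch
only; the attribute name and value are hypothetical, since nothing in
XFS currently interprets them):

    # Tag a file with a made-up "target PV" attribute in the user
    # namespace; the modified allocator would then honour it.
    setfattr -n user.targetpv -v ssd /mnt/hsd/data/hot.db
    getfattr -n user.targetpv /mnt/hsd/data/hot.db

Any suggestion would be highly appreciated.

Cheers,
--
Wei Lin
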
* Re: Question on migrating data between PVs in xfs
From: Dave Chinner @ 2016-08-09 22:35 UTC
To: Wei Lin; +Cc: xfs

On Tue, Aug 09, 2016 at 03:50:47PM +0100, Wei Lin wrote:
> Hi there,
>
> I am working on an XFS-based project and want to modify the allocation
> algorithm, which is quite involved. I am wondering if anyone could help
> with this.
>
> The high-level goal is to create an XFS filesystem against multiple
> physical volumes, allow the user to specify a target PV for each file,
> and migrate files automatically.

So, essentially tiered storage with automatic migration. Can you
describe the storage layout and setup you are thinking of using, and
how that will map to a single XFS filesystem, so we have a better idea
of what you are thinking of?

> I plan to implement the user interface with extended attributes, but am
> now stuck on the allocation/migration part. Is there a way to make XFS
> respect the attribute, i.e. only allocate blocks/extents from the target
> PV specified by the user?

Define "PV".

XFS separates allocation by allocation group - it has no concept of
the underlying physical device layout. If I understand you correctly,
you have multiple "physical volumes" set up in a single block device
(somehow - please describe!) and now you want to control how data is
allocated to those underlying volumes, right?

So what you're asking about is how to define and implement user
controlled allocation policies, right? Sorta like this old prototype I
was working on years ago?

http://oss.sgi.com/archives/xfs/2009-02/msg00250.html

And some more info from a later discussion:

http://oss.sgi.com/archives/xfs/2013-01/msg00611.html

And maybe in conjunction with this, which added groupings of AGs
together to form independent regions of "physical separation" that the
allocator could then be made aware of:

http://oss.sgi.com/archives/xfs/2009-02/msg00253.html

These were more aimed at defining failure domains for error and
corruption isolation:

http://xfs.org/index.php/Reliable_Detection_and_Repair_of_Metadata_Corruption#Failure_Domains

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: Question on migrating data between PVs in xfs
From: Dave Chinner @ 2016-08-10 10:56 UTC
To: Wei Lin; +Cc: xfs

Hi Wei,

Please keep the discussion on the list unless there's good reason not
to. I've re-added the list cc...

On Wed, Aug 10, 2016 at 10:23:14AM +0100, Wei Lin wrote:
> Hi Dave,
>
> Thank you very much for the reply. Comment inline.
>
> On 16-08-10 08:35:03, Dave Chinner wrote:
> > On Tue, Aug 09, 2016 at 03:50:47PM +0100, Wei Lin wrote:
> > > The high-level goal is to create an XFS filesystem against
> > > multiple physical volumes, allow the user to specify a target PV
> > > for each file, and migrate files automatically.
> >
> > So, essentially tiered storage with automatic migration. Can you
> > describe the storage layout and setup you are thinking of using,
> > and how that will map to a single XFS filesystem, so we have a
> > better idea of what you are thinking of?
>
> Yes, but the migration is triggered by the user specifying a device,
> instead of the kernel monitoring the usage pattern.

That's not migration - that's an allocation policy. Migration means
moving data at rest to a different physical location, such as via an
HSM, automatic tiering or defragmentation. Deciding where to write
data when it is first written is the job of the filesystem allocator,
so what you are describing here is a user-controlled allocation
policy.

> By "PV" I meant physical volumes of LVM. Currently I have two
> physical volumes, one based on two SSDs and the other on six HDDs.

That's what I thought, but you still need to describe everything in
full rather than assume the reader understands your abbreviations.

> The XFS was created as follows:
>
> mdadm --create /dev/md1 --raid-devices=2 --level=10 -p f2 --bitmap=internal --assume-clean /dev/nvme?n1
> mdadm --create /dev/md2 --raid-devices=6 --level=5 --bitmap=internal --assume-clean /dev/sd[c-h]
> pvcreate /dev/md1
> pvcreate /dev/md2
> vgcreate researchvg /dev/md1 /dev/md2
> lvcreate -n hsd -l 100%FREE researchvg
> mkfs.xfs -L HSD -l internal,lazy-count=1,size=128m /dev/mapper/researchvg-hsd

It's a linear concatenation of multiple separate block devices, so the
physical boundaries are hidden from the filesystem by the LVM layer.

Have you looked at using dm-cache instead of modifying the filesystem?
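With the LVM setup you already have, that would look something like
this (a rough, untested sketch with hypothetical sizes - see
lvmcache(7) for the real details):

    # Origin LV on the HDD PV, cache pool on the SSD PV, then attach
    # the pool to the origin.
    lvcreate -n hsd -l 100%PVS researchvg /dev/md2
    lvcreate --type cache-pool -n fast -l 90%PVS researchvg /dev/md1
    lvconvert --type cache --cachepool researchvg/fast researchvg/hsd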
> > > I plan to implement the user interface with extended attributes,
> > > but am now stuck on the allocation/migration part. Is there a
> > > way to make XFS respect the attribute, i.e. only allocate
> > > blocks/extents from the target PV specified by the user?
> >
> > Define "PV".
> >
> > XFS separates allocation by allocation group - it has no concept
> > of the underlying physical device layout. If I understand you
> > correctly, you have multiple "physical volumes" set up in a single
> > block device (somehow - please describe!) and now you want to
> > control how data is allocated to those underlying volumes, right?
>
> I thought about storing the mapping between the physical volumes and
> the logical volume in a special file, probably including meta-info
> like IOPS and access time as well, and consulting this file on the
> fly to determine whether an allocated extent is within the target
> device.

How does the filesystem determine whether an allocated extent is on a
specific device when it has no knowledge of the underlying physical
device boundaries?

> > So what you're asking about is how to define and implement user
> > controlled allocation policies, right? Sorta like this old
> > prototype I was working on years ago?
> >
> > http://oss.sgi.com/archives/xfs/2009-02/msg00250.html
> >
> > And some more info from a later discussion:
> >
> > http://oss.sgi.com/archives/xfs/2013-01/msg00611.html
> >
> > And maybe in conjunction with this, which added groupings of AGs
> > together to form independent regions of "physical separation" that
> > the allocator could then be made aware of:
> >
> > http://oss.sgi.com/archives/xfs/2009-02/msg00253.html
>
> I am not sure if allocation groups would be a good unit of "physical
> separation".

There is no other construct in XFS designed for that purpose.

> Since the underlying physical devices (and thus the physical
> volumes) have quite different characteristics, physical volumes seem
> naturally a good choice.

XFS knows nothing about those boundaries - you have to tell it where
the boundaries are. e.g. size your allocation groups to fit the
smallest physical boundary you have, then assign a different policy to
the users of each allocation group. That's the point of the patch set
that allowed mkfs to define sets of AGs that lie in specific domains,
so that the allocator could target them based on the requirements
supplied by the user in the allocation policy (which was the first
patch set I pointed to).

> On the other hand, an allocation group may span multiple physical
> volumes, providing quite different QoS. This is why I planned to let
> users specify a target "PV" instead of a target allocation group.
> Any ideas?

Go read the code in the patches I pointed to first - they answer both
of the questions you are asking right now, as these were the problems
I was looking to solve all that time ago. They will also answer many
questions you haven't yet realised you need to ask, too.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: Question on migrating data between PVs in xfs
From: Emmanuel Florac @ 2016-08-10 16:31 UTC
To: Dave Chinner; +Cc: xfs, Wei Lin

On Wed, 10 Aug 2016 20:56:39 +1000, Dave Chinner
<david@fromorbit.com> wrote:

> Have you looked at using dm-cache instead of modifying the
> filesystem?

Or bcache, fcache, or EnhanceIO. So far, from my own testing, bcache
is significantly faster and dm-cache by far the slowest of the bunch,
but bcache needs some more loving (its main developer is busy writing
a new tiered, caching filesystem instead).

--
Emmanuel Florac | Direction technique | Intellique
<eflorac@intellique.com> | +33 1 78 94 84 02

* Re: Question on migrating data between PVs in xfs
From: Dave Chinner @ 2016-08-10 21:51 UTC
To: Emmanuel Florac; +Cc: xfs, Wei Lin

On Wed, Aug 10, 2016 at 06:31:32PM +0200, Emmanuel Florac wrote:
> Or bcache, fcache, or EnhanceIO. So far, from my own testing, bcache
> is significantly faster and dm-cache by far the slowest of the
> bunch, but bcache needs some more loving (its main developer is busy
> writing a new tiered, caching filesystem instead).

Yeah, the problem with bcache is that it is effectively an orphaned
driver. If there are obvious and reproducible performance
differentials between bcache and dm-cache, you should bring them to
the attention of the dm developers to see if they can fix them...

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

* Re: Question on migrating data between PVs in xfs
From: Wei Lin @ 2016-08-11 9:26 UTC
To: Dave Chinner; +Cc: xfs

On 16-08-11 07:51:49, Dave Chinner wrote:
> Yeah, the problem with bcache is that it is effectively an orphaned
> driver. If there are obvious and reproducible performance
> differentials between bcache and dm-cache, you should bring them to
> the attention of the dm developers to see if they can fix them...

Software like dm-cache and bcache seems to use SSDs merely as caches,
instead of aggregating the capacity of all the devices. However, I
just found aufs and overlayfs, which conceptually suit the purpose
better.
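For example (a rough sketch with made-up mount points), overlayfs
could present the SSD filesystem as a writable layer on top of the
HDD one:

    # /ssd and /hdd each hold their own filesystem; upperdir and
    # workdir must live on the same (SSD) filesystem. The merged view
    # is mounted at /mnt/hsd.
    mount -t overlay overlay \
        -o lowerdir=/hdd,upperdir=/ssd/upper,workdir=/ssd/work \
        /mnt/hsd

Cheers,
--
Wei Lin
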
* Re: Question on migrating data between PVs in xfs
From: Emmanuel Florac @ 2016-08-11 10:44 UTC
To: Dave Chinner; +Cc: xfs, Wei Lin

On Thu, 11 Aug 2016 07:51:49 +1000, Dave Chinner
<david@fromorbit.com> wrote:

> Yeah, the problem with bcache is that it is effectively an orphaned
> driver. If there are obvious and reproducible performance
> differentials between bcache and dm-cache, you should bring them to
> the attention of the dm developers to see if they can fix them...

Good idea. Well, bcache may be orphaned by its main developer, but
others still submit quite a lot of stability patches (among them
Christoph Hellwig, who is also active here, IIRC).

--
Emmanuel Florac | Direction technique | Intellique
<eflorac@intellique.com> | +33 1 78 94 84 02

* Re: Question on migrating data between PVs in xfs
From: Wei Lin @ 2016-08-11 9:04 UTC
To: Dave Chinner; +Cc: xfs

Hi Dave,

Now I see your point. Initially I wanted to store the
linear-concatenation layout in a special file and read this file
during allocation to compute on the fly whether an extent falls into
a given physical volume. But now I do agree that, from the
perspective of OS engineering, filesystems should not know the
underlying layout, at least not in such an "ad hoc" way. Aligning AGs
to physical volumes and applying an allocation policy might be the
best approach.
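As a concrete illustration (sizes hypothetical, and the per-AG policy
side would still need something like the patches you pointed to): if
the SSD volume were 2 TiB and the logical volume is a linear
concatenation starting with /dev/md1, mkfs could be told to place AG
boundaries on the device boundary:

    # 16 AGs of 128 GiB each cover the 2 TiB SSD volume; all
    # subsequent AGs fall entirely on the HDD volume.
    mkfs.xfs -L HSD -d agsize=128g -l internal,size=128m \
        /dev/mapper/researchvg-hsd

Thank you very much for the help.

Cheers,
--
Wei Lin
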