* question on new feature complexity/possibility/sensibility? (^ Alternate)

From: Linda A. Walsh @ 2011-06-26 0:29 UTC
To: xfs-oss

I noticed that the 'cp' command (coreutils 8.9-4.1) on SUSE has a "--reflink" option that controls "clone/CoW" copies -- it says it performs a 'lightweight' copy where the data blocks are copied only when modified. Now, 'when modified' is vague (i.e. does it mean blocks that differ between the two copies (src and dst), or does it mean only copying blocks that were modified since 'some point'?). It doesn't say, but I would guess it's src/dst diffs (I wonder if it is restricted to the same physical filesystem).

Anyway, it turns out it's only for BTRFS (which I haven't used yet, so I only know that it supports operations like the above).

Would it be practical to implement some similar feature in XFS?

It wouldn't be practical or useful to do it on an 'extent' basis due to their large size... So to do something similar on XFS, I was thinking that, with "some amount of effort", some number of "updated extents" could be kept in addition to the original data.

Any future modifications to the file would also go into these updated extents, but any extents that overlap previous modifications would be merged, and only the newest data would be kept (meaning that newly written sections that skip over parts of the file wouldn't overwrite a pending change to those parts -- only the bytes (granularity?) that were actually changed would be replaced).

I.e. the file is 1MB.
User1 updates bytes 1k-200k.
User2 later updates bytes 100k-300k. A new modification 'extent' is created covering 1k-300k, with bytes 1k-(100k-1) saved from user1 and 100k-300k from user2.

Changes to the 'base' copy would be made upon some ioctl 'sync' command (file-by-file)...

It would require up to double the amount of file space.
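The overlap-merge rule in the 1MB example can be modeled in a few lines. This is a minimal Python sketch, assuming pending changes are held in memory at byte granularity (one of the granularity choices the proposal leaves open); `PendingChanges`, `write`, `extents` and `sync` are hypothetical names, not XFS interfaces.

```python
from typing import Dict, List, Tuple


class PendingChanges:
    """Hypothetical per-file overlay of modified bytes not yet synced to the base copy."""

    def __init__(self) -> None:
        # offset -> byte value; a later write to the same offset replaces the earlier one
        self._bytes: Dict[int, int] = {}

    def write(self, offset: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self._bytes[offset + i] = b

    def extents(self) -> List[Tuple[int, bytes]]:
        """Coalesce modified offsets into contiguous (start, data) runs."""
        runs: List[Tuple[int, bytearray]] = []
        for off in sorted(self._bytes):
            if runs and runs[-1][0] + len(runs[-1][1]) == off:
                runs[-1][1].append(self._bytes[off])  # extends the current run
            else:
                runs.append((off, bytearray([self._bytes[off]])))  # starts a new run
        return [(start, bytes(buf)) for start, buf in runs]

    def sync(self, base: bytearray) -> None:
        """Apply pending changes to the base copy (the proposed 'sync' ioctl), then clear."""
        for start, data in self.extents():
            base[start:start + len(data)] = data
        self._bytes.clear()


# The example from the mail, using half-open ranges [start, end):
K = 1024
changes = PendingChanges()
changes.write(1 * K, b"1" * (199 * K))    # User1: bytes [1k, 200k)
changes.write(100 * K, b"2" * (200 * K))  # User2: bytes [100k, 300k)
merged = changes.extents()                # one extent covering [1k, 300k)
```

Tracking individual bytes is only for clarity; a real design would keep sorted ranges. Even so, the overlay can grow to the size of the file itself, which is where the "up to double the file space" estimate comes from.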
----
Another possibility would simply be to keep a record of the byte ranges that have been updated in each extent, along with the extent's last modification time. Then one could compare the mod times and apply the changes. The problem there would be having to keep a possibly 'large' log of changes (what if it's not synced/purged? It couldn't be circular, as that would allow events to be lost -- though the file system could be forced 'offline' if the event log became full... a major pain...). But if it was created with a few G of space, it might take a while to fill... and if synced in time, no problem.

Still, there may be no great desire for or benefit from this, but DAMN if I haven't wanted copy-on-write files for a LONG time. I.e. being able to hardlink files, but with an option to mark them as copy-on-write -- allowing space to be saved when copying directory trees, but then dynamically making new copies when someone updates one of the linked copies.

Maybe that's a different feature that could be more easily done?

Anyway -- the new copy option just got me thinking.... Any ideas or work in thinking about things in this area for XFS, to keep it updated and 'current' as a "Filesystem of Choice"....?

Thanks,
Linda

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
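The change-log alternative can be sketched the same way. This hypothetical `ChangeLog` stores each update with the byte range it covers and its modification time, applies records oldest-first on sync so the newest write to any byte wins, and raises `LogFullError` (standing in for forcing the filesystem offline) rather than wrapping and losing unsynced records. It assumes the log also holds the written data, which the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List


class LogFullError(Exception):
    """Hypothetical: raised instead of overwriting unsynced records (stands in for going 'offline')."""


@dataclass(frozen=True)
class ChangeRecord:
    extent_id: int
    start: int    # byte offset within the extent
    data: bytes   # assumption: the log carries the new bytes as well as the range
    mtime: float  # modification time of this update


class ChangeLog:
    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.records: List[ChangeRecord] = []

    def append(self, rec: ChangeRecord) -> None:
        # Not circular: a full log must never silently drop records that have not been synced
        if self.used + len(rec.data) > self.capacity:
            raise LogFullError("change log full; sync required")
        self.records.append(rec)
        self.used += len(rec.data)

    def sync(self, extents: Dict[int, bytearray]) -> None:
        # Compare mod times: apply oldest first so the newest write to any byte lands last
        for rec in sorted(self.records, key=lambda r: r.mtime):
            buf = extents[rec.extent_id]
            buf[rec.start:rec.start + len(rec.data)] = rec.data
        self.records.clear()
        self.used = 0
```

The fixed capacity is the trade-off described in the mail: size it generously and sync often, or accept that a full log blocks further writes.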
* Re: question on new feature complexity/possibility/sensibility? (^ Alternate)

From: Dave Chinner @ 2011-06-27 0:32 UTC
To: Linda A. Walsh; +Cc: xfs-oss

On Sat, Jun 25, 2011 at 05:29:53PM -0700, Linda A. Walsh wrote:
> I noticed in the 'cp' (coreutils 8.9-4.1) command on suse, there is
> a "--reflink" that controls "clone/CoW" copies -- which says
> it performs a 'lightweight' copy where the data blocks are copied
> only when modified. Now it is vague the 'when modified', (i.e.
> does it mean ones that are different between the two copies) (src
> and dst), or does it mean only to copy blocks that were modified
> since 'some point' -- Doesn't say, but would guess it's src/dst diffs
> (I wonder if it is restricted to the same physical filesystem).
>
> Anyway, turns out, it's only for BTRFS (which I haven't yet used,
> and therefore know only that it supports operations like the above).

Yup, it requires a refcounted, shared, copy-on-write extent index to
do efficiently.

> Would it be practical to implement, some similar, feature in XFS?

It could be done, but it's a fairly large chunk of work....

> It wouldn't be practical or useful to do it on an 'extent' basis due
> to their large size...So to do something similar on XFS, I was
> thinking, with "some amount of effort", some number of "updated
> extents" could be kept, in addition to the original data.

Kept where, exactly? And how do you share the original extent tree
between multiple inodes? And if all the inodes that share the
original tree get truncated, how do you know that you can break the
ref-link state and remove the original, now unreferenced tree?

Once you have solved those problems, you have effectively designed a
refcounted, shared, copy-on-write extent index....
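To make the three questions above concrete, here is a toy, in-memory model of a refcounted, shared, copy-on-write block index. It is block-granular rather than extent-based, keeps only reference counts rather than the back pointers to owner inodes mentioned later in the reply, and all names are hypothetical; it is not how XFS or btrfs lay anything out on disk.

```python
from typing import Dict, List


class SharedExtentIndex:
    """Hypothetical refcounted, shared, copy-on-write map from file blocks to physical blocks."""

    def __init__(self) -> None:
        self.refcount: Dict[int, int] = {}     # physical block -> number of owning files
        self.files: Dict[str, List[int]] = {}  # file name -> physical block per logical block
        self.storage: Dict[int, bytes] = {}    # physical block -> contents
        self._next = 0

    def _alloc(self, data: bytes) -> int:
        blk = self._next
        self._next += 1
        self.storage[blk] = data
        self.refcount[blk] = 1
        return blk

    def _release(self, blk: int) -> None:
        self.refcount[blk] -= 1
        if self.refcount[blk] == 0:
            # Last owner gone: only now is the block unreferenced and safe to free
            del self.refcount[blk]
            del self.storage[blk]

    def create(self, name: str, blocks: List[bytes]) -> None:
        self.files[name] = [self._alloc(b) for b in blocks]

    def reflink(self, src: str, dst: str) -> None:
        # Share every block instead of copying data; just bump the refcounts
        self.files[dst] = list(self.files[src])
        for blk in self.files[dst]:
            self.refcount[blk] += 1

    def write(self, name: str, index: int, data: bytes) -> None:
        blk = self.files[name][index]
        if self.refcount[blk] > 1:
            # Shared block: copy-on-write into a freshly allocated block
            self._release(blk)
            self.files[name][index] = self._alloc(data)
        else:
            self.storage[blk] = data  # sole owner: overwrite in place

    def truncate(self, name: str) -> None:
        for blk in self.files[name]:
            self._release(blk)
        self.files[name] = []

    def read(self, name: str) -> List[bytes]:
        return [self.storage[b] for b in self.files[name]]
```

Because sharing is tracked per block with a count rather than per pair of files, any number of reflink copies of one file work the same way (a third `reflink("a", "c")` just raises the counts to 3), and truncating the last owner is exactly what tells you a block is unreferenced and can be freed.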
> Any future modifications to the file would also have the extents
> modified, but any extents that overlap previous mods will be merged,
> and only the newest data would be kept (meaning that
> new sections that are written, that skip over parts of the file,
> wouldn't overwrite a pending change to that section -- only
> the bytes (granularity?) that were changed.
>
> I.e. file is 1Mb.
> User1 updates bytes 1k-200k.
> User2, later updates bytes 100k-300k, New modification 'extent' is
> created with 1k-300k, with bytes
> 1k-(100k-1) from user1 be saved, and 100k-300k from user2.
>
> Changes to the 'base' copy would be made upon some ioctl 'sync'
> command (file-by-file)...
>
> It would require up to double the amount of file space.

For a single reflink copy, yes. But there's nothing stopping you
from having multiple ref-link copies of the one file. And so the
problem is far more complex than you are considering.

I've looked at what it would require to implement reflinks
transparently in XFS, and it's not pretty. Major surgery to the bmap
code, a new btree type that includes back pointers to all the owner
inodes, a new shadow inode type that holds the original tree, a new
reflink inode type that contains the overwrite extent tree instead of
a normal extent tree, a bunch of new transactions, new extent
lookup/seek code, etc. I'd estimate it to be a 6 month project for
someone who knew what they were doing. It's not just kernel code,
but all the userspace tools need to be updated to understand
reflinks and the COW based format (repair, check, db, bmap, etc).

FWIW, I haven't even looked at how extended attributes are supposed
to be handled on reflinked files, so that could increase the
complexity significantly.

> ----
> Another possibility would simply be to create a record of byte
> ranges that have been updated in the extent and the extent's last
> modification time. Then one could compare the mod times and apply
> the changes.
> The problem there would be having to keep a
> possibly 'large' log of changes (what if it's not sync/purged...
> couldn't be circular as that would allow events to be lost -- though
> the file system could be forced 'offline' if the event log became full
> ...a major pain...)..., but if it was created with a few G of space,
> might take a while...and if synced in time, no prob.
>
> Still, may be no great desire or benefit, but DAMN if I haven't
> wanted copy-on-write files for a LONG time.

So use a filesystem that supports them natively ;)

> I.e. being able to hardlink files, but have an option to mark it as
> copy on write -- allowing space to be saved when copying directory trees,
> but then dynamically making new copies when someone updates one of the
> linked copies.

The problem is that a reflink sort of looks like a hard link, but in
many cases behaves like a soft link (e.g. different owners,
permissions, etc. are possible) and hence - combined with the
copy-on-write behaviour - they need to be treated more like a
soft link in terms of implementation. Soft links have their own
inode so can hold state separate to the inode they are pointing to,
and for reflinked files it is simply not practical to retroactively
modify the directory structure to point at a different inode when
the first COW operation occurs.

Like I said, it can be done, but it's not a small project. If you
want to sink a significant amount of development time into the
project, we will help you in any way we can. However, I don't think
anyone has the time to do something like this for you....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: question on new feature complexity/possibility/sensibility? (^ Alternate)

From: Linda Walsh @ 2011-06-27 4:08 UTC
To: Dave Chinner; +Cc: xfs-oss

Dave Chinner wrote:
> So use a filesystem that supports them natively ;)
---
Got any in mind that also support ACLs and extended attrs,
and have the reliability and performance of XFS? ;-)
>
>> I.e. being able to hardlink files, but have an option to mark it as
>> copy on write -- allowing space to be saved when copying directory trees,
>> but then dynamically making new copies when someone updates one of the
>> linked copies.
>
> The problem is that a reflink sort of looks like a hard link, but in
> many cases behaves like a soft link (e.g. different owners,
> permissions, etc are possible) and hence - combined with the
> copy-on-write behaviour - they need to be treated more like a
> soft-link in terms of implementation.
----
I can see this -- but this doesn't mean we are talking about the same
type of reflink as the shared data above, does it? Because in this lower
case, I was talking about... hmmm.... interesting....

I was thinking of just copying the whole file on write,
but that'd be a potential pain on some systems depending on the file
size....
> Soft links have their own
> inode so can hold state separate to the inode they are pointing to,
> and for reflinked files it is simply not practical to retroactively
> modify the directory structure to point at a different inode when
> the first COW operation occurs.
----
I see....
>
> Like I said, it can be done, but it's not a small project. If you
> want to sink a significant amount of development time to the
> project, we will help you in any way we can. However, I don't think
> anyone has the time to do something like this for you....
---
If I had the time and mental resources... I'd love to.
But I'm a bit overwhelmed for time now, and even if I wasn't, I'd
not be anywhere near certain I'd be able to maintain continuous focus
for the length of time necessary for a project that long....
if you know what I mean...

Maybe, if it was something that I or others could break
down into useful 'subchunks' that could go in at separate times. Nothing
useless by itself (if nothing else, specific 'upgrading' of data
infrastructure (data fields on disk, routines to parse such... etc.)
to support such things in the future), but the whole thing at once
seems pretty large.

(I know... I'm being a wimp)...
* Re: question on new feature complexity/possibility/sensibility? (^ Alternate)

From: Dave Chinner @ 2011-06-27 6:02 UTC
To: Linda Walsh; +Cc: xfs-oss

On Sun, Jun 26, 2011 at 09:08:41PM -0700, Linda Walsh wrote:
> Dave Chinner wrote:
> >So use a filesystem that supports them natively ;)
> ---
> Got any in mind that also support acls, extended attrs
> and has the reliability and performance of xfs? ;-)

For most "normal" workloads, ZFS would probably be your only
production-ready option. But that's not something you could use on
Linux, is it? :/

Until btrfs is a completely baked cake, you won't be able to tick
all those boxes, and even then there will be questions about
performance...

Mmmm, cake.... :)

> >>I.e. being able to hardlink files, but have an option to mark it as
> >>copy on write -- allowing space to be saved when copying directory trees,
> >>but then dynamically making new copies when someone updates one of the
> >>linked copies.
> >
> >The problem is that a reflink sort of looks like a hard link, but in
> >many cases behaves like a soft link (e.g. different owners,
> >permissions, etc are possible) and hence - combined with the
> >copy-on-write behaviour - they need to be treated more like a
> >soft-link in terms of implementation.
> ----
> I can see this -- now this doesn't mean we are talking the same
> type of reflink with the shared data above is it? Cuz in this lower
> case, I was talking about hmmm....interesting....
>
> I was thinking of just a copy-of-the whole file on write,
> but that'd be a potential pain on some systems depending on the file
> size....

If you are going to take the pain of copying the entire file anyway,
just copy it up front.
XFS is designed for fast, large storage subsystems, so make use of its
capabilities ;)

> >Soft links have their own
> >inode so can hold state separate to the inode they are pointing to,
> >and for reflinked files it is simply not practical to retroactively
> >modify the directory structure to point at a different inode when
> >the first COW operation occurs.
> ----
> I see....
> >
> >Like I said, it can be done, but it's not a small project. If you
> >want to sink a significant amount of development time to the
> >project, we will help you in any way we can. However, I don't think
> >anyone has the time to do something like this for you....
> ---
> If I had the time and mental resources...I'd love to.
> But am a bit overwhelmed for time now, and even if I wasn't, I'd
> not be anywhere near certain I'd be able to maintain continuous focus
> for the length of time necessary to do that length of project....
> if you know what I mean...
>
> Maybe certain if it was something that I or others could break
> down into useful 'subchunks' that could go in at separate times. Nothing
> useless by itself (if nothing less than specific 'upgrading' of data
> infrastructure (data fields on disk, routines to parse such...etc),
> but the whole thing at once seems pretty large.

Actually, none of it needs to be merged until complete support is
available. If there's a need for parts of the work in mainline
before the bigger set of work is complete, then we'd merge the bits
needed and rebase the working tree on top of the new mainline tree.
git makes this sort of operation easy, so development of such a
feature could be done quite simply in a parallel branch.

IOWs, even if I was doing this, I'd still be doing it a small chunk
at a time before committing it to a public branch somewhere. It's
easy to point interested parties at the code that way to collaborate
on a separate branch until everything is done.
I just wouldn't be asking for review/mainline inclusion until I had
everything done and tested....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: question on new feature complexity/possibility/sensibility? (^ Alternate)

From: Stan Hoeppner @ 2011-06-27 8:22 UTC
To: Dave Chinner; +Cc: xfs-oss

On 6/27/2011 1:02 AM, Dave Chinner wrote:
> On Sun, Jun 26, 2011 at 09:08:41PM -0700, Linda Walsh wrote:
>> Dave Chinner wrote:
>>> So use a filesystem that supports them natively ;)
>> ---
>> Got any in mind that also support acls, extended attrs
>> and has the reliability and performance of xfs? ;-)
>
> For most "normal" workloads, ZFS would probably be your only
> production ready option. But that's not something you could use on
> Linux, is it? :/

It is apparently usable on Linux for those willing to build and
maintain it themselves. But that level of administrative burden is
in itself the opposite of production ready.

> Until btrfs is a completely baked cake, you won't be able to tick
> all those boxes, and even then there will be questions about
> performance...

Or until Oracle releases ZFS under the GPL, or a compatible license,
which probably isn't going to happen. Apparently Oracle believes
their SPARC and x86 hardware won't sell if ZFS is free on Linux. Now
that Oracle owns Solaris and ZFS, I'd bet they wish they could put
the BTRFS genie back in the bottle, as well as OCFS, etc.

--
Stan