* Re: BTRFS file clone support for cp
       [not found] ` <87ws5tvrq8.fsf@master.homenet>
@ 2009-07-27 23:40 ` Pádraig Brady
  2009-07-28 20:06   ` Giuseppe Scrivano
       [not found]   ` <87k51r9sxh.fsf@master.homenet>
  0 siblings, 2 replies; 21+ messages in thread
From: Pádraig Brady @ 2009-07-27 23:40 UTC (permalink / raw)
  To: Giuseppe Scrivano; +Cc: bug-coreutils, Jim Meyering, linux-btrfs

Giuseppe Scrivano wrote:
> Jim Meyering <jim@meyering.net> writes:
>
>>> Another possible issue with this I can think of is
>>> depending on the modification pattern of the COW files,
>>> the modification processes could fragment the file or
>>> more seriously be given ENOSPC errors.
>> I hope btrfs takes care of this behind the scene.
>>
>> How does the clone work wrt to space consumed, a la df?
>> If copying a 1GB file this way does not update usage
>> stats to reflect the additional 1GB of space used, ...
>
> I tried to clone a big file and df reported a different "used blocks"
> stat than it was before the clone operation.

How different exactly?
OK I tried this myself on F11 with inconclusive results.

$ uname -r
2.6.29.6-213.fc11.i586
$ sudo yum install btrfs-progs
# dd bs=1M count=300 if=/dev/zero of=/btrfs.img #min size?
# mkfs.btrfs /btrfs.img
# mkdir /btrfs
# mount -o loop /btrfs.img /btrfs
# cd /btrfs
# dd bs=1M count=100 if=/dev/zero of=alloc.test
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M   28K  300M   1% /btrfs
# df -h . #only allocated about 30s later
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs
# /home/padraig/clone_file alloc.test alloc.test.clone
# umount /btrfs
# mount -o loop /btrfs.img /btrfs
# cd btrfs
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs

OK the above suggests that the clone doesn't take any
space as I would expect. Then it starts getting confusing...

# du -h *
100M    alloc.test
244M    alloc.test.clone #wha?
# dd bs=1M count=200 if=/dev/zero of=use.space
dd: writing `use.space': No space left on device
101+0 records in
100+0 records out
# ls -l
total 454656
-rw-r--r-- 1 root root 104857600 2009-07-28 00:06 alloc.test
-rw-r--r-- 1 root root 104857600 2009-07-28 00:07 alloc.test.clone
-rw-r--r-- 1 root root 104857600 2009-07-28 00:18 use.space
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  184M  117M  62% /btrfs

The above suggests that the clone does actually allocate space
but btrfs isn't reporting it through statvfs correctly?

If the clone does allocate space, then how can one
clone without allocation which could be very useful
for snapshotting for example?

Also I tried the above twice and both times got:
http://www.kerneloops.org/submitresult.php?number=578993

cheers,
Pádraig.
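
The df-before/df-after comparison in the session above can be scripted. What follows is only a minimal sketch under stated assumptions: it assumes a later GNU cp that accepts --reflink=auto (not the cp under discussion in this thread; a plain cp fallback covers other builds), it works in a scratch directory from mktemp rather than a loop-mounted btrfs image, and the helper name avail_kb is made up for illustration. The sync calls matter because, as noted above, the space only showed up in df some time after the write.

```shell
#!/bin/sh
# Sketch: how much free space does copying a file consume?
# Assumption: cp may or may not know --reflink=auto; a plain copy is the fallback.

workdir=$(mktemp -d)

# avail_kb PATH: available 1K blocks on the filesystem holding PATH
# (hypothetical helper; parses the portable "df -P" layout).
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Create a 1 MiB test file and flush it, so df reflects the allocation.
dd if=/dev/zero of="$workdir/alloc.test" bs=1024 count=1024 2>/dev/null
sync
before=$(avail_kb "$workdir")

# Clone where the filesystem supports it, otherwise copy the data.
cp --reflink=auto "$workdir/alloc.test" "$workdir/alloc.test.clone" 2>/dev/null ||
    cp "$workdir/alloc.test" "$workdir/alloc.test.clone"
sync
after=$(avail_kb "$workdir")

echo "copy consumed $((before - after)) KiB of free space"
```

On a filesystem that clones, the reported difference should be close to zero; on one that copies, it should be close to the file size. Other activity on the same filesystem will skew the number, so treat it as indicative only.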
* Re: BTRFS file clone support for cp
  2009-07-27 23:40 ` BTRFS file clone support for cp Pádraig Brady
@ 2009-07-28 20:06   ` Giuseppe Scrivano
  2009-07-29 13:01     ` Chris Mason
       [not found]   ` <87k51r9sxh.fsf@master.homenet>
  1 sibling, 1 reply; 21+ messages in thread
From: Giuseppe Scrivano @ 2009-07-28 20:06 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: bug-coreutils, Jim Meyering, linux-btrfs

Hi Pádraig,

Pádraig Brady <P@draigBrady.com> writes:

> How different exactly?
> OK I tried this myself on F11 with inconclusive results.

I can't replicate it now, all tests I am doing report that blocks used
before and after the clone are the same.  Probably yesterday the
difference I noticed was in reality the original file flushed to the
disk.

> The above suggests that the clone does actually allocate space
> but btrfs isn't reporting it through statvfs correctly?

The same message appeared here too some days ago, though I cloned only
few Kb files, not much to fill the entire partition.

> If the clone does allocate space, then how can one
> clone without allocation which could be very useful
> for snapshotting for example?

I don't know if snapshotting is handled in the same way as a "clone",
but in this case it seems more obvious to me that no additional space
should be reported.

> Also I tried the above twice and both times got:
> http://www.kerneloops.org/submitresult.php?number=578993

I didn't get these errors.  I am using the btrfs git version.

Regards,
Giuseppe
* Re: BTRFS file clone support for cp
  2009-07-28 20:06   ` Giuseppe Scrivano
@ 2009-07-29 13:01     ` Chris Mason
  2009-07-29 14:14       ` Pádraig Brady
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2009-07-29 13:01 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: Pádraig Brady, Jim Meyering, bug-coreutils, linux-btrfs

On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> Hi Pádraig,
>
>
> Pádraig Brady <P@draigBrady.com> writes:
>
> > How different exactly?
> > OK I tried this myself on F11 with inconclusive results.
>
> I can't replicate it now, all tests I am doing report that blocks used
> before and after the clone are the same.  Probably yesterday the
> difference I noticed was in reality the original file flushed to the
> disk.

The clone will use some additional space for the metadata required to
point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
the file).

>
>
> > The above suggests that the clone does actually allocate space
> > but btrfs isn't reporting it through statvfs correctly?
>
> The same message appeared here too some days ago, though I cloned only
> few Kb files, not much to fill the entire partition.
>
>
> > If the clone does allocate space, then how can one
> > clone without allocation which could be very useful
> > for snapshotting for example?
>
> I don't know if snapshotting is handled in the same way as a "clone",
> but in this case it seems more obvious to me that no additional space
> should be reported.

The COW for snapshotting and a clone are the same, but the way we get
there is a little different.  For a snapshot, we have two btree roots
pointing to the same nodes, and we've incremented the reference count on
each of the nodes they both point to.  No matter how big the subvolume
is, this will always be O(number of pointers in the root block).

Cloning a file is done by walking the file metadata and taking a
reference on each extent pointed to by the file.  The file data is never
read in, but all of the file metadata is read in.

>
>
> > Also I tried the above twice and both times got:
> > http://www.kerneloops.org/submitresult.php?number=578993
>
> I didn't get these errors.  I am using the btrfs git version.

These have been fixed.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
* Re: BTRFS file clone support for cp
  2009-07-29 13:01     ` Chris Mason
@ 2009-07-29 14:14       ` Pádraig Brady
  2009-07-29 16:10         ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Pádraig Brady @ 2009-07-29 14:14 UTC (permalink / raw)
  To: Chris Mason, Giuseppe Scrivano, Jim Meyering, bug-coreutils,
	linux-btrfs

Chris Mason wrote:
> On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
>>
>> I can't replicate it now, all tests I am doing report that blocks used
>> before and after the clone are the same.  Probably yesterday the
>> difference I noticed was in reality the original file flushed to the
>> disk.
>
> The clone will use some additional space for the metadata required to
> point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> the file).

Thanks for the clarification Chris.
So the just committed change in cp will
link the destination file to the extents of the source.

We may need to play around with fallocate()
if we want to get back to the original
cp semantics of actually allocating space
on the file system for the new file.

I'll test this when I get an up to date btrfs
and when the fallocate interface in glibc settles down.

cheers,
Pádraig.
* Re: BTRFS file clone support for cp
  2009-07-29 14:14       ` Pádraig Brady
@ 2009-07-29 16:10         ` Chris Mason
  2009-07-29 16:18           ` Chris Mason
  2009-07-29 18:14           ` Pádraig Brady
  0 siblings, 2 replies; 21+ messages in thread
From: Chris Mason @ 2009-07-29 16:10 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Giuseppe Scrivano, Jim Meyering, bug-coreutils, linux-btrfs

On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> >>
> >> I can't replicate it now, all tests I am doing report that blocks used
> >> before and after the clone are the same.  Probably yesterday the
> >> difference I noticed was in reality the original file flushed to the
> >> disk.
> >
> > The clone will use some additional space for the metadata required to
> > point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> > the file).
>
> Thanks for the clarification Chris.
> So the just committed change in cp will
> link the destination file to the extents of the source.
>
> We may need to play around with fallocate()
> if we want to get back to the original
> cp semantics of actually allocating space
> on the file system for the new file.

Well, best to just use the original cp code.  I was talking with
Giuseppe about this as well, I think we should have the option to do
regular cp via a flag.

There will soon be a reflink system call that can be used on ocfs2 and
btrfs as well.  Thanks for adding this to glibc!

-chris
* Re: BTRFS file clone support for cp
  2009-07-29 16:10         ` Chris Mason
@ 2009-07-29 16:18           ` Chris Mason
  2009-07-29 18:14           ` Pádraig Brady
  1 sibling, 0 replies; 21+ messages in thread
From: Chris Mason @ 2009-07-29 16:18 UTC (permalink / raw)
  To: Pádraig Brady, Giuseppe Scrivano, Jim Meyering, bug-coreutils,
	linux-btrfs

On Wed, Jul 29, 2009 at 12:10:14PM -0400, Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> > Chris Mason wrote:
> > > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> > >>
> > >> I can't replicate it now, all tests I am doing report that blocks used
> > >> before and after the clone are the same.  Probably yesterday the
> > >> difference I noticed was in reality the original file flushed to the
> > >> disk.
> > >
> > > The clone will use some additional space for the metadata required to
> > > point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> > > the file).
> >
> > Thanks for the clarification Chris.
> > So the just committed change in cp will
> > link the destination file to the extents of the source.
> >
> > We may need to play around with fallocate()
> > if we want to get back to the original
> > cp semantics of actually allocating space
> > on the file system for the new file.
>
> Well, best to just use the original cp code.  I was talking with
> Giuseppe about this as well, I think we should have the option to do
> regular cp via a flag.
>
> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well.  Thanks for adding this to glibc!

Um, cp, not glibc, sorry ;)

-chris
* Re: BTRFS file clone support for cp
  2009-07-29 16:10         ` Chris Mason
  2009-07-29 16:18           ` Chris Mason
@ 2009-07-29 18:14           ` Pádraig Brady
  2009-07-30  0:57             ` Joel Becker
  1 sibling, 1 reply; 21+ messages in thread
From: Pádraig Brady @ 2009-07-29 18:14 UTC (permalink / raw)
  To: Chris Mason, Pádraig Brady, Giuseppe Scrivano, Jim Meyering

Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>>
>> We may need to play around with fallocate()
>> if we want to get back to the original
>> cp semantics of actually allocating space
>> on the file system for the new file.
>
> Well, best to just use the original cp code.  I was talking with
> Giuseppe about this as well, I think we should have the option to do
> regular cp via a flag.

Right. Well we can turn off this cloning by doing --sparse={never,always}
but that has side effects. If we need an option then maybe we should have
it turn on cloning rather than restore default cp behaviour?
The side effects I thought of earlier, of COW without corresponding allocation
were possible fragmentation on write or unexpected/mishandled ENOSPC.
Also for endangered mechanical disks, subsequent processing could
be slowed as the head seeks between the old and new data to be copied.
Perhaps these are a small price to pay, especially considering that
solid state disks will only be affected by the write()=ENOSPC issue.

At the moment we have these linking options:

  cp -l, --link           #for hardlinks
  cp -s, --symbolic-link  #for symlinks

So perhaps we should support:

  cp --link={soft,hard,cow}

for symlink(), link() and reflink() respectively?
I.E. link to the name, inode or extents respectively.

> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well.  Thanks for adding this to glibc!

I was thinking there would be a generic syscall for this.
So cp should call reflink() instead when it becomes available.

thanks for the info!
Pádraig.
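
The distinction drawn above, linking to the name, the inode or the extents, can be observed from the shell. This is a minimal sketch under stated assumptions, not the proposed interface: the cp --link={soft,hard,cow} spelling exists only as a suggestion in this thread, so the sketch uses ln -s, ln, and a later GNU cp's --reflink=auto (with a plain cp fallback where that option or filesystem support is missing). The file names and the inode_of helper are made up for illustration.

```shell
#!/bin/sh
# Sketch: three ways for a new name to share something with "src".
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "original" > src

ln -s src by-name       # symlink(): refers to the *name* "src"
ln src by-inode         # link(): a second name for the same *inode*
# Stand-in for reflink(): share the *extents* where possible,
# otherwise fall back to copying the data.
cp --reflink=auto src by-data 2>/dev/null || cp src by-data

# inode_of FILE: print FILE's inode number (hypothetical helper).
inode_of() { ls -i "$1" | awk '{ print $1 }'; }

echo "src      inode: $(inode_of src)"
echo "by-inode inode: $(inode_of by-inode)"   # same inode as src
echo "by-data  inode: $(inode_of by-data)"    # its own inode

# A write through the hard link is visible via src and the symlink,
# while by-data keeps its contents whether it was cloned or copied.
echo "modified" > by-inode
cat src by-name by-data
```

The point of the third case is that sharing extents is invisible at the name and inode level: by-data behaves like an independent file, and only the space accounting differs.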
* Re: BTRFS file clone support for cp
  2009-07-29 18:14           ` Pádraig Brady
@ 2009-07-30  0:57             ` Joel Becker
  2009-07-30  7:39               ` Jim Meyering
  0 siblings, 1 reply; 21+ messages in thread
From: Joel Becker @ 2009-07-30 0:57 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Chris Mason, Giuseppe Scrivano, Jim Meyering, bug-coreutils,
	linux-btrfs

On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> >>
> >> We may need to play around with fallocate()
> >> if we want to get back to the original
> >> cp semantics of actually allocating space
> >> on the file system for the new file.
> >
> > Well, best to just use the original cp code.  I was talking with
> > Giuseppe about this as well, I think we should have the option to do
> > regular cp via a flag.
>
> Right. Well we can turn off this cloning by doing --sparse={never,always}
> but that has side effects. If we need an option then maybe we should have
> it turn on cloning rather than restore default cp behaviour?
> The side effects I thought of earlier, of COW without corresponding allocation
> were possible fragmentation on write or unexpected/mishandled ENOSPC.
> Also for endangered mechanical disks, subsequent processing could
> be slowed as the head seeks between the old and new data to be copied.
> Perhaps these are a small price to pay, especially considering that
> solid state disks will only be affected by the write()=ENOSPC issue.
>
> At the moment we have these linking options:
>
>   cp -l, --link           #for hardlinks
>   cp -s, --symbolic-link  #for symlinks
>
> So perhaps we should support:
>
>   cp --link={soft,hard,cow}
> for symlink(), link() and reflink() respectively?
> I.E. link to the name, inode or extents respectively.

I've cooked up 'ln -r' for reflinks, which works for ln(1) but
not for cp(1).  I have a git tree with the (in-flux) code on
oss.oracle.com:

[View]
http://oss.oracle.com/git/?p=jlbec/reflink.git;a=summary
[Pull]
git://oss.oracle.com/git/jlbec/reflink.git master

This repository isn't designed to be an authoritative patch for
coreutils.  Instead it provides a reflink(1) program that is actually
ln -r in disguise.  Later work would be to get coreutils updated
"properly".

Joel

-- 

"This is the end, beautiful friend.
 This is the end, my only friend the end
 Of our elaborate plans, the end
 Of everything that stands, the end
 No safety or surprise, the end
 I'll never look into your eyes again."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
* Re: BTRFS file clone support for cp
  2009-07-30  0:57             ` Joel Becker
@ 2009-07-30  7:39               ` Jim Meyering
  2009-07-30  8:21                 ` Joel Becker
                                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jim Meyering @ 2009-07-30 7:39 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: linux-btrfs, bug-coreutils, Giuseppe Scrivano, Chris Mason

Joel Becker wrote:
> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>> Chris Mason wrote:
>> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>> >>
>> >> We may need to play around with fallocate()
>> >> if we want to get back to the original
>> >> cp semantics of actually allocating space
>> >> on the file system for the new file.
>> >
>> > Well, best to just use the original cp code.  I was talking with
>> > Giuseppe about this as well, I think we should have the option to do
>> > regular cp via a flag.
>>
>> Right. Well we can turn off this cloning by doing --sparse={never,always}
>> but that has side effects. If we need an option then maybe we should have
>> it turn on cloning rather than restore default cp behaviour?
>> The side effects I thought of earlier, of COW without corresponding allocation
>> were possible fragmentation on write or unexpected/mishandled ENOSPC.
>> Also for endangered mechanical disks, subsequent processing could
>> be slowed as the head seeks between the old and new data to be copied.
>> Perhaps these are a small price to pay, especially considering that
>> solid state disks will only be affected by the write()=ENOSPC issue.
>>
>> At the moment we have these linking options:
>>
>>   cp -l, --link           #for hardlinks
>>   cp -s, --symbolic-link  #for symlinks
>>
>> So perhaps we should support:
>>
>>   cp --link={soft,hard,cow}
>> for symlink(), link() and reflink() respectively?
>> I.E. link to the name, inode or extents respectively.
>
> I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> not for cp(1).

Thanks.  I haven't looked, but after reading about the reflink syscall
[http://lwn.net/Articles/332802/] had come to the same conclusion:
this feature belongs with ln rather than with cp.

Besides, putting the new behavior on a new option avoids
the current semantic change we would otherwise induce in cp.
* Re: BTRFS file clone support for cp
  2009-07-30  7:39               ` Jim Meyering
@ 2009-07-30  8:21                 ` Joel Becker
  2009-07-30  8:40                 ` Pádraig Brady
  2009-07-30  9:26                 ` Andi Kleen
  2 siblings, 0 replies; 21+ messages in thread
From: Joel Becker @ 2009-07-30 8:21 UTC (permalink / raw)
  To: Jim Meyering
  Cc: bug-coreutils, linux-btrfs, Pádraig Brady, Giuseppe Scrivano,
	Chris Mason

On Thu, Jul 30, 2009 at 09:39:17AM +0200, Jim Meyering wrote:
> Joel Becker wrote:
> > I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> > not for cp(1).
>
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.
>
> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

Well, I don't see any reason cp(1) can't take advantage of
reflink(2).  I just think that cp(1) should look at reflink(2) as an
optimization, not a specific methodology.

What do I mean?  If you want to say "I know what a reflink is,
and that's exactly what I want", you want "ln -r".  But say you want a
"cp --snap" that tries to take a snapshot regardless of the backend.  It
could use reflink(2) on filesystems that support it, or perhaps a
passthrough call to the underlying storage, or who knows what.  I can
also imagine a "cp --shallow" that is "if you can cow, do it, otherwise
do a normal cp".

Joel

-- 

"I think it would be a good idea."
        - Mahatma Gandhi, when asked what he thought of Western
          civilization

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
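
The "if you can cow, do it, otherwise do a normal cp" behaviour described above can be approximated with a small wrapper. This is only a sketch under stated assumptions: neither --snap nor --shallow is a real cp option, the function name shallow_cp is made up for illustration, and the sketch relies on a later GNU cp accepting --reflink=always (any failure of that attempt, including an unknown option, falls through to an ordinary copy).

```shell
#!/bin/sh
# shallow_cp SRC DST (hypothetical helper):
# try a copy-on-write clone first, then fall back to a regular copy.
# Prints "cloned" or "copied" so the caller can tell which path was taken.
shallow_cp() {
    if cp --reflink=always -- "$1" "$2" 2>/dev/null; then
        echo "cloned"
    else
        # Clone refused (unsupported filesystem, cross-device, old cp, ...):
        # do what classic cp does and duplicate the data.
        cp -- "$1" "$2" && echo "copied"
    fi
}

# Usage on a scratch directory.
workdir=$(mktemp -d)
echo "payload" > "$workdir/a"
how=$(shallow_cp "$workdir/a" "$workdir/b")
echo "$how"
```

With a GNU cp that has the option, cp --reflink=auto gives much the same result in one step; the wrapper just makes the two paths explicit and reports which one ran.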
* Re: BTRFS file clone support for cp
  2009-07-30  7:39               ` Jim Meyering
  2009-07-30  8:21                 ` Joel Becker
@ 2009-07-30  8:40                 ` Pádraig Brady
  2009-07-30 16:28                   ` Ric Wheeler
  2009-07-30  9:26                 ` Andi Kleen
  2 siblings, 1 reply; 21+ messages in thread
From: Pádraig Brady @ 2009-07-30 8:40 UTC (permalink / raw)
  To: Jim Meyering; +Cc: Chris Mason, Giuseppe Scrivano, bug-coreutils, linux-btrfs

Jim Meyering wrote:
> Joel Becker wrote:
>
>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>> At the moment we have these linking options:
>>>
>>>   cp -l, --link           #for hardlinks
>>>   cp -s, --symbolic-link  #for symlinks
>>>
>>> So perhaps we should support:
>>>
>>>   cp --link={soft,hard,cow}
>>> for symlink(), link() and reflink() respectively?
>>> I.E. link to the name, inode or extents respectively.
>>
>> I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>> not for cp(1).
>
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.

Right. It definitely should be in ln anyway.

> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

Yes doing reflink() in cp by default currently can
be problematic as discussed, especially on mechanical hard disks.
Though in future I can see most users of cp preferring
reflink() to be done, rather than read()/write(). Ponder...

In any case putting --link=cow or --reflink or whatever in cp
could be very useful for creating writeable snapshot branches.

cheers,
Pádraig.
* Re: BTRFS file clone support for cp
  2009-07-30  8:40                 ` Pádraig Brady
@ 2009-07-30 16:28                   ` Ric Wheeler
  2009-07-30 16:48                     ` Jim Meyering
  0 siblings, 1 reply; 21+ messages in thread
From: Ric Wheeler @ 2009-07-30 16:28 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Jim Meyering, Chris Mason, Giuseppe Scrivano, bug-coreutils,
	linux-btrfs

On 07/30/2009 04:40 AM, Pádraig Brady wrote:
> Jim Meyering wrote:
>
>> Joel Becker wrote:
>>
>>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>>> At the moment we have these linking options:
>>>>
>>>>   cp -l, --link           #for hardlinks
>>>>   cp -s, --symbolic-link  #for symlinks
>>>>
>>>> So perhaps we should support:
>>>>
>>>>   cp --link={soft,hard,cow}
>>>> for symlink(), link() and reflink() respectively?
>>>> I.E. link to the name, inode or extents respectively.
>>>>
>>> I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>>> not for cp(1).
>>>
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>>
>
> Right. It definitely should be in ln anyway.
>
>
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>>
>
> Yes doing reflink() in cp by default currently can
> be problematic as discussed, especially on mechanical hard disks.
> Though in future I can see most users of cp preferring
> reflink() to be done, rather than read()/write(). Ponder...
>
>

I think that doing reflink by default would be a horrible idea - one
good reason to copy a file is to increase your level of fault tolerance
and reflink magically avoids that :-)

reflink is a neat feature, but should be used on purpose in my opinion,

ric

> In any case putting --link=cow or --reflink or whatever in cp
> could be very useful for creating writeable snapshot branches.
>
> cheers,
> Pádraig.
>
* Re: BTRFS file clone support for cp
  2009-07-30 16:28                   ` Ric Wheeler
@ 2009-07-30 16:48                     ` Jim Meyering
  0 siblings, 0 replies; 21+ messages in thread
From: Jim Meyering @ 2009-07-30 16:48 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Pádraig Brady, Chris Mason, Giuseppe Scrivano, bug-coreutils,
	linux-btrfs

Ric Wheeler wrote:
> I think that doing reflink by default would be a horrible idea - one
> good reason to copy a file is to increase your level of fault
> tolerance and reflink magically avoids that :-)

Good point.

This would constitute another user-visible semantic change in cp:
a disk fault that affects any non-metadata block of a ref-linked
file affects both copies.

GNU cp will soon attempt this only when a --reflink option is specified.
* Re: BTRFS file clone support for cp
  2009-07-30  7:39               ` Jim Meyering
  2009-07-30  8:21                 ` Joel Becker
  2009-07-30  8:40                 ` Pádraig Brady
@ 2009-07-30  9:26                 ` Andi Kleen
  2009-07-30 10:02                   ` Pádraig Brady
  2009-07-30 10:16                   ` Jim Meyering
  2 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2009-07-30 9:26 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Pádraig Brady, linux-btrfs, bug-coreutils, Giuseppe Scrivano,
	Chris Mason

Jim Meyering <jim@meyering.net> writes:
>
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.

cp already has -l so it would make sense to extend that too.

> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

I don't see how semantics change in a user visible way.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
* Re: BTRFS file clone support for cp
  2009-07-30  9:26                 ` Andi Kleen
@ 2009-07-30 10:02                   ` Pádraig Brady
  2009-07-30 10:16                   ` Jim Meyering
  1 sibling, 0 replies; 21+ messages in thread
From: Pádraig Brady @ 2009-07-30 10:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jim Meyering, linux-btrfs, bug-coreutils, Giuseppe Scrivano,
	Chris Mason

Andi Kleen wrote:
> Jim Meyering <jim@meyering.net> writes:
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>
> cp already has -l so it would make sense to extend that too.
>
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>
> I don't see how semantics change in a user visible way.

I was thinking that doing reflink() in cp has
the following user visible advantages/disadvantages:

Advantages:
  very quick copy
  less space used
Disadvantages:
  disk head seeking deferred to modification process
  possible fragmentation on write
  possible ENOSPC on write

The disk head seeking issue will go away with time.
I'm not sure if the other disadvantages exist or
whether they could be alleviated with fallocate() or something.

cheers,
Pádraig.
* Re: BTRFS file clone support for cp
  2009-07-30  9:26                 ` Andi Kleen
  2009-07-30 10:02                   ` Pádraig Brady
@ 2009-07-30 10:16                   ` Jim Meyering
  2009-07-30 10:21                     ` Tomasz Chmielewski
  2009-07-30 10:54                     ` Andi Kleen
  1 sibling, 2 replies; 21+ messages in thread
From: Jim Meyering @ 2009-07-30 10:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Pádraig Brady, linux-btrfs, bug-coreutils, Giuseppe Scrivano,
	Chris Mason

Andi Kleen wrote:
> Jim Meyering <jim@meyering.net> writes:
>>
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>
> cp already has -l so it would make sense to extend that too.

Good point.

>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>
> I don't see how semantics change in a user visible way.

With classic cp, if I copy a 1GB non-sparse file and there's less
space than that available, cp fails with ENOSPC.
With this new feature, it succeeds even if there are
just a few blocks available.

Also, consider (buggy!) code that then depends on being able to modify
that file in-place, and that "knows" it doesn't need to check for ENOSPC.
Sure, they should always check for write failure, but still.  It is
a change.
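
The in-place modification case above is easy to get wrong in scripts, since an overwrite of existing data looks like it cannot run out of space. A minimal sketch of checking for the failure follows; it is an illustration under stated assumptions, not code from any of the tools discussed. The function name overwrite_block is made up, and /dev/full (a Linux device whose writes always fail with ENOSPC) stands in for a copy-on-write file on a full filesystem.

```shell
#!/bin/sh
# overwrite_block FILE (hypothetical helper): rewrite the first 4 KiB of
# FILE in place, without truncating it, and return dd's exit status so the
# caller can react to ENOSPC or any other write error.
overwrite_block() {
    dd if=/dev/zero of="$1" bs=4096 count=1 conv=notrunc 2>/dev/null
}

# Ordinary case: an existing file on a filesystem with room to spare.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/data" bs=4096 count=4 2>/dev/null
if overwrite_block "$workdir/data"; then plain_result=ok; else plain_result=failed; fi

# Failure case: /dev/full rejects every write with ENOSPC, which is what an
# in-place write to a cloned file can hit when no free extents are left.
if [ -c /dev/full ]; then
    if overwrite_block /dev/full; then full_result=ok; else full_result=failed; fi
else
    full_result=skipped
fi

echo "regular file: $plain_result, /dev/full: $full_result"
```

Code that ignores the exit status in the second case is the "buggy" pattern described above: it was already wrong, but cloning makes the failure reachable where a fully allocated copy would not.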
* Re: BTRFS file clone support for cp
  2009-07-30 10:16                   ` Jim Meyering
@ 2009-07-30 10:21                     ` Tomasz Chmielewski
  2009-07-30 10:54                     ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: Tomasz Chmielewski @ 2009-07-30 10:21 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Andi Kleen, Pádraig Brady, linux-btrfs, bug-coreutils,
	Giuseppe Scrivano, Chris Mason

Jim Meyering wrote:

> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.

Is it good or bad?

> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still.  It is
> a change.

On a multiuser system, that (buggy) tool would fail anyway if something
else adds enough new data to the filesystem in the meantime.

But sure, it's a change.

-- 
Tomasz Chmielewski
http://wpkg.org
* Re: BTRFS file clone support for cp
  2009-07-30 10:16                   ` Jim Meyering
  2009-07-30 10:21                     ` Tomasz Chmielewski
@ 2009-07-30 10:54                     ` Andi Kleen
  2009-07-30 18:05                       ` Joel Becker
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2009-07-30 10:54 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Andi Kleen, Pádraig Brady, linux-btrfs, bug-coreutils,
	Giuseppe Scrivano, Chris Mason

>
> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.
>
> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still.  It is
> a change.

Fair point, although I suspect there are cases where ENOSPC
on non extending write can already happen on specific file systems. e.g. on
btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
when the garbage collector doesn't run in time.  Wouldn't surprise
me if there weren't more cases already.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.
* Re: BTRFS file clone support for cp
  2009-07-30 10:54 ` Andi Kleen
@ 2009-07-30 18:05 ` Joel Becker
  2009-07-30 23:28 ` Pádraig Brady
  0 siblings, 1 reply; 21+ messages in thread
From: Joel Becker @ 2009-07-30 18:05 UTC
To: Andi Kleen
Cc: Jim Meyering, Pádraig Brady, linux-btrfs, bug-coreutils,
    Giuseppe Scrivano, Chris Mason

On Thu, Jul 30, 2009 at 12:54:16PM +0200, Andi Kleen wrote:
> > With classic cp, if I copy a 1GB non-sparse file and there's less
> > space than that available, cp fails with ENOSPC.
> > With this new feature, it succeeds even if there are
> > just a few blocks available.
> >
> > Also, consider (buggy!) code that then depends on being able to modify
> > that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> > Sure, they should always check for write failure, but still. It is
> > a change.
>
> Fair point, although I suspect there are cases where ENOSPC
> on non extending write can already happen on specific file systems. e.g. on
> btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
> when the garbage collector doesn't run in time. Wouldn't surprise
> me if there weren't more cases already.

In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
enabled, or any other CoW-ish filesystem is a tacit approval of the
delayed ENOSPC. The same can be said of "thin provisioning" LUNs.

However, the other concerns are still valid. A user invoking vanilla
cp(1) expects two independent storage regions for the data.

(Oh, and what about future support of de-duping in filesystems? :-)

Joel

-- 

"Anything that is too stupid to be spoken is sung."
    - Voltaire

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127
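The delayed-ENOSPC concern raised in the messages above can be illustrated with a minimal shell sketch. It assumes a POSIX sh with dd and mktemp; the function name overwrite_first_block is hypothetical and is not part of cp or of any patch in this thread. The sketch overwrites one block of an existing file in place and checks the exit status, because on a CoW or thinly provisioned store such a write may need fresh blocks and can fail even though the file does not grow.

```sh
#!/bin/sh
# Sketch only: overwrite the first 4 KiB of an existing file in place and
# report whether the write succeeded. The helper name is hypothetical.
overwrite_first_block() {
  # conv=notrunc keeps the rest of the file intact;
  # dd exits non-zero if the write fails (for example with ENOSPC).
  dd if=/dev/zero of="$1" bs=4096 count=1 conv=notrunc 2>/dev/null
}

target=$(mktemp) || exit 1
trap 'rm -f "$target"' EXIT

# Create a fully written 16 KiB file to act as the "already allocated" data.
dd if=/dev/zero of="$target" bs=4096 count=4 2>/dev/null || exit 1

# Never assume an in-place write cannot fail; always check the result.
if overwrite_first_block "$target"; then
  echo "in-place write succeeded"
else
  echo "in-place write failed, possibly ENOSPC" >&2
fi
```

On Linux, pointing the helper at /dev/full exercises the failure branch, since every write to that device fails with ENOSPC.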
* Re: BTRFS file clone support for cp
  2009-07-30 18:05 ` Joel Becker
@ 2009-07-30 23:28 ` Pádraig Brady
  0 siblings, 0 replies; 21+ messages in thread
From: Pádraig Brady @ 2009-07-30 23:28 UTC
To: Andi Kleen, Jim Meyering, linux-btrfs, bug-coreutils,
    Giuseppe Scrivano, Chris Mason <chris.mason@

Joel Becker wrote:
> In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
> enabled, or any other CoW-ish filesystem is a tacit approval of the
> delayed ENOSPC. The same can be said of "thin provisioning" LUNs.
> However, the other concerns are still valid. A user invoking vanilla
> cp(1) expects two independent storage regions for the data.
> (Oh, and what about future support of de-duping in filesystems?
> :-)

I maintain an app to de-dupe at http://www.pixelbeat.org/fslint/
and I'll be adding reflink support as soon as it becomes available.

From a filesystem point of view, one thing that would help speed this
up (and many other things like rsync etc.) would be to allow
one to associate say a sha-3 hash or whatever with the file,
which the filesystem would automatically clear when the file data changes.
So in general having a special set of extended attributes that
were auto cleared on file modification would be very useful for
lots of stuff.

cheers,
Pádraig.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
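The idea of a hash attribute that is cleared automatically on modification can be approximated in userspace, with clear limitations. The following is a minimal sketch under stated assumptions: it uses GNU stat and sha256sum (SHA-256 is swapped in here for the "sha-3 or whatever" hash mentioned above), it keeps the cache in a temporary directory rather than in extended attributes, and the function name cached_hash is hypothetical. It treats size plus modification time as a stand-in for "the data has not changed", which is exactly the guess that filesystem support would make unnecessary.

```sh
#!/bin/sh
# Sketch only: cache a file's SHA-256 and recompute it when size or mtime
# changes. A filesystem-maintained attribute would not need this heuristic.
cache_dir=$(mktemp -d) || exit 1

cached_hash() {
  file=$1
  # One cache entry per path; the entry name is the hash of the path string.
  entry=$cache_dir/$(printf '%s' "$file" | sha256sum | cut -d' ' -f1)
  # Invalidation key: size and full-resolution mtime (GNU stat format).
  key=$(stat -c '%s %y' "$file") || return 1
  if [ -f "$entry" ] && [ "$(sed -n 1p "$entry")" = "$key" ]; then
    sed -n 2p "$entry"                      # key unchanged: reuse stored hash
  else
    sum=$(sha256sum < "$file" | cut -d' ' -f1) || return 1
    printf '%s\n%s\n' "$key" "$sum" > "$entry"
    printf '%s\n' "$sum"                    # key changed: recomputed hash
  fi
}

sample=$(mktemp) || exit 1
printf 'hello\n' > "$sample"
cached_hash "$sample"      # computed and stored
cached_hash "$sample"      # served from the cache
printf 'hello, world\n' > "$sample"
cached_hash "$sample"      # size changed, so the hash is recomputed
```

A modification that preserves both size and timestamp would go unnoticed by this sketch, which is why an attribute cleared by the filesystem itself would be more reliable for de-duplication or rsync-style tools.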
[parent not found: <87k51r9sxh.fsf@master.homenet>]
[parent not found: <4A70C4E0.9030104@draigBrady.com>]
* Re: BTRFS file clone support for cp
  [not found] ` <4A70C4E0.9030104@draigBrady.com>
@ 2009-07-30 17:28 ` Giuseppe Scrivano
  0 siblings, 0 replies; 21+ messages in thread
From: Giuseppe Scrivano @ 2009-07-30 17:28 UTC
To: Pádraig Brady; +Cc: bug-coreutils, linux-btrfs

Hi Pádraig,

thanks for the comments.

Pádraig Brady <P@draigBrady.com> writes:

> # 300MB seems to be the minimum size for a btrfs with default
> parameters.

Actually, it seems the minimum space required is 256MB. Using a 255MB
image I get:

"device btrfs.img is too small (must be at least 256 MB)"

> # FIXME: use `truncate --allocate` when it becomes available, which
> # may allow unmarking this as an expensive test.

Are you sure that this feature will make the test less expensive? The
test files must still be written there, so in the best case (considering
the fallocate done in 0s) only the dd cost will be saved, and it still
looks like an expensive test.

In the version I attached, I am using a sparse file (truncate --size)
and it seems to work fine. Is it correct or am I missing something?

I haven't looked yet, but there are probably other tests that can take
advantage of sparse files instead of using "dd".

I am also considering Jim's note about doing the umount in the cleanup_
function.

Cheers,
Giuseppe

From 7add4b337b7db0a63bca0dd0fe0f146f175163f8 Mon Sep 17 00:00:00 2001
From: Giuseppe Scrivano <gscrivano@gnu.org>
Date: Wed, 29 Jul 2009 20:31:20 +0200
Subject: [PATCH] tests: add a test for btrfs' copy-on-write file clone operation

* tests/Makefile.am: Consider the new test.
* tests/cp/file-clone: New file.
---
 tests/Makefile.am   |    1 +
 tests/cp/file-clone |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/file-clone

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 59737a0..9841aa3 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -20,6 +20,7 @@ EXTRA_DIST = \
 
 root_tests = \
   chown/basic \
+  cp/file-clone \
   cp/cp-a-selinux \
   cp/preserve-gid \
   cp/special-bits \
diff --git a/tests/cp/file-clone b/tests/cp/file-clone
new file mode 100755
index 0000000..c65b9cb
--- /dev/null
+++ b/tests/cp/file-clone
@@ -0,0 +1,58 @@
+#!/bin/sh
+# Make sure file-clone on a btrfs file system works properly.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+
+require_root_
+require_sparse_support_
+#expensive_
+
+cleanup_(){ umount btrfs; }
+
+fail=0
+
+mkfs.btrfs --version || skip_test_ "btrfs userland tools not installed"
+
+# 256MB seems to be the minimum size for a btrfs with default parameters.
+truncate --size=256M btrfs.img || framework_failure
+
+mkfs.btrfs btrfs.img || framework_failure
+
+mkdir btrfs || framework_failure
+
+mount -t btrfs -o loop btrfs.img btrfs || framework_failure
+
+dd bs=1M count=200 if=/dev/zero of=btrfs/alloc.test || framework_failure
+
+# If the file is cloned, only additional space for metadata is required.
+# Two 200MB files can be present even if the total file system space is 256MB.
+cp btrfs/alloc.test btrfs/clone.test || fail=1
+rm btrfs/clone.test
+
+# When --sparse={always,never} is used, the file is copied without any cloning.
+# Use --sparse=never to be sure the file is copied without holes and it is not
+# possible since there is not enough free space.
+cp --sparse=never btrfs/alloc.test btrfs/clone.test && fail=1
+
+Exit $fail
-- 
1.6.3.3
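The test above creates its 256MB image with truncate --size rather than dd. A minimal sketch of why that is cheap, assuming GNU coreutils (truncate, stat -c, du) and a file system that supports sparse files; the variable names are illustrative only. It creates an image of the same size and compares the apparent size with the space actually allocated.

```sh
#!/bin/sh
# Sketch only: a sparse image has a large apparent size but allocates
# almost nothing until data is written into it.
img=$(mktemp) || exit 1
trap 'rm -f "$img"' EXIT

# Extend the empty file to 256 MiB without writing any data blocks.
truncate --size=256M "$img" || exit 1

apparent=$(stat -c %s "$img")             # size in bytes as seen by ls -l
allocated_kib=$(du -k "$img" | cut -f1)   # disk space actually used, in KiB

echo "apparent size: $apparent bytes"
echo "allocated:     $allocated_kib KiB"
```

Blocks are only allocated in the image as mkfs.btrfs and later writes through the loop mount touch them, so creating the image costs little, while the 200MB dd into the mounted file system still has to write real data.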
Thread overview: 21+ messages
[not found] <87d47o3fip.fsf@master.homenet>
[not found] ` <4A6CEA48.5050208@draigBrady.com>
[not found] ` <8763defuvq.fsf@meyering.net>
[not found] ` <87ws5tvrq8.fsf@master.homenet>
2009-07-27 23:40 ` BTRFS file clone support for cp Pádraig Brady
2009-07-28 20:06 ` Giuseppe Scrivano
2009-07-29 13:01 ` Chris Mason
2009-07-29 14:14 ` Pádraig Brady
2009-07-29 16:10 ` Chris Mason
2009-07-29 16:18 ` Chris Mason
2009-07-29 18:14 ` Pádraig Brady
2009-07-30 0:57 ` Joel Becker
2009-07-30 7:39 ` Jim Meyering
2009-07-30 8:21 ` Joel Becker
2009-07-30 8:40 ` Pádraig Brady
2009-07-30 16:28 ` Ric Wheeler
2009-07-30 16:48 ` Jim Meyering
2009-07-30 9:26 ` Andi Kleen
2009-07-30 10:02 ` Pádraig Brady
2009-07-30 10:16 ` Jim Meyering
2009-07-30 10:21 ` Tomasz Chmielewski
2009-07-30 10:54 ` Andi Kleen
2009-07-30 18:05 ` Joel Becker
2009-07-30 23:28 ` Pádraig Brady
[not found] ` <87k51r9sxh.fsf@master.homenet>
[not found] ` <4A70C4E0.9030104@draigBrady.com>
2009-07-30 17:28 ` Giuseppe Scrivano