public inbox for linux-btrfs@vger.kernel.org
* Re: BTRFS file clone support for cp
       [not found]     ` <87ws5tvrq8.fsf@master.homenet>
@ 2009-07-27 23:40       ` Pádraig Brady
  2009-07-28 20:06         ` Giuseppe Scrivano
       [not found]         ` <87k51r9sxh.fsf@master.homenet>
  0 siblings, 2 replies; 21+ messages in thread
From: Pádraig Brady @ 2009-07-27 23:40 UTC (permalink / raw)
  To: Giuseppe Scrivano; +Cc: bug-coreutils, Jim Meyering, linux-btrfs

Giuseppe Scrivano wrote:
> Jim Meyering <jim@meyering.net> writes:
>
>>> Another possible issue with this I can think of is
>>> depending on the modification pattern of the COW files,
>>> the modification processes could fragment the file or
>>> more seriously be given ENOSPC errors.
>> I hope btrfs takes care of this behind the scene.
>>
>> How does the clone work wrt to space consumed, a la df?
>> If copying a 1GB file this way does not update usage
>> stats to reflect the additional 1GB of space used, ...
>
> I tried to clone a big file and df reported a different "used blocks"
> stat than it was before the clone operation.

How different exactly?
OK I tried this myself on F11 with inconclusive results.

$ uname -r
2.6.29.6-213.fc11.i586
$ sudo yum install btrfs-progs
# dd bs=1M count=300 if=/dev/zero of=/btrfs.img #min size?
# mkfs.btrfs /btrfs.img
# mkdir /btrfs
# mount -o loop /btrfs.img /btrfs
# cd /btrfs
# dd bs=1M count=100 if=/dev/zero of=alloc.test
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M   28K  300M   1% /btrfs
# df -h . #only allocated about 30s later
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs
# /home/padraig/clone_file alloc.test alloc.test.clone
# umount /btrfs
# mount -o loop /btrfs.img /btrfs
# cd btrfs
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  101M  200M  34% /btrfs

OK the above suggests that the clone doesn't take
any space as I would expect. Then it starts getting confusing...

# du -h *
100M    alloc.test
244M    alloc.test.clone #wha?
# dd bs=1M count=200 if=/dev/zero of=use.space
dd: writing `use.space': No space left on device
101+0 records in
100+0 records out
# ls -l
total 454656
-rw-r--r-- 1 root root 104857600 2009-07-28 00:06 alloc.test
-rw-r--r-- 1 root root 104857600 2009-07-28 00:07 alloc.test.clone
-rw-r--r-- 1 root root 104857600 2009-07-28 00:18 use.space
# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            300M  184M  117M  62% /btrfs

The above suggests that the clone does actually allocate space
but btrfs isn't reporting it through statvfs correctly?
If the clone does allocate space, then how can one
clone without allocation which could be very useful
for snapshotting for example?
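
The two tools above read different counters: df reports the file-system-wide totals from statvfs(), while du sums the per-file st_blocks values, so the two can disagree when files share extents. A minimal Python sketch of that comparison follows; the helper names `fs_used_bytes` and `du_bytes` are invented for illustration, nothing in it is btrfs-specific, and unlike the real du it makes no attempt to count hard links only once.

```python
import os

def fs_used_bytes(path):
    """Used space roughly as df shows it: (total - free) blocks from statvfs()."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize

def du_bytes(directory):
    """Allocated space summed per file: st_blocks is in 512-byte units."""
    total = 0
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_blocks * 512
    return total

if __name__ == "__main__":
    here = "."
    print("statvfs used bytes:", fs_used_bytes(here))
    print("sum of st_blocks  :", du_bytes(here))
```

Run inside the mounted directory, a large gap between the two numbers would point at shared or not-yet-flushed extents rather than at a wrong file size.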

Also I tried the above twice and both times got:
http://www.kerneloops.org/submitresult.php?number=578993

cheers,
Pádraig.


* Re: BTRFS file clone support for cp
  2009-07-27 23:40       ` BTRFS file clone support for cp Pádraig Brady
@ 2009-07-28 20:06         ` Giuseppe Scrivano
  2009-07-29 13:01           ` Chris Mason
       [not found]         ` <87k51r9sxh.fsf@master.homenet>
  1 sibling, 1 reply; 21+ messages in thread
From: Giuseppe Scrivano @ 2009-07-28 20:06 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: bug-coreutils, Jim Meyering, linux-btrfs

Hi Pádraig,


Pádraig Brady <P@draigBrady.com> writes:

> How different exactly?
> OK I tried this myself on F11 with inconclusive results.

I can't replicate it now, all tests I am doing report that blocks used
before and after the clone are the same.  Probably yesterday the
difference I noticed was in reality the original file flushed to the
disk.


> The above suggests that the clone does actually allocate space
> but btrfs isn't reporting it through statvfs correctly?

The same message appeared here too some days ago, though I had cloned only
files of a few KB, not enough to fill the entire partition.


> If the clone does allocate space, then how can one
> clone without allocation which could be very useful
> for snapshotting for example?

I don't know if snapshotting is handled in the same way as a "clone",
but in this case it seems more obvious to me that no additional space
should be reported.


> Also I tried the above twice and both times got:
> http://www.kerneloops.org/submitresult.php?number=578993

I didn't get these errors.  I am using the btrfs git version.


Regards,
Giuseppe


* Re: BTRFS file clone support for cp
  2009-07-28 20:06         ` Giuseppe Scrivano
@ 2009-07-29 13:01           ` Chris Mason
  2009-07-29 14:14             ` Pádraig Brady
  0 siblings, 1 reply; 21+ messages in thread
From: Chris Mason @ 2009-07-29 13:01 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: Pádraig Brady, Jim Meyering, bug-coreutils, linux-btrfs

On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> Hi Pádraig,
>
>
> Pádraig Brady <P@draigBrady.com> writes:
>
> > How different exactly?
> > OK I tried this myself on F11 with inconclusive results.
>
> I can't replicate it now, all tests I am doing report that blocks used
> before and after the clone are the same.  Probably yesterday the
> difference I noticed was in reality the original file flushed to the
> disk.

The clone will use some additional space for the metadata required to
point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
the file).

>
>
> > The above suggests that the clone does actually allocate space
> > but btrfs isn't reporting it through statvfs correctly?
>
> The same message appeared here too some days ago, though I had cloned only
> files of a few KB, not enough to fill the entire partition.
>
>
> > If the clone does allocate space, then how can one
> > clone without allocation which could be very useful
> > for snapshotting for example?
>=20
> I don't know if snapshotting is handled in the same way as a "clone",
> but in this case it seems more obvious to me that no additional space
> should be reported.

The COW for snapshotting and a clone are the same, but the way we get
there is a little different.  For a snapshot, we have two btree roots
pointing to the same nodes, and we've incremented the reference count on
each of the nodes they both point to.  No matter how big the subvolume
is, this will always be O(number of pointers in the root block).

Cloning a file is done by walking the file metadata and taking a
reference on each extent pointed to by the file.  The file data is never
read in, but all of the file metadata is read in.
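
To make the cost difference concrete, here is a toy in-memory model in Python. It is not btrfs code: the `Extent`, `Node`, `clone_file` and `snapshot` names are invented for this sketch, and real btrees have more levels. The point is only that a clone takes one reference per extent of the file, while a snapshot takes references on what the root block points to.

```python
from dataclasses import dataclass, field

@dataclass
class Extent:
    data: bytes
    refs: int = 1

@dataclass
class Node:
    children: list = field(default_factory=list)  # child Nodes or Extents
    refs: int = 1

def clone_file(extents):
    """Clone a file: walk its extent list and bump each refcount.

    The extent data itself is never read. Returns the new file's extent
    list and the number of references taken.
    """
    for extent in extents:
        extent.refs += 1
    return list(extents), len(extents)

def snapshot(root):
    """Snapshot: a new root sharing the old root's children.

    Only the nodes directly below the root get a new reference.
    """
    for child in root.children:
        child.refs += 1
    return Node(children=list(root.children)), len(root.children)

if __name__ == "__main__":
    file_a = [Extent(b"a" * 4096) for _ in range(1000)]
    file_b, work = clone_file(file_a)
    print("references taken by clone:", work)      # 1000

    root = Node(children=[Node(children=file_a[:500]),
                          Node(children=file_a[500:])])
    snap, work = snapshot(root)
    print("references taken by snapshot:", work)   # 2
```

The clone's work grows with the number of extents in the file; the snapshot's work depends only on the fan-out of the root.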

>
>
> > Also I tried the above twice and both times got:
> > http://www.kerneloops.org/submitresult.php?number=578993
>
> I didn't get these errors.  I am using the btrfs git version.

These have been fixed.

-chris

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: BTRFS file clone support for cp
  2009-07-29 13:01           ` Chris Mason
@ 2009-07-29 14:14             ` Pádraig Brady
  2009-07-29 16:10               ` Chris Mason
  0 siblings, 1 reply; 21+ messages in thread
From: Pádraig Brady @ 2009-07-29 14:14 UTC (permalink / raw)
  To: Chris Mason, Giuseppe Scrivano, Jim Meyering, bug-coreutils,
	linux-btrfs

Chris Mason wrote:
> On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
>>
>> I can't replicate it now, all tests I am doing report that blocks used
>> before and after the clone are the same.  Probably yesterday the
>> difference I noticed was in reality the original file flushed to the
>> disk.
>
> The clone will use some additional space for the metadata required to
> point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> the file).

Thanks for the clarification Chris.
So the just committed change in cp will
link the destination file to the extents of the source.

We may need to play around with fallocate()
if we want to get back to the original
cp semantics of actually allocating space
on the file system for the new file.
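
For reference, preallocation from user space looks roughly like this in Python. It is only a sketch of the fallocate() idea, using a hypothetical `reserve_space` helper; whether preallocating blocks for a cloned file would actually restore the old guarantee on btrfs is the open question here, and the sketch does not answer it.

```python
import errno
import os

def reserve_space(path, size):
    """Ask the file system to allocate `size` bytes for `path` up front.

    Returns True on success and False if the space could not be reserved
    (no space left, or preallocation not supported for this file).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)
        return True
    except OSError as e:
        if e.errno in (errno.ENOSPC, errno.EOPNOTSUPP, errno.EINVAL):
            return False
        raise
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Try to reserve 1 MiB for a scratch file in the current directory.
    print(reserve_space("prealloc.test", 1 << 20))
```

The attraction is that a failure shows up at reservation time instead of at some later write().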

I'll test this when I get an up to date btrfs
and when the fallocate interface in glibc settles down.

cheers,
Pádraig.

* Re: BTRFS file clone support for cp
  2009-07-29 14:14             ` Pádraig Brady
@ 2009-07-29 16:10               ` Chris Mason
  2009-07-29 16:18                 ` Chris Mason
  2009-07-29 18:14                 ` Pádraig Brady
  0 siblings, 2 replies; 21+ messages in thread
From: Chris Mason @ 2009-07-29 16:10 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Giuseppe Scrivano, Jim Meyering, bug-coreutils, linux-btrfs

On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> >>
> >> I can't replicate it now, all tests I am doing report that blocks used
> >> before and after the clone are the same.  Probably yesterday the
> >> difference I noticed was in reality the original file flushed to the
> >> disk.
> >
> > The clone will use some additional space for the metadata required to
> > point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> > the file).
>
> Thanks for the clarification Chris.
> So the just committed change in cp will
> link the destination file to the extents of the source.
>
> We may need to play around with fallocate()
> if we want to get back to the original
> cp semantics of actually allocating space
> on the file system for the new file.

Well, best to just use the original cp code.  I was talking with
Giuseppe about this as well, I think we should have the option to do regular
cp via a flag.

There will soon be a reflink system call that can be used on ocfs2 and
btrfs as well.  Thanks for adding this to glibc!

-chris

* Re: BTRFS file clone support for cp
  2009-07-29 16:10               ` Chris Mason
@ 2009-07-29 16:18                 ` Chris Mason
  2009-07-29 18:14                 ` Pádraig Brady
  1 sibling, 0 replies; 21+ messages in thread
From: Chris Mason @ 2009-07-29 16:18 UTC (permalink / raw)
  To: Pádraig Brady, Giuseppe Scrivano, Jim Meyering,
	bug-coreutils, linux-btrfs

On Wed, Jul 29, 2009 at 12:10:14PM -0400, Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> > Chris Mason wrote:
> > > On Tue, Jul 28, 2009 at 10:06:35PM +0200, Giuseppe Scrivano wrote:
> > >>
> > >> I can't replicate it now, all tests I am doing report that blocks used
> > >> before and after the clone are the same.  Probably yesterday the
> > >> difference I noticed was in reality the original file flushed to the
> > >> disk.
> > >
> > > The clone will use some additional space for the metadata required to
> > > point to the cloned blocks.  It isn't exactly O(1) it is O(metadata for
> > > the file).
> >
> > Thanks for the clarification Chris.
> > So the just committed change in cp will
> > link the destination file to the extents of the source.
> >
> > We may need to play around with fallocate()
> > if we want to get back to the original
> > cp semantics of actually allocating space
> > on the file system for the new file.
>
> Well, best to just use the original cp code.  I was talking with
> Giuseppe about this as well, I think we should have the option to do regular
> cp via a flag.
>
> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well.  Thanks for adding this to glibc!

Um, cp, not glibc, sorry ;)

-chris


* Re: BTRFS file clone support for cp
  2009-07-29 16:10               ` Chris Mason
  2009-07-29 16:18                 ` Chris Mason
@ 2009-07-29 18:14                 ` Pádraig Brady
  2009-07-30  0:57                   ` Joel Becker
  1 sibling, 1 reply; 21+ messages in thread
From: Pádraig Brady @ 2009-07-29 18:14 UTC (permalink / raw)
  To: Chris Mason, Pádraig Brady, Giuseppe Scrivano, Jim Meyering

Chris Mason wrote:
> On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>>
>> We may need to play around with fallocate()
>> if we want to get back to the original
>> cp semantics of actually allocating space
>> on the file system for the new file.
>
> Well, best to just use the original cp code.  I was talking with
> Giuseppe about this as well, I think we should have the option to do regular
> cp via a flag.

Right. Well we can turn off this cloning by doing --sparse={never,always}
but that has side effects. If we need an option then maybe we should have
it turn on cloning rather than restore default cp behaviour?
The side effects I thought of earlier, of COW without corresponding allocation
were possible fragmentation on write or unexpected/mishandled ENOSPC.
Also for endangered mechanical disks, subsequent processing could
be slowed as the head seeks between the old and new data to be copied.
Perhaps these are a small price to pay, especially considering that
solid state disks will only be affected by the write()=ENOSPC issue.

At the moment we have these linking options:

cp -l, --link #for hardlinks
cp -s, --symbolic-link #for symlinks

So perhaps we should support:

cp --link={soft,hard,cow}
for symlink(), link() and reflink() respectively?
I.E. link to the name, inode or extents respectively.
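
The first two of those three levels can already be observed from user space. A small Python sketch (the helper name `describe_links` is invented here) shows a symlink storing a name and a hard link sharing an inode; the third level, sharing extents, is left out because it depends on the proposed reflink() call.

```python
import os
import tempfile

def describe_links(directory):
    """Create a file plus a symlink and a hard link to it; report what is shared."""
    target = os.path.join(directory, "orig")
    with open(target, "w") as f:
        f.write("data\n")

    soft = os.path.join(directory, "soft")
    hard = os.path.join(directory, "hard")
    os.symlink("orig", soft)   # stores the *name* "orig"
    os.link(target, hard)      # second directory entry for the same *inode*

    return {
        "soft_points_to": os.readlink(soft),
        "same_inode": os.stat(hard).st_ino == os.stat(target).st_ino,
        "link_count": os.stat(target).st_nlink,
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(describe_links(d))
```

A reflink would differ from both: its own inode and link count, but the same data extents until one side is modified.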

> There will soon be a reflink system call that can be used on ocfs2 and
> btrfs as well.  Thanks for adding this to glibc!

I was thinking there would be a generic syscall for this.
So cp should call reflink() instead when it becomes available.

thanks for the info!
Pádraig.

* Re: BTRFS file clone support for cp
  2009-07-29 18:14                 ` Pádraig Brady
@ 2009-07-30  0:57                   ` Joel Becker
  2009-07-30  7:39                     ` Jim Meyering
  0 siblings, 1 reply; 21+ messages in thread
From: Joel Becker @ 2009-07-30  0:57 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Chris Mason, Giuseppe Scrivano, Jim Meyering, bug-coreutils,
	linux-btrfs

On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
> Chris Mason wrote:
> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
> >>
> >> We may need to play around with fallocate()
> >> if we want to get back to the original
> >> cp semantics of actually allocating space
> >> on the file system for the new file.
> >
> > Well, best to just use the original cp code.  I was talking with
> > Giuseppe about this as well, I think we should have the option to do regular
> > cp via a flag.
>
> Right. Well we can turn off this cloning by doing --sparse={never,always}
> but that has side effects. If we need an option then maybe we should have
> it turn on cloning rather than restore default cp behaviour?
> The side effects I thought of earlier, of COW without corresponding allocation
> were possible fragmentation on write or unexpected/mishandled ENOSPC.
> Also for endangered mechanical disks, subsequent processing could
> be slowed as the head seeks between the old and new data to be copied.
> Perhaps these are a small price to pay, especially considering that
> solid state disks will only be affected by the write()=ENOSPC issue.
>
> At the moment we have these linking options:
>
> cp -l, --link #for hardlinks
> cp -s, --symbolic-link #for symlinks
>
> So perhaps we should support:
>
> cp --link={soft,hard,cow}
> for symlink(), link() and reflink() respectively?
> I.E. link to the name, inode or extents respectively.

	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
not for cp(1).  I have a git tree with the (in-flux) code on
oss.oracle.com:

[View]
http://oss.oracle.com/git/?p=jlbec/reflink.git;a=summary
[Pull]
git://oss.oracle.com/git/jlbec/reflink.git master

	This repository isn't designed to be an authoritative patch for
coreutils.  Instead it provides a reflink(1) program that is actually ln
-r in disguise.  Later work would be to get coreutils updated
"properly".

Joel

-- 

"This is the end, beautiful friend.
 This is the end, my only friend the end
 Of our elaborate plans, the end
 Of everything that stands, the end
 No safety or surprise, the end
 I'll never look into your eyes again."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

* Re: BTRFS file clone support for cp
  2009-07-30  0:57                   ` Joel Becker
@ 2009-07-30  7:39                     ` Jim Meyering
  2009-07-30  8:21                       ` Joel Becker
                                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jim Meyering @ 2009-07-30  7:39 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: linux-btrfs, bug-coreutils, Giuseppe Scrivano, Chris Mason

Joel Becker wrote:

> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>> Chris Mason wrote:
>> > On Wed, Jul 29, 2009 at 03:14:49PM +0100, Pádraig Brady wrote:
>> >>
>> >> We may need to play around with fallocate()
>> >> if we want to get back to the original
>> >> cp semantics of actually allocating space
>> >> on the file system for the new file.
>> >
>> > Well, best to just use the original cp code.  I was talking with
>> > Giuseppe about this as well, I think we should have the option to do regular
>> > cp via a flag.
>>
>> Right. Well we can turn off this cloning by doing --sparse={never,always}
>> but that has side effects. If we need an option then maybe we should have
>> it turn on cloning rather than restore default cp behaviour?
>> The side effects I thought of earlier, of COW without corresponding allocation
>> were possible fragmentation on write or unexpected/mishandled ENOSPC.
>> Also for endangered mechanical disks, subsequent processing could
>> be slowed as the head seeks between the old and new data to be copied.
>> Perhaps these are a small price to pay, especially considering that
>> solid state disks will only be affected by the write()=ENOSPC issue.
>>
>> At the moment we have these linking options:
>>
>> cp -l, --link #for hardlinks
>> cp -s, --symbolic-link #for symlinks
>>
>> So perhaps we should support:
>>
>> cp --link={soft,hard,cow}
>> for symlink(), link() and reflink() respectively?
>> I.E. link to the name, inode or extents respectively.
>
> 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> not for cp(1).

Thanks.  I haven't looked, but after reading about the reflink syscall
[http://lwn.net/Articles/332802/] had come to the same conclusion:
this feature belongs with ln rather than with cp.

Besides, putting the new behavior on a new option avoids
the current semantic change we would otherwise induce in cp.


* Re: BTRFS file clone support for cp
  2009-07-30  7:39                     ` Jim Meyering
@ 2009-07-30  8:21                       ` Joel Becker
  2009-07-30  8:40                       ` Pádraig Brady
  2009-07-30  9:26                       ` Andi Kleen
  2 siblings, 0 replies; 21+ messages in thread
From: Joel Becker @ 2009-07-30  8:21 UTC (permalink / raw)
  To: Jim Meyering
  Cc: bug-coreutils, linux-btrfs, Pádraig Brady, Giuseppe Scrivano,
	Chris Mason

On Thu, Jul 30, 2009 at 09:39:17AM +0200, Jim Meyering wrote:
> Joel Becker wrote:
> > 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
> > not for cp(1).
> 
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.
> 
> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

	Well, I don't see any reason cp(1) can't take advantage of
reflink(2).  I just think that cp(1) should look at reflink(2) as an
optimization, not a specific methodology.
	What do I mean?  If you want to say "I know what a reflink is,
and that's exactly what I want", you want "ln -r".  But say you want a
"cp --snap" that tries to take a snapshot regardless of the backend.  It
could use reflink(2) on filesystems that support it, or perhaps a
passthrough call to the underlying storage, or who knows what.  I can
also imagine a "cp --shallow" that is "if you can cow, do it, otherwise
do a normal cp".
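
A rough Python sketch of that "cow if you can, otherwise copy" behaviour follows. It assumes the clone ioctl number 0x40049409, which is what the btrfs clone ioctl, _IOW(0x94, 9, int), evaluates to on x86; other architectures may encode the direction bits differently. The function name `shallow_copy` is made up for illustration, and this is not how cp implements it.

```python
import fcntl
import shutil

# _IOW(0x94, 9, int) as encoded on x86; an assumption, see the note above.
BTRFS_IOC_CLONE = 0x40049409

def shallow_copy(src, dst):
    """Try to clone src's extents into dst; fall back to a plain data copy.

    Returns "cloned" or "copied" so the caller can tell which path was taken.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the file system to make dst share src's extents.
            fcntl.ioctl(fdst.fileno(), BTRFS_IOC_CLONE, fsrc.fileno())
            return "cloned"
        except OSError:
            # File system or kernel cannot clone: do a normal copy instead.
            shutil.copyfileobj(fsrc, fdst)
            return "copied"

if __name__ == "__main__":
    import sys
    print(shallow_copy(sys.argv[1], sys.argv[2]))
```

Returning which path was taken keeps the optimization visible to the caller, which matters if the caller cares about the ENOSPC and fault-tolerance differences between the two.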

Joel

-- 

"I think it would be a good idea."  
        - Mahatma Gandhi, when asked what he thought of Western
          civilization

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: BTRFS file clone support for cp
  2009-07-30  7:39                     ` Jim Meyering
  2009-07-30  8:21                       ` Joel Becker
@ 2009-07-30  8:40                       ` Pádraig Brady
  2009-07-30 16:28                         ` Ric Wheeler
  2009-07-30  9:26                       ` Andi Kleen
  2 siblings, 1 reply; 21+ messages in thread
From: Pádraig Brady @ 2009-07-30  8:40 UTC (permalink / raw)
  To: Jim Meyering; +Cc: Chris Mason, Giuseppe Scrivano, bug-coreutils, linux-btrfs

Jim Meyering wrote:
> Joel Becker wrote:
>
>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>> At the moment we have these linking options:
>>>
>>> cp -l, --link #for hardlinks
>>> cp -s, --symbolic-link #for symlinks
>>>
>>> So perhaps we should support:
>>>
>>> cp --link={soft,hard,cow}
>>> for symlink(), link() and reflink() respectively?
>>> I.E. link to the name, inode or extents respectively.
>>
>> 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>> not for cp(1).
>
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.

Right. It definitely should be in ln anyway.

> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

Yes doing reflink() in cp by default currently can
be problematic as discussed, especially on mechanical hard disks.
Though in future I can see most users of cp preferring
reflink() to be done, rather than read()/write(). Ponder...

In any case putting --link=cow or --reflink or whatever in cp
could be very useful for creating writeable snapshot branches.

cheers,
Pádraig.

* Re: BTRFS file clone support for cp
  2009-07-30  7:39                     ` Jim Meyering
  2009-07-30  8:21                       ` Joel Becker
  2009-07-30  8:40                       ` Pádraig Brady
@ 2009-07-30  9:26                       ` Andi Kleen
  2009-07-30 10:02                         ` Pádraig Brady
  2009-07-30 10:16                         ` Jim Meyering
  2 siblings, 2 replies; 21+ messages in thread
From: Andi Kleen @ 2009-07-30  9:26 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Pádraig Brady, linux-btrfs, bug-coreutils, Giuseppe Scrivano,
	Chris Mason

Jim Meyering <jim@meyering.net> writes:
>
> Thanks.  I haven't looked, but after reading about the reflink syscall
> [http://lwn.net/Articles/332802/] had come to the same conclusion:
> this feature belongs with ln rather than with cp.

cp already has -l so it would make sense to extend that too.

> Besides, putting the new behavior on a new option avoids
> the current semantic change we would otherwise induce in cp.

I don't see how semantics change in a user visible way.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: BTRFS file clone support for cp
  2009-07-30  9:26                       ` Andi Kleen
@ 2009-07-30 10:02                         ` Pádraig Brady
  2009-07-30 10:16                         ` Jim Meyering
  1 sibling, 0 replies; 21+ messages in thread
From: Pádraig Brady @ 2009-07-30 10:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jim Meyering, linux-btrfs, bug-coreutils, Giuseppe Scrivano,
	Chris Mason

Andi Kleen wrote:
> Jim Meyering <jim@meyering.net> writes:
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>
> cp already has -l so it would make sense to extend that too.
>
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>
> I don't see how semantics change in a user visible way.

I was thinking that doing reflink() in cp has the following
user visible advantages/disadvantages:

Advantages:
  very quick copy
  less space used

Disadvantages:
  disk head seeking deferred to modification process
  possible fragmentation on write
  possible ENOSPC on write

The disk head seeking issue will go away with time.
I'm not sure if the other disadvantages exist or whether
they could be alleviated with fallocate() or something.

cheers,
Pádraig.

* Re: BTRFS file clone support for cp
  2009-07-30  9:26                       ` Andi Kleen
  2009-07-30 10:02                         ` Pádraig Brady
@ 2009-07-30 10:16                         ` Jim Meyering
  2009-07-30 10:21                           ` Tomasz Chmielewski
  2009-07-30 10:54                           ` Andi Kleen
  1 sibling, 2 replies; 21+ messages in thread
From: Jim Meyering @ 2009-07-30 10:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Pádraig Brady, linux-btrfs, bug-coreutils, Giuseppe Scrivano,
	Chris Mason

Andi Kleen wrote:

> Jim Meyering <jim@meyering.net> writes:
>>
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>
> cp already has -l so it would make sense to extend that too.

Good point.

>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>
> I don't see how semantics change in a user visible way.

With classic cp, if I copy a 1GB non-sparse file and there's less
space than that available, cp fails with ENOSPC.
With this new feature, it succeeds even if there are
just a few blocks available.

Also, consider (buggy!) code that then depends on being able to modify
that file in-place, and that "knows" it doesn't need to check for ENOSPC.
Sure, they should always check for write failure, but still.  It is
a change.
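
A sketch of the defensive pattern in Python: an overwrite of existing bytes is treated as something that can fail with ENOSPC, which on a copy-on-write clone it can, since new blocks have to be allocated for the changed data. The `overwrite_in_place` helper is hypothetical.

```python
import errno
import os

def overwrite_in_place(path, offset, data):
    """Overwrite bytes inside an existing file; return False on ENOSPC."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than asked; loop until done.
            written += os.pwrite(fd, data[written:], offset + written)
        # With delayed allocation the error may only surface at fsync time.
        os.fsync(fd)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False
        raise
    finally:
        os.close(fd)
```

Code written this way behaves the same whether the file was produced by a classic copy or by a clone.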


* Re: BTRFS file clone support for cp
  2009-07-30 10:16                         ` Jim Meyering
@ 2009-07-30 10:21                           ` Tomasz Chmielewski
  2009-07-30 10:54                           ` Andi Kleen
  1 sibling, 0 replies; 21+ messages in thread
From: Tomasz Chmielewski @ 2009-07-30 10:21 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Andi Kleen, Pádraig Brady, linux-btrfs, bug-coreutils,
	Giuseppe Scrivano, Chris Mason

Jim Meyering wrote:

> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.

Is it good or bad?


> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still.  It is
> a change.

On a multiuser system, that (buggy) tool would fail anyway if something 
else adds enough new data to the filesystem in the meantime.

But sure, it's a change.


-- 
Tomasz Chmielewski
http://wpkg.org


* Re: BTRFS file clone support for cp
  2009-07-30 10:16                         ` Jim Meyering
  2009-07-30 10:21                           ` Tomasz Chmielewski
@ 2009-07-30 10:54                           ` Andi Kleen
  2009-07-30 18:05                             ` Joel Becker
  1 sibling, 1 reply; 21+ messages in thread
From: Andi Kleen @ 2009-07-30 10:54 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Andi Kleen, Pádraig Brady, linux-btrfs, bug-coreutils,
	Giuseppe Scrivano, Chris Mason

> 
> With classic cp, if I copy a 1GB non-sparse file and there's less
> space than that available, cp fails with ENOSPC.
> With this new feature, it succeeds even if there are
> just a few blocks available.
> 
> Also, consider (buggy!) code that then depends on being able to modify
> that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> Sure, they should always check for write failure, but still.  It is
> a change.

Fair point, although I suspect there are cases where ENOSPC
on a non-extending write can already happen on specific file systems, e.g. on
btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
when the garbage collector doesn't run in time. It wouldn't surprise
me if there were more cases already.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: BTRFS file clone support for cp
  2009-07-30  8:40                       ` Pádraig Brady
@ 2009-07-30 16:28                         ` Ric Wheeler
  2009-07-30 16:48                           ` Jim Meyering
  0 siblings, 1 reply; 21+ messages in thread
From: Ric Wheeler @ 2009-07-30 16:28 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: Jim Meyering, Chris Mason, Giuseppe Scrivano, bug-coreutils,
	linux-btrfs

On 07/30/2009 04:40 AM, Pádraig Brady wrote:
> Jim Meyering wrote:
>
>> Joel Becker wrote:
>>
>>
>>> On Wed, Jul 29, 2009 at 07:14:37PM +0100, Pádraig Brady wrote:
>>>
>>>> At the moment we have these linking options:
>>>>
>>>> cp -l, --link #for hardlinks
>>>> cp -s, --symbolic-link #for symlinks
>>>>
>>>> So perhaps we should support:
>>>>
>>>> cp --link={soft,hard,cow}
>>>> for symlink(), link() and reflink() respectively?
>>>> I.E. link to the name, inode or extents respectively.
>>>>
>>> 	I've cooked up 'ln -r' for reflinks, which works for ln(1) but
>>> not for cp(1).
>>>
>> Thanks.  I haven't looked, but after reading about the reflink syscall
>> [http://lwn.net/Articles/332802/] had come to the same conclusion:
>> this feature belongs with ln rather than with cp.
>>
>
> Right. It definitely should be in ln anyway.
>
>
>> Besides, putting the new behavior on a new option avoids
>> the current semantic change we would otherwise induce in cp.
>>
>
> Yes doing reflink() in cp by default currently can
> be problematic as discussed, especially on mechanical hard disks.
> Though in future I can see most users of cp preferring
> reflink() to be done, rather than read()/write(). Ponder...
>
>

I think that doing reflink by default would be a horrible idea - one=20
good reason to copy a file is to increase your level of fault tolerance=
=20
and reflink magically avoids that :-)

reflink is a neat feature, but should be used on purpose in my opinion,

ric

> In any case putting --link=cow or --reflink or whatever in cp
> could be very useful for creating writeable snapshot branches.
>
> cheers,
> Pádraig.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: BTRFS file clone support for cp
  2009-07-30 16:28                         ` Ric Wheeler
@ 2009-07-30 16:48                           ` Jim Meyering
  0 siblings, 0 replies; 21+ messages in thread
From: Jim Meyering @ 2009-07-30 16:48 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Pádraig Brady, Chris Mason, Giuseppe Scrivano, bug-coreutils,
	linux-btrfs

Ric Wheeler wrote:
> I think that doing reflink by default would be a horrible idea - one
> good reason to copy a file is to increase your level of fault
> tolerance and reflink magically avoids that :-)

Good point.
This would constitute another user-visible semantic change in cp:
a disk fault that affects any non-metadata block of a ref-linked file
affects both copies.

GNU cp will soon attempt this only when a --reflink option is specified.
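
A rough sketch of what opting in might look like from a script, assuming
a GNU cp that accepts --reflink=always (the spelling used by later
coreutils releases; at the time of this message the option was still
being worked out). The helper reflink_or_copy is hypothetical: it asks
for a clone explicitly and falls back to an ordinary copy when the file
system cannot share extents.

```sh
# Hypothetical helper: clone when the file system supports it, else copy.
# Assumes GNU cp with --reflink=always, which fails instead of silently
# falling back when a clone is not possible.
reflink_or_copy() {
  cp --reflink=always -- "$1" "$2" 2>/dev/null || cp -- "$1" "$2"
}

# Usage: the destination has the same contents either way; only on a
# CoW-capable file system will the two files share data blocks.
work=$(mktemp -d)
printf 'some data\n' > "$work/orig"
reflink_or_copy "$work/orig" "$work/copy" && echo "copied"
rm -rf "$work"
```

A caller that wants independent storage for fault tolerance would simply
use plain cp and never request the clone.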


* Re: BTRFS file clone support for cp
       [not found]           ` <4A70C4E0.9030104@draigBrady.com>
@ 2009-07-30 17:28             ` Giuseppe Scrivano
  0 siblings, 0 replies; 21+ messages in thread
From: Giuseppe Scrivano @ 2009-07-30 17:28 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: bug-coreutils, linux-btrfs

Hi Pádraig,

thanks for the comments.

Pádraig Brady <P@draigBrady.com> writes:

> # 300MB seems to be the minimum size for a btrfs with default
> parameters.

Actually, it seems the minimum space required is 256MB.  Using a 255MB
image I get: "device btrfs.img is too small (must be at least 256 MB)"


> # FIXME: use `truncate --allocate` when it becomes available, which
> # may allow unmarking this as an expensive test.

Are you sure that this feature will make the test less expensive?  The
test files must still be written there, so in the best case (assuming
the fallocate takes 0s) only the dd cost is saved, and it still looks
like an expensive test.

In the version I attached, I am using a sparse file (truncate --size)
and it seems to work fine.  Is it correct or am I missing something?
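
For readers unfamiliar with the difference, a small sketch of why the
sparse image is cheap: truncate only sets the file length, so the
apparent size is 256MB while almost nothing is allocated. The helper
names apparent_bytes and allocated_bytes are made up for this example,
and GNU stat and truncate are assumed.

```sh
# Hypothetical helpers comparing a file's apparent size with its allocation.
# %s is the size in bytes; %b is the number of allocated blocks and %B the
# size in bytes of each of those blocks (GNU stat).
apparent_bytes() { stat -c %s -- "$1"; }
allocated_bytes() {
  echo $(( $(stat -c %b -- "$1") * $(stat -c %B -- "$1") ))
}

img=$(mktemp)
truncate --size=256M "$img"   # sets the length only; writes no data blocks
echo "apparent:  $(apparent_bytes "$img") bytes"
echo "allocated: $(allocated_bytes "$img") bytes"
rm -f "$img"
```

With dd from /dev/zero the two numbers would be roughly equal, which is
where the time and space cost of the original test came from.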

I haven't looked yet but probably there are other tests that can take
advantage of sparse files instead of using "dd".

I am also considering Jim's note about doing the umount in the cleanup_
function.

Cheers,
Giuseppe


From 7add4b337b7db0a63bca0dd0fe0f146f175163f8 Mon Sep 17 00:00:00 2001
From: Giuseppe Scrivano <gscrivano@gnu.org>
Date: Wed, 29 Jul 2009 20:31:20 +0200
Subject: [PATCH] tests: add a test for btrfs' copy-on-write file clone operation

* tests/Makefile.am: Consider the new test.
* tests/cp/file-clone: New file.
---
 tests/Makefile.am   |    1 +
 tests/cp/file-clone |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 0 deletions(-)
 create mode 100755 tests/cp/file-clone

diff --git a/tests/Makefile.am b/tests/Makefile.am
index 59737a0..9841aa3 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -20,6 +20,7 @@ EXTRA_DIST =		\
 
 root_tests =					\
   chown/basic					\
+  cp/file-clone				\
   cp/cp-a-selinux				\
   cp/preserve-gid				\
   cp/special-bits				\
diff --git a/tests/cp/file-clone b/tests/cp/file-clone
new file mode 100755
index 0000000..c65b9cb
--- /dev/null
+++ b/tests/cp/file-clone
@@ -0,0 +1,58 @@
+#!/bin/sh
+# Make sure file-clone on a btrfs file system works properly.
+
+# Copyright (C) 2009 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+
+if test "$VERBOSE" = yes; then
+  set -x
+  cp --version
+fi
+
+. $srcdir/test-lib.sh
+
+require_root_
+require_sparse_support_
+#expensive_
+
+cleanup_(){ umount btrfs; }
+
+fail=0
+
+mkfs.btrfs --version || skip_test_ "btrfs userland tools not installed"
+
+# 256MB seems to be the minimum size for a btrfs with default parameters.
+truncate --size=256M btrfs.img  || framework_failure
+
+mkfs.btrfs btrfs.img  || framework_failure
+
+mkdir btrfs || framework_failure
+
+mount -t btrfs -o loop btrfs.img btrfs || framework_failure
+
+dd bs=1M count=200 if=/dev/zero of=btrfs/alloc.test || framework_failure
+
+# If the file is cloned, only additional space for metadata is required.
+# Two 200MB files can be present even if the total file system space is 256MB.
+cp btrfs/alloc.test btrfs/clone.test || fail=1
+rm btrfs/clone.test
+
+# When --sparse={always,never} is used, the file is copied without any cloning.
+# Use --sparse=never to be sure the file is copied without holes and it is not
+# possible since there is not enough free space.
+cp --sparse=never btrfs/alloc.test btrfs/clone.test && fail=1
+
+Exit $fail
-- 
1.6.3.3

* Re: BTRFS file clone support for cp
  2009-07-30 10:54                           ` Andi Kleen
@ 2009-07-30 18:05                             ` Joel Becker
  2009-07-30 23:28                               ` Pádraig Brady
  0 siblings, 1 reply; 21+ messages in thread
From: Joel Becker @ 2009-07-30 18:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jim Meyering, Pádraig Brady, linux-btrfs, bug-coreutils,
	Giuseppe Scrivano, Chris Mason

On Thu, Jul 30, 2009 at 12:54:16PM +0200, Andi Kleen wrote:
> > With classic cp, if I copy a 1GB non-sparse file and there's less
> > space than that available, cp fails with ENOSPC.
> > With this new feature, it succeeds even if there are
> > just a few blocks available.
> > 
> > Also, consider (buggy!) code that then depends on being able to modify
> > that file in-place, and that "knows" it doesn't need to check for ENOSPC.
> > Sure, they should always check for write failure, but still.  It is
> > a change.
> 
> Fair point, although I suspect there are cases where ENOSPC
> on a non-extending write can already happen on specific file systems, e.g. on
> btrfs it might happen when the tree gets rebalanced? Or perhaps on nilfs2
> when the garbage collector doesn't run in time. It wouldn't surprise
> me if there were more cases already.

	In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
enabled, or any other CoW-ish filesystem is a tacit approval of the
delayed ENOSPC.  The same can be said of "thin provisioning" LUNs.
However, the other concerns are still valid.  A user invoking vanilla
cp(1) expects two independent storage regions for the data.
	(Oh, and what about future support of de-duping in filesystems?
:-)

Joel

-- 

"Anything that is too stupid to be spoken is sung."  
        - Voltaire

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127


* Re: BTRFS file clone support for cp
  2009-07-30 18:05                             ` Joel Becker
@ 2009-07-30 23:28                               ` Pádraig Brady
  0 siblings, 0 replies; 21+ messages in thread
From: Pádraig Brady @ 2009-07-30 23:28 UTC (permalink / raw)
  To: Andi Kleen, Jim Meyering, linux-btrfs, bug-coreutils,
	Giuseppe Scrivano, Chris Mason <chris.mason@

Joel Becker wrote:
> 	In some sense, using btrfs, nilfs2, ocfs2 with refcount trees
> enabled, or any other CoW-ish filesystem is a tacit approval of the
> delayed ENOSPC.  The same can be said of "thin provisioning" LUNs.
> However, the other concerns are still valid.  A user invoking vanilla
> cp(1) expects two independent storage regions for the data.
> 	(Oh, and what about future support of de-duping in filesystems?
> :-)

I maintain an app to de-dupe at http://www.pixelbeat.org/fslint/
and I'll be adding reflink support as soon as it becomes available.
From a filesystem point of view, one thing that would help speed
this up (and many other things like rsync etc.) would be to allow
one to associate say a sha-3 hash or whatever with the file, which
the filesystem would automatically clear when the file data changes.
So in general having a special set of extended attributes that
were auto cleared on file modification would be very useful for
lots of stuff.
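
A userspace approximation of that idea, as a hedged sketch: record the
hash together with the file's size and mtime, and treat it as stale when
either no longer matches. The helper names and the .sum sidecar file are
made up for illustration (a real tool might store this in an extended
attribute instead), and sha256sum merely stands in for whichever hash is
preferred. Unlike a filesystem-enforced clear, this can be fooled by
anything that rewrites the data and then restores the size and mtime.

```sh
# Hypothetical helpers: cache a checksum next to the file and consider it
# stale once the file's size or mtime differ from what was recorded.
stamp() { stat -c '%s %y' -- "$1"; }   # size plus mtime (GNU stat)

cache_sum() {
  sum=$(sha256sum < "$1" | cut -d' ' -f1)
  printf '%s\n%s\n' "$(stamp "$1")" "$sum" > "$1.sum"
}

# Prints the cached checksum, or returns non-zero if it is missing or stale.
cached_sum() {
  [ -f "$1.sum" ] || return 1
  recorded=$(sed -n 1p "$1.sum")
  [ "$recorded" = "$(stamp "$1")" ] || return 1
  sed -n 2p "$1.sum"
}
```

A de-dupe or rsync-like tool could call cached_sum first and only rehash
the file when it fails.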

cheers,
Pádraig.

end of thread, other threads:[~2009-07-30 23:28 UTC | newest]

Thread overview: 21+ messages
     [not found] <87d47o3fip.fsf@master.homenet>
     [not found] ` <4A6CEA48.5050208@draigBrady.com>
     [not found]   ` <8763defuvq.fsf@meyering.net>
     [not found]     ` <87ws5tvrq8.fsf@master.homenet>
2009-07-27 23:40       ` BTRFS file clone support for cp Pádraig Brady
2009-07-28 20:06         ` Giuseppe Scrivano
2009-07-29 13:01           ` Chris Mason
2009-07-29 14:14             ` Pádraig Brady
2009-07-29 16:10               ` Chris Mason
2009-07-29 16:18                 ` Chris Mason
2009-07-29 18:14                 ` Pádraig Brady
2009-07-30  0:57                   ` Joel Becker
2009-07-30  7:39                     ` Jim Meyering
2009-07-30  8:21                       ` Joel Becker
2009-07-30  8:40                       ` Pádraig Brady
2009-07-30 16:28                         ` Ric Wheeler
2009-07-30 16:48                           ` Jim Meyering
2009-07-30  9:26                       ` Andi Kleen
2009-07-30 10:02                         ` Pádraig Brady
2009-07-30 10:16                         ` Jim Meyering
2009-07-30 10:21                           ` Tomasz Chmielewski
2009-07-30 10:54                           ` Andi Kleen
2009-07-30 18:05                             ` Joel Becker
2009-07-30 23:28                               ` Pádraig Brady
     [not found]         ` <87k51r9sxh.fsf@master.homenet>
     [not found]           ` <4A70C4E0.9030104@draigBrady.com>
2009-07-30 17:28             ` Giuseppe Scrivano
