difference between -c and -p for send-receive?

linux-btrfs.vger.kernel.org archive mirror

* difference between -c and -p for send-receive?
@ 2017-09-19  0:41 Dave
  2017-09-19  4:40 ` Duncan
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dave @ 2017-09-19  0:41 UTC
  To: linux-btrfs

new subject for new question

On Mon, Sep 18, 2017 at 1:37 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:

> >> What scenarios can lead to "ERROR: parent determination failed"?
> >
> > The man page for btrfs-send is reasonably clear on the requirements
> > btrfs imposes. If you want to use incremental sends (i.e. the -c or -p
> > options) then the specified snapshots must exist on both the source and
> > destination. If you don't have a suitable existing snapshot then don't
> > use -c or -p and just do a full send.
> >
>
> Well, I do not immediately see why -c must imply incremental send. We
> want to reduce amount of data that is transferred, so reuse data from
> existing snapshots, but it is really orthogonal to whether we send full
> subvolume or just changes since another snapshot.
>

Starting months ago, when I began using btrfs seriously, I have been
reading, rereading and trying to understand this:

FAQ - btrfs Wiki
https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F

The comment above suddenly gives me another clue...

However, I still don't understand terms like "clone range ioctl",
although I can guess it is something like a hard link.

Would it be correct to say the following?

1. "-c" causes (appropriate) files in the newly transferred snapshot
to be "hard linked" to existing files in another snapshot on the
destination. Doesn't "-p" do something equivalent though?

2. The -c and -p options can be used together or individually.

Questions:

If "-c" "will send all of the metadata of @B.1, but will leave out the
data for @B.1/bigfile, because it's already in the backups filesystem,
and can be reflinked from there" what will -p do in contrast?

Will "-p" not send all the metadata?

Will "-p" also leave out the data for @B.1/bigfile, when it's also
already in the backups?

What would make me choose one of these options over the other? I still
struggle to see the difference.


* Re: difference between -c and -p for send-receive?
  2017-09-19  0:41 difference between -c and -p for send-receive? Dave
@ 2017-09-19  4:40 ` Duncan
  2017-09-19 10:24 ` Graham Cobb
  2017-09-20  4:06 ` Andrei Borzenkov
  2 siblings, 0 replies; 7+ messages in thread
From: Duncan @ 2017-09-19  4:40 UTC
  To: linux-btrfs

Dave posted on Mon, 18 Sep 2017 20:41:45 -0400 as excerpted:

>> Well, I do not immediately see why -c must imply incremental send. We
>> want to reduce amount of data that is transferred, so reuse data from
>> existing snapshots, but it is really orthogonal to whether we send full
>> subvolume or just changes since another snapshot.
>>
>>
> Starting months ago, when I began using btrfs seriously, I have been
> reading, rereading and trying to understand this:
> 
> FAQ - btrfs Wiki
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F
> 
> The comment above suddenly gives me another clue...
> 
> However, I still don't understand terms like "clone range ioctl",
> although I can guess it is something like a hard link.
> 
> Would it be correct to say the following?
> 
> 1. "-c" causes (appropriate) files in the newly transferred snapshot to
> be "hard linked" to existing files in another snapshot on the
> destination.

Technically, it's not a hard link but a reflink.  The hard-link analogy is 
reasonably accurate for understanding the process, though; a reflink just 
works at a different layer (shared data extents rather than shared inodes).

> Doesn't "-p" do something equivalent though?

Yes.  See below for the difference.

> 2. The -c and -p options can be used together or individually.

Yes.

> Questions:
> 
> If "-c" "will send all of the metadata of @B.1, but will leave out the
> data for @B.1/bigfile, because it's already in the backups filesystem,
> and can be reflinked from there" what will -p do in contrast?
> 
> Will "-p" not send all the metadata?
> 
> Will "-p" also leave out the data for @B.1/bigfile, when it's also
> already in the backups?

-c is less strict than -p, and sends more metadata over the wire as a 
result, but where the data is the same (reflink points to the same 
extent), it won't be sent in either case.  See below.

> What would make me choose one of these options over the other? I still
> struggle to see the difference.

What -p does is tell send that the named snapshot is a snapshot of an 
earlier state of the snapshot being sent, and that said earlier-state 
snapshot exists on both the send and receive end, so only the changes 
(both data and metadata) from the earlier snapshot must be sent.

Put a different way, the snapshot being sent is the parent, plus any 
changes since then, so to recreate the new snapshot, only the operations 
needed to update the state from the previous to the new state must be 
sent, and done by receive on the other end.
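
As a minimal sketch of the -p case described above (the snapshot paths and 
the destination mount point are hypothetical, and the helper only prints 
the pipeline instead of running it):

```shell
# Hypothetical helper: print the send/receive pipeline rather than run it.
# $1 = destination mount point
# $2 = snapshot to send
# $3 = optional parent snapshot that already exists on both sides
send_pipeline() {
    if [ -n "${3:-}" ]; then
        # Incremental: only the changes since the parent are sent.
        printf 'btrfs send -p %s %s | btrfs receive %s\n' "$3" "$2" "$1"
    else
        # Full: the whole snapshot is sent.
        printf 'btrfs send %s | btrfs receive %s\n' "$2" "$1"
    fi
}

send_pipeline /backup /snaps/home.1                 # first transfer
send_pipeline /backup /snaps/home.2 /snaps/home.1   # later transfers
```

The second call only makes sense once home.1 has been received on the 
destination, since receive rebuilds home.2 from its copy of the parent.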

-c is less strict than -p.  It doesn't consider the named snapshot to be 
an earlier state of the snapshot being sent, but simply says that the two 
may have some data in common, as defined by reflinks to the same shared 
extents.

So -c will send more over the wire.  In particular, it'll send much more 
metadata: I believe (being no dev or expert, just a list regular) 
essentially all of it, because nothing is assumed about how the metadata 
of the snapshot being sent relates to that of the clone.  But it can and 
does still assume that any extents reflinked in common can be sent by 
reference instead of as literal data, because -c says the other end 
already has the snapshot named as a clone and can simply reflink the 
extent there as well.

The wording of the manpage description for -c suggests that it picks one 
(and only one if there's more than one) -c clone and considers it a 
parent, which would allow it to shortcut sending the metadata in common 
for it as well, but not being a dev, I haven't looked at the code to be 
sure, and in any case, there can be only one parent, so it can do it for 
only one clone, even if there's more than one -c snapshot supplied.

So -p is primarily for the case where the named snapshot is an earlier 
state of the one being sent, and should be much more efficient than -c in 
that case.  However, the less strict -c should also work, and if the 
wording of the manpage can be believed, a single named -c snapshot will 
be treated as -p anyway.  But -c can also be used for snapshots that 
aren't related with one being an earlier state of the other, where 
there's simply some reflinks in common, perhaps due to dedup.  It should 
still result in the data with the common reflinks being only sent by 
reference, but much more metadata will be sent, and if there's not a lot 
of reflinks in common, it's likely to require enough additional 
processing that the relatively trivial amount of common reflinked data it 
might save may not be worth it, compared to simply sending a full non-
incremental snapshot.
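
For illustration, -c can be given more than once, one clone source per 
option.  A small sketch with hypothetical paths; the helper name is made 
up here and it only prints the command:

```shell
# Hypothetical helper: build a send command with several clone sources.
# $1 = snapshot to send; remaining arguments = clone sources.
send_with_clones_cmd() {
    snapshot=$1
    shift
    cmd="btrfs send"
    for src in "$@"; do
        cmd="$cmd -c $src"      # each clone source gets its own -c
    done
    printf '%s %s\n' "$cmd" "$snapshot"
}

send_with_clones_cmd /snaps/home.2 /snaps/home.1 /snaps/other.1
# prints: btrfs send -c /snaps/home.1 -c /snaps/other.1 /snaps/home.2
```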

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: difference between -c and -p for send-receive?
  2017-09-19  0:41 difference between -c and -p for send-receive? Dave
  2017-09-19  4:40 ` Duncan
@ 2017-09-19 10:24 ` Graham Cobb
  2017-09-19 11:30   ` Andrei Borzenkov
  2017-09-20  4:06 ` Andrei Borzenkov
  2 siblings, 1 reply; 7+ messages in thread
From: Graham Cobb @ 2017-09-19 10:24 UTC
  To: linux-btrfs

On 19/09/17 01:41, Dave wrote:
> Would it be correct to say the following?

Like Duncan, I am just a user, and I haven't checked the code. I
recommend Duncan's explanation, but in case you are looking for
something simpler, how about thinking with the following analogy...

Think of -p as like doing an incremental backup: it tells send to just
send the instructions for the changes to get from the "parent" subvolume
to the current subvolume. Without -p it is like a full backup:
everything in the current subvolume is sent.

-c is different: it says "and by the way, these files also already exist
on the destination so they might be useful to skip actually sending some
of the file contents". Imagine that whenever a file content is about to
be sent (whether incremental or full), btrfs-send checks to see if the
data is in one of the -c subvolumes and, if it is, it sends "get the
data by reflinking to this file over here" instead of sending the data
itself. -c is really just an optimisation to save sending data if you
know the data is already available somewhere else on the destination.
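
The analogy can be written down as a toy model. This is not how btrfs-send
is implemented: extents are plain labels here, and the function name and
output words are made up for the illustration.

```shell
# Toy model of the analogy only; extents are just labels.
# $1 = space-separated extents referenced by the subvolume being sent
# $2 = space-separated extents present in the -c subvolumes
plan_stream() {
    for extent in $1; do
        case " $2 " in
            *" $extent "*) echo "clone $extent" ;;  # reflink to existing data
            *)             echo "write $extent" ;;  # send the data itself
        esac
    done
}

plan_stream "e1 e2 e3" "e2 e3 e9"
# prints:
#   write e1
#   clone e2
#   clone e3
```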

Be aware that this is really just an analogy (like "hard linking" is an
analogy for reflinking using the clone range ioctl). Duncan's email
provides more real details.

In particular, this analogy doesn't explain the original questioner's
problem. In the analogy, -c might work without the files actually being
present on the source (as long as they are on the destination). But, in
reality, because the underlying mechanism is extent range cloning, the
files have to be present on **both** the source and the destination in
order for btrfs-send to work out what commands to send.

By the way, like Duncan, I was surprised that the man page suggests that
-c without -p causes one of the clones to be treated as a parent. I have
not checked the code to see if that is actually how it works.

Graham


* Re: difference between -c and -p for send-receive?
  2017-09-19 10:24 ` Graham Cobb
@ 2017-09-19 11:30   ` Andrei Borzenkov
  0 siblings, 0 replies; 7+ messages in thread
From: Andrei Borzenkov @ 2017-09-19 11:30 UTC
  To: Graham Cobb; +Cc: Btrfs BTRFS

On Tue, Sep 19, 2017 at 1:24 PM, Graham Cobb <g.btrfs@cobb.uk.net> wrote:
> On 19/09/17 01:41, Dave wrote:
>> Would it be correct to say the following?
>
> Like Duncan, I am just a user, and I haven't checked the code. I
> recommend Duncan's explanation, but in case you are looking for
> something simpler, how about thinking with the following analogy...
>
> Think of -p as like doing an incremental backup: it tells send to just
> send the instructions for the changes to get from the "parent" subvolume
> to the current subvolume. Without -p it is like a full backup:
> everything in the current subvolume is sent.
>
> -c is different:

It is not really different - it is extra. You have -p and optionally
-c which modifies its behavior.

> it says "and by the way, these files also already exist
> on the destination so they might be useful to skip actually sending some
> of the file contents". Imagine that whenever a file content is about to
> be sent (whether incremental or full), btrfs-send checks to see if the
> data is in one of the -c subvolumes and, if it is, it sends "get the
> data by reflinking to this file over here" instead of sending the data
> itself. -c is really just an optimisation to save sending data if you
> know the data is already available somewhere else on the destination.
>
> Be aware that this is really just an analogy (like "hard linking" is an
> analogy for reflinking using the clone range ioctl). Duncan's email
> provides more real details.
>
> In particular, this analogy doesn't explain the original questioner's
> problem. In the analogy, -c might work without the files actually being
> present on the source (as long as they are on the destination). But, in
> reality, because the underlying mechanism is extent range cloning, the
> files have to be present on **both** the source and the destination in
> order for btrfs-send to work out what commands to send.
>

Yes. The decision whether to send full data or a reflink is made on the
source, so the data must be present on the source.

> By the way, like Duncan, I was surprised that the man page suggests that
> -c without -p causes one of the clones to be treated as a parent. I have
> not checked the code to see if that is actually how it works.
>

It is. As implemented, -c *requires* a parent snapshot, either
explicitly via the -p option or implicitly. What it does:

a) checks that both the snapshot to transfer and all snapshots given as
arguments to -c have the same parent uuid;
b) selects the "best match" by comparing how close the snapshots from
the -c option are to the parent. As far as I can tell it chooses the
oldest snapshot (with minimal difference to the parent) as the base
(implicit -p).

This implies that "btrfs send -c foo bar" is entirely equivalent to
"btrfs send -p foo bar".

That still does not explain why the script fails. As mentioned,
snapshots created by snapper should have the same parent uuid, which
leaves only the possibility of a non-existent subvolume, but then the
script should have failed much earlier.


* Re: difference between -c and -p for send-receive?
  2017-09-19  0:41 difference between -c and -p for send-receive? Dave
  2017-09-19  4:40 ` Duncan
  2017-09-19 10:24 ` Graham Cobb
@ 2017-09-20  4:06 ` Andrei Borzenkov
  2017-09-20 19:05   ` Antoine Belvire
  2 siblings, 1 reply; 7+ messages in thread
From: Andrei Borzenkov @ 2017-09-20  4:06 UTC
  To: Dave, linux-btrfs

19.09.2017 03:41, Dave wrote:
> new subject for new question
> 
> On Mon, Sep 18, 2017 at 1:37 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> 
>>>> What scenarios can lead to "ERROR: parent determination failed"?
>>>
>>> The man page for btrfs-send is reasonably clear on the requirements
>>> btrfs imposes. If you want to use incremental sends (i.e. the -c or -p
>>> options) then the specified snapshots must exist on both the source and
>>> destination. If you don't have a suitable existing snapshot then don't
>>> use -c or -p and just do a full send.
>>>
>>
>> Well, I do not immediately see why -c must imply incremental send. We
>> want to reduce amount of data that is transferred, so reuse data from
>> existing snapshots, but it is really orthogonal to whether we send full
>> subvolume or just changes since another snapshot.
>>
> 
> Starting months ago, when I began using btrfs seriously, I have been
> reading, rereading and trying to understand this:
> 
> FAQ - btrfs Wiki
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F
> 

This wiki entry is wrong (and, as far as I can trust git, it has
always been wrong).

First, "btrfs send -c" does not start with a blank subvolume; it starts
with the "best parent", which is determined automatically. Actually, if
you look at the help output in the very first version of the send command:

        "By default, this will send the whole subvolume. To do",
        "an incremental send, one or multiple '-i <clone_source>'",
        "arguments have to be specified. A 'clone source' is",
        "a subvolume that is known to exist on the receiving",
        "side in exactly the same state as on the sending side.\n",
        "Normally, a good snapshot parent is searched automatically",
        "in the list of 'clone sources'. To override this, use",
        "'-p <parent>' to manually specify a snapshot parent.",

it explains far better what -c and -p do (ignore -i; that was an error
fixed later, and it means -c).

Second, the example in the wiki simply does not work. All snapshots
listed in -c options and the snapshot that we want to transfer must have
the same parent uuid, unless -p is explicitly provided. The example shows
snapshots of two different subvolumes. I could not make it work even when
A and B themselves are cloned from a common subvolume.



* Re: difference between -c and -p for send-receive?
  2017-09-20  4:06 ` Andrei Borzenkov
@ 2017-09-20 19:05   ` Antoine Belvire
  2017-09-20 19:21     ` Andrei Borzenkov
  0 siblings, 1 reply; 7+ messages in thread
From: Antoine Belvire @ 2017-09-20 19:05 UTC
  To: linux-btrfs, arvidjaar

Hello,

 > All snapshots listed in -c options and snapshot that we want to
 > transfer must have the same parent uuid, unless -p is explicitly
 > provided.

It's rather the same mount point than the same parent uuid, like cp 
--reflink, isn't it?

~# btrfs subvolume create /test2/
Create subvolume '//test2'
~# btrfs subvolume create /test2/foo
Create subvolume '/test2/foo'
~# cd /test2	
~# btrfs subvolume snapshot -r . .1
Create a readonly snapshot of '.' in './.1'
~#
~# # a: 40 MiB in /test2/
~# dd if=/dev/urandom of=a bs=4k count=10k
10240+0 records in
10240+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 0.198961 s, 211 MB/s
~#
~# # b: 80 MiB in /test2/foo
~# dd if=/dev/urandom of=foo/b bs=4k count=20k
20480+0 records in
20480+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 0.393823 s, 213 MB/s
~#
~# # copy-clone /test2/foo/b to /test2/b
~# cp --reflink foo/b .
~#
~# btrfs subvolume snapshot -r . .2
Create a readonly snapshot of '.' in './.2'
~#
~# # Sending .2 with only .1 as parent (.1 already sent)
~# btrfs send -p .1 .2 | wc -c
At subvol .2
125909258 # 120 MiB = 'a' + 'b'
~#
~# # Sending .2 with .1 and foo as clone sources (.1 and foo already
~# # sent), .1 is automatically picked as parent
~# btrfs property set foo ro true
~# btrfs send -c .1 -c foo .2 | wc -c
At subvol .2
41970349 # 40 MiB, only 'a'
~#
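
The byte counts in the transcript can be cross-checked with a little shell
arithmetic. This is only a sanity check on the numbers shown there; the
variable names are made up, and "overhead" here simply means whatever the
stream contains beyond the raw file data.

```shell
# File sizes implied by the dd invocations (bs=4k, count=10k / 20k).
a_bytes=$((10 * 1024 * 4096))   # 41943040
b_bytes=$((20 * 1024 * 4096))   # 83886080

# Stream sizes reported by `wc -c` in the transcript.
with_p=125909258                # btrfs send -p .1 .2
with_c=41970349                 # btrfs send -c .1 -c foo .2

echo "overhead with -p: $((with_p - a_bytes - b_bytes)) bytes"   # 80138
echo "overhead with -c: $((with_c - a_bytes)) bytes"             # 27309
echo "saved by cloning b: $((with_p - with_c)) bytes"            # 83938909
```

So the -p stream carries both files in full, while the -c stream carries
'a' in full and 'b' by reference to foo.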

UUIDs on the sending side:

~# btrfs subvolume list -uq / | grep test2
ID 6141 gen 454658 top level 6049 parent_uuid - uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 path test2
ID 6142 gen 454658 top level 6141 parent_uuid - uuid 54a7cdea-6198-424a-9349-8116172d0c17 path test2/foo
ID 6143 gen 454655 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid 28f1d7db-7341-f545-a2ac-d8819d22a5b5 path test2/.1
ID 6144 gen 454658 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid db9ad2b1-aee1-544c-b368-d698b4a05119 path test2/.2
~#

On the receiving side, .1 is used as parent:

~# btrfs subvolume list -uq /var/run/media/antoine/backups/ | grep dest
ID 298 gen 443 top level 5 parent_uuid - uuid 7695cba7-dfbf-2f44-bd79-18c9820fdb2f path dest/.1
ID 299 gen 443 top level 5 parent_uuid - uuid c32c06ec-0a17-cf42-9b04-3804ad72f836 path dest/foo
ID 300 gen 446 top level 5 parent_uuid 7695cba7-dfbf-2f44-bd79-18c9820fdb2f uuid 552b1d51-38bf-d546-a47f-4bc667ec4128 path dest/.2
~#

Regards,

--
Antoine


* Re: difference between -c and -p for send-receive?
  2017-09-20 19:05   ` Antoine Belvire
@ 2017-09-20 19:21     ` Andrei Borzenkov
  0 siblings, 0 replies; 7+ messages in thread
From: Andrei Borzenkov @ 2017-09-20 19:21 UTC
  To: Antoine Belvire, linux-btrfs

20.09.2017 22:05, Antoine Belvire wrote:
> Hello,
> 
>> All snapshots listed in -c options and snapshot that we want to
>> transfer must have the same parent uuid, unless -p is explicitly
>> provided.
> 
> It's rather the same mount point than the same parent uuid, like cp
> --reflink, isn't it?

Sorry, I do not understand this sentence. Could you rephrase?

> 
> ~# btrfs subvolume create /test2/
> Create subvolume '//test2'
> ~# btrfs subvolume create /test2/foo
> Create subvolume '/test2/foo'
> ~# cd /test2   
> ~# btrfs subvolume snapshot -r . .1
> Create a readonly snapshot of '.' in './.1'
> ~#
> ~# # a: 40 MiB in /test2/
> ~# dd if=/dev/urandom of=a bs=4k count=10k
> 10240+0 records in
> 10240+0 records out
> 41943040 bytes (42 MB, 40 MiB) copied, 0.198961 s, 211 MB/s
> ~#
> ~# # b: 80 MiB in /test2/foo
> ~# dd if=/dev/urandom of=foo/b bs=4k count=20k
> 20480+0 records in
> 20480+0 records out
> 83886080 bytes (84 MB, 80 MiB) copied, 0.393823 s, 213 MB/s
> ~#
> ~# # copy-clone /test2/foo/b to /test2/b
> ~# cp --reflink foo/b .
> ~#
> ~# btrfs subvolume snapshot -r . .2
> Create a readonly snapshot of '.' in './.2'
> ~#
> ~# # Sending .2 with only .1 as parent (.1 already sent)
> ~# btrfs send -p .1 .2 | wc -c
> At subvol .2
> 125909258 # 120 MiB = 'a' + 'b'
> ~#
> ~# # Sending .2 with .1 and foo as clone sources (.1 and foo already
> ~# # sent), .1 is automatically picked as parent
> ~# btrfs property set foo ro true
> ~# btrfs send -c .1 -c foo .2 | wc -c
> At subvol .2
> 41970349 # 40 MiB, only 'a'
> ~#
> 
> UUIDs on the sending side:
> 
> ~# btrfs subvolume list -uq / | grep test2
> ID 6141 gen 454658 top level 6049 parent_uuid - uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 path test2
> ID 6142 gen 454658 top level 6141 parent_uuid - uuid 54a7cdea-6198-424a-9349-8116172d0c17 path test2/foo

Yes, sorry, I misread the code. We need at least one snapshot from the
-c options that has the same parent uuid (i.e. is a snapshot of the same
subvolume) as the snapshot we want to transfer, not all of them. In your
case

btrfs send -c .1 -c foo .2

it will select .1 as the base snapshot and additionally try to clone from
foo if possible. In the wiki example the only two snapshots are from
completely different subvolumes, which will fail. In the discussion that
led to this mail, the snapshots probably did not have any parent uuid at
all.
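
The rule as restated above can be sketched in shell. This is a simplified
illustration, not the btrfs-progs code: it takes the first -c source that
shares the target's parent uuid and does not attempt any "best match"
ranking. The function name is made up, the input is `btrfs subvolume list
-uq` output with one subvolume per line, and paths containing spaces are
not handled.

```shell
# Simplified sketch, not the btrfs-progs implementation.
# stdin: `btrfs subvolume list -uq` output, one subvolume per line
# $1:    path of the snapshot to send
# rest:  paths passed as -c clone sources
pick_parent() {
    target=$1
    shift
    awk -v target="$target" -v sources="$*" '
        {
            pu = ""; path = ""
            for (i = 1; i < NF; i++) {
                if ($i == "parent_uuid") pu = $(i + 1)
                if ($i == "path") path = $(i + 1)
            }
            parent_of[path] = pu    # remember each subvolume parent uuid
        }
        END {
            n = split(sources, src, " ")
            want = parent_of[target]
            for (i = 1; i <= n; i++) {
                # A source qualifies if it is a snapshot of the same subvolume.
                if (want != "" && want != "-" && parent_of[src[i]] == want) {
                    print src[i]
                    exit 0
                }
            }
            print "ERROR: parent determination failed" > "/dev/stderr"
            exit 1
        }'
}

# The sending-side listing from this thread, with wrapped lines rejoined.
pick_parent test2/.2 test2/foo test2/.1 <<'EOF'
ID 6141 gen 454658 top level 6049 parent_uuid - uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 path test2
ID 6142 gen 454658 top level 6141 parent_uuid - uuid 54a7cdea-6198-424a-9349-8116172d0c17 path test2/foo
ID 6143 gen 454655 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid 28f1d7db-7341-f545-a2ac-d8819d22a5b5 path test2/.1
ID 6144 gen 454658 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid db9ad2b1-aee1-544c-b368-d698b4a05119 path test2/.2
EOF
# prints: test2/.1
```

With only foo as a clone source the sketch fails, because foo has no parent
uuid in common with .2.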

> ID 6143 gen 454655 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid 28f1d7db-7341-f545-a2ac-d8819d22a5b5 path test2/.1
> ID 6144 gen 454658 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid db9ad2b1-aee1-544c-b368-d698b4a05119 path test2/.2
> ~#
> 
> On the receiving side, .1 is used as parent:
> 
> ~# btrfs subvolume list -uq /var/run/media/antoine/backups/ | grep dest
> ID 298 gen 443 top level 5 parent_uuid - uuid 7695cba7-dfbf-2f44-bd79-18c9820fdb2f path dest/.1
> ID 299 gen 443 top level 5 parent_uuid - uuid c32c06ec-0a17-cf42-9b04-3804ad72f836 path dest/foo
> ID 300 gen 446 top level 5 parent_uuid 7695cba7-dfbf-2f44-bd79-18c9820fdb2f uuid 552b1d51-38bf-d546-a47f-4bc667ec4128 path dest/.2
> ~#
> 
> Regards,
> 
> -- 
> Antoine

