* difference between -c and -p for send-receive?
From: Dave @ 2017-09-19 0:41 UTC
To: linux-btrfs
new subject for new question
On Mon, Sep 18, 2017 at 1:37 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
> >> What scenarios can lead to "ERROR: parent determination failed"?
> >
> > The man page for btrfs-send is reasonably clear on the requirements
> > btrfs imposes. If you want to use incremental sends (i.e. the -c or -p
> > options) then the specified snapshots must exist on both the source and
> > destination. If you don't have a suitable existing snapshot then don't
> > use -c or -p and just do a full send.
> >
>
> Well, I do not immediately see why -c must imply incremental send. We
> want to reduce amount of data that is transferred, so reuse data from
> existing snapshots, but it is really orthogonal to whether we send full
> subvolume or just changes since another snapshot.
>
Starting months ago when I began using btrfs seriously, I have been
reading, rereading and trying to understand this:
FAQ - btrfs Wiki
https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F
The comment above suddenly gives me another clue...
However, I still don't understand terms like "clone range ioctl",
although I can guess it is something like a hard link.
Would it be correct to say the following?
1. "-c" causes (appropriate) files in the newly transferred snapshot
to be "hard linked" to existing files in another snapshot on the
destination. Doesn't "-p" do something equivalent though?
2. The -c and -p options can be used together or individually.
Questions:
If "-c" "will send all of the metadata of @B.1, but will leave out the
data for @B.1/bigfile, because it's already in the backups filesystem,
and can be reflinked from there" what will -p do in contrast?
Will "-p" not send all the metadata?
Will "-p" also leave out the data for @B.1/bigfile, when it's also
already in the backups?
What would make me choose one of these options over the other? I still
struggle to see the difference.

* Re: difference between -c and -p for send-receive?
From: Duncan @ 2017-09-19 4:40 UTC
To: linux-btrfs

Dave posted on Mon, 18 Sep 2017 20:41:45 -0400 as excerpted:

>> Well, I do not immediately see why -c must imply incremental send. We
>> want to reduce amount of data that is transferred, so reuse data from
>> existing snapshots, but it is really orthogonal to whether we send full
>> subvolume or just changes since another snapshot.
>
> Starting months ago when I began using btrfs serious, I have been
> reading, rereading and trying to understand this:
>
> FAQ - btrfs Wiki
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F
>
> The comment above suddenly gives me another clue...
>
> However, I still don't understand terms like "clone range ioctl",
> although I can guess it is something like a hard link.
>
> Would it be correct to say the following?
>
> 1. "-c" causes (appropriate) files in the newly transferred snapshot to
> be "hard linked" to existing files in another snapshot on the
> destination.

Technically, it's not a hard link but a reflink. However, it's a
reasonably accurate analogy for understanding the process, it's just at a
different layer.

> Doesn't "-p" do something equivalent though?

Yes. See below for the difference.

> 2. The -c and -p options can be used together or individually.

Yes.

> Questions:
>
> If "-c" "will send all of the metadata of @B.1, but will leave out the
> data for @B.1/bigfile, because it's already in the backups filesystem,
> and can be reflinked from there" what will -p do in contrast?
>
> Will "-p" not send all the metadata?
>
> Will "-p" also leave out the data for @B.1/bigfile, when it's also
> already in the backups?

-c is less strict than -p, and sends more metadata over the wire as a
result, but where the data is the same (reflink points to the same
extent), it won't be sent in either case. See below.

> What would make me choose one of these options over the other? I still
> struggle to see the difference.

What -p does is tell send that the named snapshot is a snapshot of an
earlier state of the snapshot being sent, and that said earlier-state
snapshot exists on both the send and receive end, so only the changes
(both data and metadata) from the earlier snapshot must be sent.

Put a different way, the snapshot being sent is the parent, plus any
changes since then, so to recreate the new snapshot, only the operations
needed to update the state from the previous to the new state must be
sent, and done by receive on the other end.

-c is less strict than -p. It doesn't consider the named snapshot to be
an earlier state of the snapshot being sent, but simply says that the two
may have some data in common, as defined by reflinks to the same shared
extents.

So -c will send more over the wire, in particular, it'll send much more
metadata, I believe (being no dev or expert, just a list regular)
essentially all metadata, because no claim as to the relationship of the
metadata between the snapshot being sent and the clone is assumed. But it
can and does still assume that any extents reflinked in common can be
simply sent by reference, instead of sending the literal data in that
extent, because -c says the other end already has the snapshot named as a
clone and that it can simply reflink it there, as well.

The wording of the manpage description for -c suggests that it picks one
(and only one if there's more than one) -c clone and considers it a
parent, which would allow it to shortcut sending the metadata in common
for it as well, but not being a dev, I haven't looked at the code to be
sure, and in any case, there can be only one parent, so it can do it for
only one clone, even if there's more than one -c snapshot supplied.

So -p is primarily for the case where the named snapshot is an earlier
state of the one being sent, and should be much more efficient than -c in
that case. However, the less strict -c should also work, and if the
wording of the manpage can be believed, a single named -c snapshot will
be treated as -p anyway.

But -c can also be used for snapshots that aren't related with one being
an earlier state of the other, where there's simply some reflinks in
common, perhaps due to dedup. It should still result in the data with the
common reflinks being only sent by reference, but much more metadata will
be sent, and if there's not a lot of reflinks in common, it's likely to
require enough additional processing that the relatively trivial amount
of common reflinked data it might save may not be worth it, compared to
simply sending a full non-incremental snapshot.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
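
To make the three cases in the message above concrete, here is a minimal dry-run sketch that only prints the corresponding command lines rather than running them. The helper name show_send and every path (/mnt/src/snap.1, /mnt/src/snap.2, /mnt/backup) are assumptions made for illustration; none of them come from the thread. Only the -p and -c options of btrfs send and the btrfs receive target directory are taken from the discussion.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints a send/receive pipeline
# instead of executing it, so it is safe to run on any machine.
# All paths are placeholders, not paths from the thread.
show_send() {
    # $1 = mode (full | parent | clone)
    # $2 = snapshot to send
    # $3 = reference snapshot that already exists on both sides
    case "$1" in
        full)   echo "btrfs send $2 | btrfs receive /mnt/backup" ;;
        parent) echo "btrfs send -p $3 $2 | btrfs receive /mnt/backup" ;;
        clone)  echo "btrfs send -c $3 $2 | btrfs receive /mnt/backup" ;;
        *)      echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Full send: no reference snapshot, everything goes over the wire.
show_send full /mnt/src/snap.2
# Incremental send: snap.1 is an earlier state of snap.2.
show_send parent /mnt/src/snap.2 /mnt/src/snap.1
# Clone source: snap.1 may share extents with snap.2.
show_send clone /mnt/src/snap.2 /mnt/src/snap.1
```

In each of the last two forms the reference snapshot is assumed to be read-only and already received on the destination, as the replies in this thread require.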

* Re: difference between -c and -p for send-receive?
From: Graham Cobb @ 2017-09-19 10:24 UTC
To: linux-btrfs

On 19/09/17 01:41, Dave wrote:
> Would it be correct to say the following?

Like Duncan, I am just a user, and I haven't checked the code. I
recommend Duncan's explanation, but in case you are looking for
something simpler, how about thinking with the following analogy...

Think of -p as like doing an incremental backup: it tells send to just
send the instructions for the changes to get from the "parent" subvolume
to the current subvolume. Without -p it is like a full backup:
everything in the current subvolume is sent.

-c is different: it says "and by the way, these files also already exist
on the destination so they might be useful to skip actually sending some
of the file contents". Imagine that whenever a file content is about to
be sent (whether incremental or full), btrfs-send checks to see if the
data is in one of the -c subvolumes and, if it is, it sends "get the
data by reflinking to this file over here" instead of sending the data
itself. -c is really just an optimisation to save sending data if you
know the data is already available somewhere else on the destination.

Be aware that this is really just an analogy (like "hard linking" is an
analogy for reflinking using the clone range ioctl). Duncan's email
provides more real details.

In particular, this analogy doesn't explain the original questioner's
problem. In the analogy, -c might work without the files actually being
present on the source (as long as they are on the destination). But, in
reality, because the underlying mechanism is extent range cloning, the
files have to be present on **both** the source and the destination in
order for btrfs-send to work out what commands to send.

By the way, like Duncan, I was surprised that the man page suggests that
-c without -p causes one of the clones to be treated as a parent. I have
not checked the code to see if that is actually how it works.

Graham

* Re: difference between -c and -p for send-receive?
From: Andrei Borzenkov @ 2017-09-19 11:30 UTC
To: Graham Cobb; +Cc: Btrfs BTRFS

On Tue, Sep 19, 2017 at 1:24 PM, Graham Cobb <g.btrfs@cobb.uk.net> wrote:
> On 19/09/17 01:41, Dave wrote:
>> Would it be correct to say the following?
>
> Like Duncan, I am just a user, and I haven't checked the code. I
> recommend Duncan's explanation, but in case you are looking for
> something simpler, how about thinking with the following analogy...
>
> Think of -p as like doing an incremental backup: it tells send to just
> send the instructions for the changes to get from the "parent" subvolume
> to the current subvolume. Without -p it is like a full backup:
> everything in the current subvolume is sent.
>
> -c is different:

It is not really different - it is extra. You have -p and optionally -c
which modifies its behavior.

> it says "and by the way, these files also already exist
> on the destination so they might be useful to skip actually sending some
> of the file contents". Imagine that whenever a file content is about to
> be sent (whether incremental or full), btrfs-send checks to see if the
> data is in one of the -c subvolumes and, if it is, it sends "get the
> data by reflinking to this file over here" instead of sending the data
> itself. -c is really just an optimisation to save sending data if you
> know the data is already available somewhere else on the destination.
>
> Be aware that this is really just an analogy (like "hard linking" is an
> analogy for reflinking using the clone range ioctl). Duncan's email
> provides more real details.
>
> In particular, this analogy doesn't explain the original questioner's
> problem. In the analogy, -c might work without the files actually being
> present on the source (as long as they are on the destination). But, in
> reality, because the underlying mechanism is extent range cloning, the
> files have to be present on **both** the source and the destination in
> order for btrfs-send to work out what commands to send.
>

Yes. Decision whether to send full data or reflink is taken on source,
so data must be present on source.

> By the way, like Duncan, I was surprised that the man page suggests that
> -c without -p causes one of the clones to be treated as a parent. I have
> not checked the code to see if that is actually how it works.
>

It is. As implemented, -c *requires* parent snapshot, either explicitly
via -p option or implicitly. What it does:

a) checks that both snapshot to transfer and all snapshots given as
   arguments to -c have the same parent uuid;
b) selects "best match" by comparing how close snapshots from -c option
   are to parent. As far as I can tell it chooses the oldest snapshot
   (with minimal difference to the parent) as base (implicit -p).

Which implies that "btrfs send -c foo bar" is entirely equivalent to
"btrfs send -p foo bar".

Which still does not explain why script fails. As mentioned, as
snapshots created by snapper should have the same parent uuid, which
leaves only possibility of non-existent subvolume, but then script
should have failed much earlier.

* Re: difference between -c and -p for send-receive?
From: Andrei Borzenkov @ 2017-09-20 4:06 UTC
To: Dave, linux-btrfs

On 19.09.2017 03:41, Dave wrote:
> new subject for new question
>
> On Mon, Sep 18, 2017 at 1:37 PM, Andrei Borzenkov <arvidjaar@gmail.com> wrote:
>
>>>> What scenarios can lead to "ERROR: parent determination failed"?
>>>
>>> The man page for btrfs-send is reasonably clear on the requirements
>>> btrfs imposes. If you want to use incremental sends (i.e. the -c or -p
>>> options) then the specified snapshots must exist on both the source and
>>> destination. If you don't have a suitable existing snapshot then don't
>>> use -c or -p and just do a full send.
>>>
>>
>> Well, I do not immediately see why -c must imply incremental send. We
>> want to reduce amount of data that is transferred, so reuse data from
>> existing snapshots, but it is really orthogonal to whether we send full
>> subvolume or just changes since another snapshot.
>>
>
> Starting months ago when I began using btrfs serious, I have been
> reading, rereading and trying to understand this:
>
> FAQ - btrfs Wiki
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F
>

This wiki entry is wrong (and as far as I can believe git, it has always
been wrong). First, "btrfs send -c" does not start with a blank
subvolume; it starts with the "best parent", which is determined
automatically. Actually, if you look at the help output in the very
first version of the send command:

    "By default, this will send the whole subvolume. To do",
    "an incremental send, one or multiple '-i <clone_source>'",
    "arguments have to be specified. A 'clone source' is",
    "a subvolume that is known to exist on the receiving",
    "side in exactly the same state as on the sending side.\n",
    "Normally, a good snapshot parent is searched automatically",
    "in the list of 'clone sources'. To override this, use",
    "'-p <parent>' to manually specify a snapshot parent.",

it explains far better what -c and -p do (ignore -i; this was an error
that was fixed later, it means -c).

Second, the example in the wiki simply does not work. All snapshots
listed in -c options and the snapshot that we want to transfer must have
the same parent uuid, unless -p is explicitly provided. The example
shows snapshots of two different subvolumes. I could not make it work
even if A and B themselves are cloned from a common subvolume.

* Re: difference between -c and -p for send-receive?
From: Antoine Belvire @ 2017-09-20 19:05 UTC
To: linux-btrfs, arvidjaar

Hello,

> All snapshots listed in -c options and snapshot that we want to
> transfer must have the same parent uuid, unless -p is explicitly
> provided.

It's rather the same mount point than the same parent uuid, like cp
--reflink, isn't it?

~# btrfs subvolume create /test2/
Create subvolume '//test2'
~# btrfs subvolume create /test2/foo
Create subvolume '/test2/foo'
~# cd /test2
~# btrfs subvolume snapshot -r . .1
Create a readonly snapshot of '.' in './.1'
~#
~# # a: 40 MiB in /test2/
~# dd if=/dev/urandom of=a bs=4k count=10k
10240+0 records in
10240+0 records out
41943040 bytes (42 MB, 40 MiB) copied, 0.198961 s, 211 MB/s
~#
~# # b: 80 MiB in /test2/foo
~# dd if=/dev/urandom of=foo/b bs=4k count=20k
20480+0 records in
20480+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 0.393823 s, 213 MB/s
~#
~# # copy-clone /test2/foo/b to /test2/b
~# cp --reflink foo/b .
~#
~# btrfs subvolume snapshot -r . .2
Create a readonly snapshot of '.' in './.2'
~#
~# # Sending .2 with only .1 as parent (.1 already sent)
~# btrfs send -p .1 .2 | wc -c
At subvol .2
125909258 # 120 MiB = 'a' + 'b'
~#
~# # Sending .2 with .1 and foo as clone sources (.1 and foo already
~# # sent), .1 is automatically picked as parent
~# btrfs property set foo ro true
~# btrfs send -c .1 -c foo .2 | wc -c
At subvol .2
41970349 # 40 MiB, only 'a'
~#

UUIDs on the sending side:

~# btrfs subvolume list -uq / | grep test2
ID 6141 gen 454658 top level 6049 parent_uuid - uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 path test2
ID 6142 gen 454658 top level 6141 parent_uuid - uuid 54a7cdea-6198-424a-9349-8116172d0c17 path test2/foo
ID 6143 gen 454655 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid 28f1d7db-7341-f545-a2ac-d8819d22a5b5 path test2/.1
ID 6144 gen 454658 top level 6141 parent_uuid bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid db9ad2b1-aee1-544c-b368-d698b4a05119 path test2/.2
~#

On the receiving side, .1 is used as parent:

~# btrfs subvolume list -uq /var/run/media/antoine/backups/ | grep dest
ID 298 gen 443 top level 5 parent_uuid - uuid 7695cba7-dfbf-2f44-bd79-18c9820fdb2f path dest/.1
ID 299 gen 443 top level 5 parent_uuid - uuid c32c06ec-0a17-cf42-9b04-3804ad72f836 path dest/foo
ID 300 gen 446 top level 5 parent_uuid 7695cba7-dfbf-2f44-bd79-18c9820fdb2f uuid 552b1d51-38bf-d546-a47f-4bc667ec4128 path dest/.2
~#

Regards,

--
Antoine
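
The two stream sizes reported by wc -c in the transcript above can be checked with a little shell arithmetic. This is only a worked calculation on the numbers already shown; the variable names are made up for the example, and the interpretation of the difference as the file b is the one given in the transcript's own comments.

```shell
#!/bin/sh
# Byte counts as reported by `wc -c` in the transcript.
parent_only=125909258    # btrfs send -p .1 .2
with_clone_foo=41970349  # btrfs send -c .1 -c foo .2
mib=1048576              # bytes per MiB

# Integer division is enough to see which files were sent as literal data.
echo "-p .1 only:   $((parent_only / mib)) MiB"
echo "-c .1 -c foo: $((with_clone_foo / mib)) MiB"
echo "saved:        $(((parent_only - with_clone_foo) / mib)) MiB"
```

The script prints 120, 40 and 80 MiB. The 80 MiB saved matches the size of b, which the second stream can reference from foo instead of carrying as data; the few kilobytes above the round figures are stream overhead and metadata.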

* Re: difference between -c and -p for send-receive?
From: Andrei Borzenkov @ 2017-09-20 19:21 UTC
To: Antoine Belvire, linux-btrfs

On 20.09.2017 22:05, Antoine Belvire wrote:
> Hello,
>
>> All snapshots listed in -c options and snapshot that we want to
>> transfer must have the same parent uuid, unless -p is explicitly
>> provided.
>
> It's rather the same mount point than the same parent uuid, like cp
> --reflink, isn't it?

Sorry, I do not understand this sentence. Could you rephrase?

>
> ~# btrfs subvolume create /test2/
> Create subvolume '//test2'
> ~# btrfs subvolume create /test2/foo
> Create subvolume '/test2/foo'
> ~# cd /test2
> ~# btrfs subvolume snapshot -r . .1
> Create a readonly snapshot of '.' in './.1'
> ~#
> ~# # a: 40 MiB in /test2/
> ~# dd if=/dev/urandom of=a bs=4k count=10k
> 10240+0 records in
> 10240+0 records out
> 41943040 bytes (42 MB, 40 MiB) copied, 0.198961 s, 211 MB/s
> ~#
> ~# # b: 80 MiB in /test2/foo
> ~# dd if=/dev/urandom of=foo/b bs=4k count=20k
> 20480+0 records in
> 20480+0 records out
> 83886080 bytes (84 MB, 80 MiB) copied, 0.393823 s, 213 MB/s
> ~#
> ~# # copy-clone /test2/foo/b to /test2/b
> ~# cp --reflink foo/b .
> ~#
> ~# btrfs subvolume snapshot -r . .2
> Create a readonly snapshot of '.' in './.2'
> ~#
> ~# # Sending .2 with only .1 as parent (.1 already sent)
> ~# btrfs send -p .1 .2 | wc -c
> At subvol .2
> 125909258 # 120 MiB = 'a' + 'b'
> ~#
> ~# # Sending .2 with .1 and foo as clone sources (.1 and foo already
> ~# # sent), .1 is automatically picked as parent
> ~# btrfs property set foo ro true
> ~# btrfs send -c .1 -c foo .2 | wc -c
> At subvol .2
> 41970349 # 40 MiB, only 'a'
> ~#
>
> UUIDs on the sending side:
>
> ~# btrfs subvolume list -uq / | grep test2
> ID 6141 gen 454658 top level 6049 parent_uuid - uuid
> bbf936dd-ca84-f749-9b9b-09f7081879a2 path test2
> ID 6142 gen 454658 top level 6141 parent_uuid - uuid
> 54a7cdea-6198-424a-9349-8116172d0c17 path test2/foo

Yes, sorry, I misread the code. We need at least one snapshot from the
-c options that has the same parent uuid (i.e. is a snapshot of the same
subvolume) as the snapshot we want to transfer, not all of them. In your
case

    btrfs send -c .1 -c foo .2

it will select .1 as the base snapshot and additionally try to clone
from foo if possible.

In the wiki example the only two snapshots are from completely different
subvolumes, which will fail. In the discussion that led to this mail the
snapshots probably did not have any parent uuid at all.

> ID 6143 gen 454655 top level 6141 parent_uuid
> bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid
> 28f1d7db-7341-f545-a2ac-d8819d22a5b5 path test2/.1
> ID 6144 gen 454658 top level 6141 parent_uuid
> bbf936dd-ca84-f749-9b9b-09f7081879a2 uuid
> db9ad2b1-aee1-544c-b368-d698b4a05119 path test2/.2
> ~#
>
> On the receiving side, .1 is used as parent:
>
> ~# btrfs subvolume list -uq /var/run/media/antoine/backups/ | grep dest
> ID 298 gen 443 top level 5 parent_uuid - uuid
> 7695cba7-dfbf-2f44-bd79-18c9820fdb2f path dest/.1
> ID 299 gen 443 top level 5 parent_uuid - uuid
> c32c06ec-0a17-cf42-9b04-3804ad72f836 path dest/foo
> ID 300 gen 446 top level 5 parent_uuid
> 7695cba7-dfbf-2f44-bd79-18c9820fdb2f uuid
> 552b1d51-38bf-d546-a47f-4bc667ec4128 path dest/.2
> ~#
>
> Regards,
>
> --
> Antoine
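
The corrected rule in the message above (at least one -c snapshot must share the parent uuid of the snapshot being sent) can be sketched as a small check. This models only the rule as stated in this thread; it is not taken from the btrfs-progs source, and has_implicit_parent is a hypothetical helper name. The uuid values are the ones from the listing in the thread.

```shell
#!/bin/sh
# Usage: has_implicit_parent TARGET_PARENT_UUID CLONE_PARENT_UUID...
# Succeeds if at least one clone source has the same parent uuid as the
# snapshot to send. Hypothetical helper, a model of the rule only.
has_implicit_parent() {
    target="$1"
    shift
    # "-" is how `btrfs subvolume list -q` shows a missing parent uuid;
    # a snapshot without one cannot be matched to any clone source.
    if [ "$target" = "-" ]; then
        return 1
    fi
    for candidate in "$@"; do
        if [ "$candidate" = "$target" ]; then
            return 0
        fi
    done
    return 1
}

# parent_uuid values from the listing: .1 and .2 share one, foo has none.
snap2_parent="bbf936dd-ca84-f749-9b9b-09f7081879a2"
snap1_parent="bbf936dd-ca84-f749-9b9b-09f7081879a2"
foo_parent="-"

if has_implicit_parent "$snap2_parent" "$snap1_parent" "$foo_parent"; then
    echo "-c .1 -c foo .2: implicit parent available"
fi
if ! has_implicit_parent "$snap2_parent" "$foo_parent"; then
    echo "-c foo .2: no implicit parent; pass -p explicitly or do a full send"
fi
```

In practice the parent uuid values would come from btrfs subvolume list -uq, as in the listing; they are hard-coded here so the sketch runs without a btrfs filesystem.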