From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from [195.159.176.226] ([195.159.176.226]:60340 "EHLO
        blaine.gmane.org" rhost-flags-FAIL-FAIL-OK-OK) by vger.kernel.org
        with ESMTP id S1750713AbdISEki (ORCPT );
        Tue, 19 Sep 2017 00:40:38 -0400
Received: from list by blaine.gmane.org with local (Exim 4.84_2)
        (envelope-from ) id 1duAKi-0002JI-B8
        for linux-btrfs@vger.kernel.org; Tue, 19 Sep 2017 06:40:28 +0200
To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: difference between -c and -p for send-receive?
Date: Tue, 19 Sep 2017 04:40:22 +0000 (UTC)
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

Dave posted on Mon, 18 Sep 2017 20:41:45 -0400 as excerpted:

>> Well, I do not immediately see why -c must imply incremental send. We
>> want to reduce amount of data that is transferred, so reuse data from
>> existing snapshots, but it is really orthogonal to whether we send full
>> subvolume or just changes since another snapshot.
>>
>>
> Starting months ago when I began using btrfs serious, I have been
> reading, rereading and trying to understand this:
>
> FAQ - btrfs Wiki
> https://btrfs.wiki.kernel.org/index.php/FAQ#What_is_the_difference_between_-c_and_-p_in_send.3F
>
> The comment above suddenly gives me another clue...
>
> However, I still don't understand terms like "clone range ioctl",
> although I can guess it is something like a hard link.
>
> Would it be correct to say the following?
>
> 1. "-c" causes (appropriate) files in the newly transferred snapshot to
> be "hard linked" to existing files in another snapshot on the
> destination.

Technically, it's not a hard link but a reflink.  However, it's a
reasonably accurate analogy for understanding the process, it's just at
a different layer.

> Doesn't "-p" do something equivalent though?

Yes.  See below for the difference.

> 2. The -c and -p options can be used together or individually.

Yes.

> Questions:
>
> If "-c" "will send all of the metadata of @B.1, but will leave out the
> data for @B.1/bigfile, because it's already in the backups filesystem,
> and can be reflinked from there" what will -p do in contrast?
>
> Will "-p" not send all the metadata?
>
> Will "-p" also leave out the data for @B.1/bigfile, when it's also
> already in the backups?

-c is less strict than -p, and sends more metadata over the wire as a
result, but where the data is the same (reflink points to the same
extent), it won't be sent in either case.  See below.

> What would make me choose one of these options over the other? I still
> struggle to see the difference.

What -p does is tell send that the named snapshot is a snapshot of an
earlier state of the snapshot being sent, and that said earlier-state
snapshot exists on both the send and receive end, so only the changes
(both data and metadata) from the earlier snapshot must be sent.

Put a different way, the snapshot being sent is the parent, plus any
changes since then, so to recreate the new snapshot, only the operations
needed to update the state from the previous to the new state must be
sent, and done by receive on the other end.

-c is less strict than -p.  It doesn't consider the named snapshot to be
an earlier state of the snapshot being sent, but simply says that the
two may have some data in common, as defined by reflinks to the same
shared extents.

So -c will send more over the wire.  In particular, it'll send much more
metadata, I believe (being no dev or expert, just a list regular)
essentially all of it, because nothing is assumed about how the metadata
of the snapshot being sent relates to that of the clone.  But it can and
does still assume that any extents reflinked in common can simply be
sent by reference, instead of sending the literal data in that extent,
because -c says the other end already has the snapshot named as a clone
and that receive can simply reflink it there, as well.

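To make the distinction above concrete, here is a toy sketch in Python.
It is only a model of the description in this message, not of the real
btrfs send code: a snapshot is reduced to a dict of path -> extent id,
and the function names, the "stream" tuples, and the sample snapshots
are all invented for illustration.  The -c half follows the belief
stated above that essentially all metadata is sent, which is an
assumption rather than a checked fact.

```python
# Toy model only. A snapshot is a dict mapping path -> extent id.
# Nothing here reflects the actual btrfs send stream format.


def send_with_parent(parent, new):
    """Model of -p (roughly: btrfs send -p <parent> <snapshot>).

    The parent is treated as an earlier state of `new`, so only the
    differences are emitted.
    """
    stream = []
    # Paths that existed in the parent but are gone in the new snapshot.
    for path in sorted(parent.keys() - new.keys()):
        stream.append(("unlink", path))
    for path in sorted(new):
        extent = new[path]
        if parent.get(path) == extent:
            continue  # unchanged since the parent: nothing is sent
        if extent in parent.values():
            # Data already exists on the receiver: send a reference.
            stream.append(("clone", path, extent))
        else:
            stream.append(("write", path, extent))
    return stream


def send_with_clone_source(clone_src, new):
    """Model of -c (roughly: btrfs send -c <clone-src> <snapshot>).

    No parent relationship is assumed, so every path is described
    again, but extents shared with the clone source go by reference.
    """
    shared = set(clone_src.values())
    stream = []
    for path in sorted(new):
        extent = new[path]
        op = "clone" if extent in shared else "write"
        stream.append((op, path, extent))
    return stream


# Hypothetical snapshots: bigfile is unchanged, notes.txt was rewritten,
# todo.txt is new.
older = {"bigfile": "E1", "notes.txt": "E2"}
newer = {"bigfile": "E1", "notes.txt": "E3", "todo.txt": "E4"}

p_stream = send_with_parent(older, newer)
c_stream = send_with_clone_source(older, newer)

print("-p:", p_stream)
print("-c:", c_stream)
```

In this model both runs transfer the same literal data (the two "write"
entries); the -c run additionally describes bigfile again, but only as a
reference to the extent the receiver already has.
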
The wording of the manpage description for -c suggests that it picks one
(and only one, if there's more than one) -c clone and considers it a
parent, which would allow it to shortcut sending the metadata in common
for it as well.  But not being a dev, I haven't looked at the code to be
sure, and in any case, there can be only one parent, so it can do it for
only one clone, even if there's more than one -c snapshot supplied.

So -p is primarily for the case where the named snapshot is an earlier
state of the one being sent, and should be much more efficient than -c
in that case.  However, the less strict -c should also work, and if the
wording of the manpage can be believed, a single named -c snapshot will
be treated as -p anyway.

But -c can also be used for snapshots that aren't related by one being
an earlier state of the other, where there are simply some reflinks in
common, perhaps due to dedup.  It should still result in the data with
the common reflinks being sent only by reference, but much more metadata
will be sent, and if there aren't a lot of reflinks in common, it's
likely to require enough additional processing that the relatively
trivial amount of common reflinked data it might save may not be worth
it, compared to simply sending a full non-incremental snapshot.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman