From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Incremental send robustness question
Date: Fri, 14 Oct 2016 04:43:03 +0000 (UTC)
Message-ID: <pan$f1d6c$3285aa5a$373244fa$922392c4@cox.net>
In-Reply-To: <20161012222955.GB2412@fox.home>

Sean Greenslade posted on Wed, 12 Oct 2016 18:29:55 -0400 as excerpted:

> Hi, all. I have a question about a backup plan I have involving
> send/receive. As far as I can tell, there's no way to resume a send
> that has been interrupted. In this case, my interruption comes from an
> overbearing firewall that doesn't like long-lived connections. I'm
> trying to do the initial (non-incremental) sync of the first snapshot
> from my main server to my backup endpoint. The snapshot is ~900 GiB, and
> the internet link is 25 Mbps, so this'll be going for quite a long time.
> 
> What I would like to do is "fake" the first snapshot transfer by
> rsync-ing the files over. So my question is this: if I rsync a subvolume
> (with the -a option to make all file times, permissions, ownerships,
> etc. the same),
> is that good enough to then be used as a parent for future incremental
> sends?

I see the specific questions have been answered, and alternatives 
explored in one direction, but I've another alternative, in a different 
direction, to suggest.

First a disclaimer.  I'm a btrfs user/sysadmin and regular on the list, 
but I'm not a dev, and my own use-case doesn't involve send/receive, so 
what I know regarding send/receive is from the list and manpages, not 
personal experience.  With that in mind...

It's worth noting that send/receive are subvolume-specific -- a send 
won't descend into nested subvolumes.

Also note that in addition to -p/parent, there's -c/clone-src.  The 
latter is more flexible than the super-strict parent option, at the 
expense of a fatter send-stream, as additional metadata is sent 
specifying which clone source the instructions are relative to.
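
For reference, the invocation difference is roughly this (the paths 
and the backuphost name are made up for illustration, adjust to your 
setup):

  # -p: strict; receive side must already hold the parent snapshot
  btrfs send -p /snaps/parent /snaps/new \
    | ssh backuphost btrfs receive /backup

  # -c: receive side needs the listed clone sources; data is cloned
  # from them instead of resent
  btrfs send -c /snaps/older1 -c /snaps/older2 /snaps/new \
    | ssh backuphost btrfs receive /backup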

It should be possible to use the combination of these two facts to split 
and recombine your send stream in a firewall-timeout-friendly manner, as 
long as no individual files are so big that sending an individual file 
exceeds the timeout.

1) Start by taking a read-only snapshot of your intended source 
subvolume, so you have an unchanging reference.
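
Something like this, with /mnt/pool/work standing in for your real 
source subvolume (untested sketch, paths are placeholders):

  # unchanging read-only reference of the source
  btrfs subvolume snapshot -r /mnt/pool/work /mnt/pool/work.full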

2) Take multiple writable snapshots of it, and selectively delete subdirs 
(and files if necessary) from each writable snapshot, trimming each one 
to a size that should pass the firewall without interruption, so that the 
combination of all these smaller subvolumes contains the content of the 
single larger one.
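
Continuing the sketch with made-up subdir names, each writable 
snapshot keeping a complementary slice of the tree:

  # one writable snapshot per chunk
  btrfs subvolume snapshot /mnt/pool/work.full /mnt/pool/part1
  btrfs subvolume snapshot /mnt/pool/work.full /mnt/pool/part2
  # trim each chunk to a firewall-friendly size; together they still
  # cover the whole tree
  rm -rf /mnt/pool/part1/videos /mnt/pool/part1/archives
  rm -rf /mnt/pool/part2/documents /mnt/pool/part2/music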

3) Take read-only snapshots of each of these smaller snapshots, suitable 
for sending.
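
Continuing with the same made-up names:

  btrfs subvolume snapshot -r /mnt/pool/part1 /mnt/pool/part1.ro
  btrfs subvolume snapshot -r /mnt/pool/part2 /mnt/pool/part2.ro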

4) Do a non-incremental send of each of these smaller snapshots to the 
remote.
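
Again with placeholder host and target path:

  btrfs send /mnt/pool/part1.ro | ssh backuphost btrfs receive /backup
  btrfs send /mnt/pool/part2.ro | ssh backuphost btrfs receive /backup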

If it's practical to keep the subvolume divisions, you can simply split 
the working tree into subvolumes and send those individually instead of 
doing the snapshot splitting above.  In that case you can use -p/parent 
on each, as you were trying to do on the original, and you can stop 
here.
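
Each subvolume then just gets its own normal -p chain, roughly (names 
made up as before):

  # per subvolume: new read-only snapshot, sent incrementally against
  # the previous one
  btrfs subvolume snapshot -r /mnt/pool/projects /mnt/pool/projects.snap2
  btrfs send -p /mnt/pool/projects.snap1 /mnt/pool/projects.snap2 \
    | ssh backuphost btrfs receive /backup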

If you need/prefer the single subvolume, continue...

5) Do an incremental send of the original full snapshot, using multiple
-c <src> options to list each of the smaller snapshots.  Since all the 
data has already been transferred in the smaller snapshot sends, this 
send should be all metadata, no actual data.  It'll simply be combining 
the individual reference subvolumes into a single larger subvolume once 
again.
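
As a sketch (untested), that send would look something like:

  # all the data should already be on the remote via the part
  # snapshots, so this should be nearly pure metadata
  btrfs send -c /mnt/pool/part1.ro -c /mnt/pool/part2.ro \
    /mnt/pool/work.full | ssh backuphost btrfs receive /backup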

6) Once you have the single larger subvolume on the receive side, you can 
delete the smaller snapshots as you now have a copy of the larger 
subvolume on each side to do further incremental sends of the working 
copy against.
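
I.e. on the receive side, and similarly on the send side once they're 
no longer needed:

  btrfs subvolume delete /backup/part1.ro /backup/part2.ro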

7) I believe the first incremental send of the full working copy against 
the original larger snapshot will still have to use -c, while incremental 
sends based on that first one will be able to use the stricter but 
slimmer send-stream -p, with each one then using the previous one as the 
parent.  However, I'm not sure on that.  It may be that you have to 
continue using the fatter send-stream -c each time.
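
The ongoing cycle would then be something like this (if -p does indeed 
work at that point; otherwise substitute -c /mnt/pool/work.full):

  # new read-only snapshot of the working copy, sent incrementally
  # against the previous one
  btrfs subvolume snapshot -r /mnt/pool/work /mnt/pool/work.snap2
  btrfs send -p /mnt/pool/work.full /mnt/pool/work.snap2 \
    | ssh backuphost btrfs receive /backup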

Again, I don't have send/receive experience of my own, so hopefully 
someone who does can reply either confirming that this should work and 
whether or not -p can be used after the initial setup, or explaining why 
the idea won't work, but at this point based on my own understanding, it 
seems like it should be perfectly workable to me. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

