From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Cedric.dewijs@eclipso.eu
Cc: linux-btrfs@vger.kernel.org
Subject: Re: synchronize btrfs snapshots over a unreliable, slow connection
Date: Wed, 6 Jan 2021 20:59:55 -0500
Message-ID: <20210107015955.GV31381@hungrycats.org>
In-Reply-To: <dc1e528567c9a57d089d77824f071af8@mail.eclipso.de>

On Mon, Jan 04, 2021 at 09:51:46PM +0100, wrote:
> I have a master NAS that makes one read only snapshot of my data per day. I want to transfer these snapshots to a slave NAS over a slow, unreliable internet connection. (it's a cheap provider). This rules out a "btrfs send -> ssh -> btrfs receive" construction, as that can't be resumed.
>
> Therefore I want to use rsync to synchronize the snapshots on the master NAS to the slave NAS.
>
> My first thought is something like this:
> 1) create a read-only snapshot on the master NAS:
> btrfs subvolume snapshot -r /mnt/nas/storage /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%M)
> 2) send that data to the slave NAS like this:
> rsync --partial -var --compress --bwlimit=500KB -e "ssh -i ~/slave-nas.key" /mnt/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%M) cedric@123.123.123.123:/nas/storage
> 3) Restart rsync until all data is copied (by checking rsync's exit code: if it is 0, all data has been transferred)
> 4) Create the read-only snapshot on the slave NAS with the same name as in step 1.
>
> Does somebody already have a script that does this?

Yes, and it is pretty much what you wrote above. You probably also
want rsync options -aXXHS and --del, possibly also --numeric-ids and/or
--fake-super depending on how exact you want this copy to be (e.g. should
it preserve uid/gids, do both NAS hosts have the same user names but
different user IDs, do you want the receiver to run rsync as root or as an
unprivileged user, etc.).
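
Putting steps 2 and 3 together with those options might look like the sketch below. It is a minimal illustration, not a tested tool: `retry`, `MAX_TRIES` and `RETRY_DELAY` are hypothetical names, and the host, key and paths are the placeholders from the question. Giving up after a fixed number of attempts keeps a persistent error, such as a typo on the command line, from looping forever. Note the trailing slash on the source path in the example call, which makes rsync copy the snapshot's contents into /nas/storage/ instead of creating a new storage-<date> directory there on every run.

```shell
# Hypothetical helper: run a command until it exits with status 0,
# sleeping RETRY_DELAY seconds (default 60) between attempts and giving
# up after MAX_TRIES failed attempts (default 100).
retry() {
    tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "${MAX_TRIES:-100}" ]; then
            echo "giving up after $tries failed attempts: $*" >&2
            return 1
        fi
        sleep "${RETRY_DELAY:-60}"
    done
    return 0
}

# Example call (placeholders from the question, not run here).
# --partial keeps partially transferred files so the next attempt
# can use them as a basis instead of starting those files over.
# retry rsync -aXXHS --del --partial --compress --bwlimit=500KB \
#     -e "ssh -i ~/slave-nas.key" \
#     /mnt/nas/storage_snapshots/storage-2021_01_04-2100/ \
#     cedric@123.123.123.123:/nas/storage/
```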

> Is there a problem with this approach that I have not yet considered?

rsync will not propagate extent sharing to the receiver, and by default
if part of a file is modified, the entire file becomes unshared. If this
is a problem, you may want to run dedupe on the receiver.

If you omit the -S option and add --inplace to rsync, then there is better
extent sharing (now partially modified files don't unshare the entire file)
but you lose sparse file support (so files that have large holes will have
them filled in with zero-data blocks). This can result in a size increase
with some file formats, to astronomical sizes in the case of files like
/var/log/lastlog.
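
Before dropping -S in favour of --inplace, it may help to see which files on the sender actually contain large holes. The sketch below assumes GNU find, whose `-printf %S` directive prints a file's sparseness (roughly allocated size divided by apparent size, so values well below 1 indicate holes). `list_sparse_files`, the 1 MiB minimum size and the 0.5 threshold are hypothetical choices; treat the output as a list of candidates, like /var/log/lastlog, to look at more closely.

```shell
# Hypothetical helper: print regular files under $1 that are larger than
# 1 MiB and whose allocated size is less than half their apparent size.
# Assumes GNU find (-printf %S) and awk; paths containing tabs or
# newlines will confuse the output.
list_sparse_files() {
    find "$1" -type f -size +1M -printf '%S\t%s\t%p\n' |
        awk -F '\t' '$1 < 0.5 { print $3 }'
}

# Example (not run here):
# list_sparse_files /mnt/nas/storage
```
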
If the link can fail, then ssh commands to create snapshots on the receiver
can fail too. You can loop to retry those as well.
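
One way to do that is sketched below. It relies on ssh exiting with status 255 when ssh itself fails (connection refused, link dropped); any other non-zero status comes from the remote command and would most likely fail again, so only 255 is retried. The remote command first checks whether the snapshot already exists, in case an earlier attempt succeeded on the receiver but the link dropped before ssh could report it. `remote_snapshot`, `SLAVE`, `SLAVE_KEY` and `RETRY_DELAY` are hypothetical names, and a plain `test -d` does not verify that an existing path really is the expected snapshot.

```shell
# Hypothetical placeholders taken from the question.
SLAVE=${SLAVE:-cedric@123.123.123.123}
SLAVE_KEY=${SLAVE_KEY:-$HOME/slave-nas.key}

# Create a read-only snapshot $2 of subvolume $1 on the receiver,
# retrying while ssh itself fails. Assumes paths contain no single quotes.
remote_snapshot() {
    src=$1 dest=$2
    # Idempotent: an existing $dest counts as success.
    cmd="test -d '$dest' || btrfs subvolume snapshot -r '$src' '$dest'"
    while :; do
        status=0
        ssh -i "$SLAVE_KEY" "$SLAVE" "$cmd" || status=$?
        # 255 = ssh error; anything else is the remote command's status.
        [ "$status" -ne 255 ] && return "$status"
        echo "ssh failed, retrying in ${RETRY_DELAY:-60}s" >&2
        sleep "${RETRY_DELAY:-60}"
    done
}

# Example (not run here):
# remote_snapshot /nas/storage /nas/storage_snapshots/storage-2021_01_04-2100
```
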
If it takes more than one day to propagate a snapshot over the link,
you will have to decide whether to let rsync keep trying to catch up,
or abort and start over from the next day's snapshot. You might want
to exit the rsync retry loop if the date changes while it's running.
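
A sketch of that exit condition, assuming "the date" means the sender's local calendar date: `retry_same_day` (a hypothetical name) retries a command, for example the rsync from step 2, but returns a distinct status once the date differs from the day the loop started, so the caller can move on to the new day's snapshot.

```shell
# Hypothetical helper: retry "$@" until it succeeds (return 0), or until
# the local date has changed since the loop started (return 2).
# RETRY_DELAY (seconds, default 60) is an illustrative knob.
retry_same_day() {
    start_day=$(date +%Y-%m-%d)
    until "$@"; do
        if [ "$(date +%Y-%m-%d)" != "$start_day" ]; then
            echo "date changed since $start_day; abandoning this run" >&2
            return 2
        fi
        sleep "${RETRY_DELAY:-60}"
    done
    return 0
}

# Example (not run here):
# retry_same_day rsync -aXXHS --del --partial ... || echo "try tomorrow's snapshot"
```
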
A related question is what is expected when the sending host reboots.
Does it forget previous incomplete sends and just start a fresh rsync
with the current date's snapshot, or does it loop over all snapshots
in reverse order until it gets to one the receiver has, and then loops
forward from there to send each one from the backlog?
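
The second policy (walk back to the newest snapshot the receiver has, then send everything newer) can be sketched with plain text processing. `backlog` is a hypothetical helper; it assumes the caller has already collected one snapshot name per line from each side (for example with ls locally and over ssh on the receiver) and that names sort chronologically in the C locale, as zero-padded date-based names like those in step 1 do.

```shell
# Hypothetical helper: $1 = file of sender snapshot names, $2 = file of
# receiver snapshot names. Prints the sender's snapshots newer than the
# newest one the receiver already has, oldest first. If the receiver has
# none of them, prints all of them.
backlog() {
    local_list=$1 remote_list=$2
    # Newest first, stop at the first name the receiver also has.
    newest_common=$(LC_ALL=C sort -r "$local_list" | while IFS= read -r name; do
        if grep -qxF -- "$name" "$remote_list"; then
            echo "$name"
            break
        fi
    done)
    # Oldest first, print everything after the common snapshot.
    LC_ALL=C sort "$local_list" |
        awk -v base="$newest_common" 'base == "" || seen { print } $0 == base { seen = 1 }'
}
```

Each name it prints would then be sent in turn, and snapshotted on the receiver, before moving on to the next.
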
I solved the last two problems by not retaining the snapshots on the
sender side. Each rsync instance sends from its own freshly created
snapshot that is deleted as soon as rsync exits (or upon reboot after
a crash), and the receiver provides its own snapshot names. There is
no problem with backlog this way, but if you want to keep snapshots on
both sides of the SSH connection then this approach is not for you.
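
A minimal sketch of that approach follows. It is an illustration under assumed paths, not the actual script referred to above: `send_once`, `SRC`, `TMP_SNAP`, `SLAVE`, `KEY` and the receiver's /nas/storage_snapshots directory are hypothetical. The function body runs in a subshell so its EXIT trap deletes the temporary snapshot however that run ends, and a snapshot left over from a crash is removed before a new one is taken. The receiver's snapshot name is single-quoted so that `date` runs on the receiver.

```shell
# Hypothetical placeholders: adjust for your own hosts and paths.
SRC=${SRC:-/mnt/nas/storage}
TMP_SNAP=${TMP_SNAP:-/mnt/nas/rsync-tmp-snapshot}
SLAVE=${SLAVE:-cedric@123.123.123.123}
KEY=${KEY:-$HOME/slave-nas.key}

# The body is a subshell, so the EXIT trap below fires when send_once
# returns, whether rsync succeeded or not.
send_once() (
    # A snapshot left over from a crash or reboot is stale: delete it.
    if [ -d "$TMP_SNAP" ]; then
        btrfs subvolume delete "$TMP_SNAP" || exit 1
    fi
    btrfs subvolume snapshot -r "$SRC" "$TMP_SNAP" || exit 1
    trap 'btrfs subvolume delete "$TMP_SNAP" >/dev/null' EXIT

    rsync -aXXHS --del --partial -e "ssh -i $KEY" \
        "$TMP_SNAP/" "$SLAVE:/nas/storage/" || exit 1

    # Single quotes: $(date ...) is expanded by the receiver's shell,
    # so the receiver chooses its own snapshot name.
    ssh -i "$KEY" "$SLAVE" \
        'btrfs subvolume snapshot -r /nas/storage "/nas/storage_snapshots/storage-$(date +%Y_%m_%d-%H%M)"'
)
```

A retry loop around send_once would take a fresh snapshot for every attempt, so there is never a backlog of sender-side snapshots to track.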