From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Forza <forza@tnonline.net>
Cc: Cerem Cem ASLAN <ceremcem@ceremcem.net>,
Graham Cobb <g.btrfs@cobb.uk.net>,
Cedric.dewijs@eclipso.eu,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: synchronize btrfs snapshots over a unreliable, slow connection
Date: Wed, 6 Jan 2021 21:06:04 -0500
Message-ID: <20210107020604.GW31381@hungrycats.org>
In-Reply-To: <b9662cf1-e45f-5113-5b23-bf1aaa73cb97@tnonline.net>
On Wed, Jan 06, 2021 at 09:18:30AM +0100, Forza wrote:
>
>
> On 2021-01-05 13:24, Cerem Cem ASLAN wrote:
> > I also thought about a different approach in the past:
> >
> > 1. Take a snapshot and rsync it to the server.
> > 2. When it succeeds, make it readonly and take a note on the remote
> > site that indicates the Received_UUID and checksum of entire
> > subvolume.
> > 3. When you want to send your diff, run `btrfs send -p ./first
> > ./second | list-file-changes -o my-diff-for-second.txt` if that
> > Received_UUID on the remote site matches with ./first. (Otherwise, you
> > should run rsync without taking advantage of
> > `my-diff-for-second.txt`.)
>
> You can use `btrbk diff old-snap new-snap` to list changes between
> snapshots.
>
> Example:
> ------------------------------------------------------------------------------
> #btrbk diff /mnt/systemRoot/snapshots/root.20210101T0001/
> /mnt/systemRoot/snapshots/root.20210102T0001/
>
> Subvolume Diff (btrbk command line client, version 0.30.0)
>
> Date: Wed Jan 6 09:06:37 2021
>
> Showing changed files for subvolume:
> /mnt/systemRoot/snapshots/root.20210102T0001 (gen=6050233)
>
> Starting at generation after subvolume:
> /mnt/systemRoot/snapshots/root.20210101T0001 (gen=6046626)
>
> This will show all files modified within generation range:
> [6046627..6050233]
> Newest file generation (transid marker) was: 6050233
>
> Legend:
> +.. file accessed at offset 0 (at least once)
> .c. flags COMPRESS or COMPRESS|INLINE set (at least once)
> ..i flags INLINE or COMPRESS|INLINE set (at least once)
> <count> file was modified in <count> generations
> <size> file was modified for a total of <size> bytes
> ------------------------------------------------------------------------------
> +ci 1 1318 etc/csh.env
> +ci 1 2116 etc/dispatch-conf.conf
> +ci 1 1111 etc/environment.d/10-gentoo-env.conf
> +ci 1 2000 etc/etc-update.conf
> +c. 1 94208 etc/ld.so.cache
> ...
> ------------------------------------------------------------------------------
>
> You can also use `btrfs subvolume find-new` to list filesystem changes, but
> the output is much more verbose than that of btrbk, and you need to figure
> out the generation IDs first. I also think that some things like deleted
> files and renamed files do not get listed? [*]

find-new runs TREE_SEARCH to find everything in subvol metadata pages that
were unshared since the given transid. It then filters out references to
file data that are older than the given transid, and prints what is left.
The result is roughly all the new extents in the subvol since the given
transid. It does not report deletions (it has nothing to compare against,
so it cannot know that something is no longer there), file attribute
changes, or new clones or reflinks of old data (e.g. after
'cp --reflink=always old_file old_file_2', old_file_2 will not show up in
find-new).

> Example:
> ------------------------------------------------------------------------------
> # btrfs subvolume find-new /mnt/systemRoot/snapshots/root.20210102T0001/
> 6046626
>
> inode 3054490 file offset 0 len 8192 disk start 239676399616 offset 0 gen
> 6048209 flags COMPRESS etc/passwd-
> inode 9527306 file offset 0 len 4096 disk start 239792578560 offset 0 gen
> 6049979 flags COMPRESS var/lib/dhcp/dhclient.leases
> inode 9527306 file offset 4096 len 4096 disk start 239437688832 offset 0 gen
> 6050179 flags COMPRESS var/lib/dhcp/dhclient.leases
> inode 9527306 file offset 8192 len 4096 disk start 241226248192 offset 0 gen
> 6050220 flags NONE var/lib/dhcp/dhclient.leases
> inode 9527438 file offset 0 len 4096 disk start 244439986176 offset 0 gen
> 6049681 flags NONE var/lib/samba/wins.tdb
> inode 9527438 file offset 4096 len 4096 disk start 244569776128 offset 0 gen
> 6050217 flags NONE var/lib/samba/wins.tdb
> inode 9527438 file offset 8192 len 4096 disk start 243901612032 offset 0 gen
> 6049543 flags NONE var/lib/samba/wins.tdb
> inode 9527438 file offset 12288 len 8192 disk start 242191458304 offset 4096
> gen 6048901 flags PREALLOC var/lib/samba/wins.tdb
> inode 9527438 file offset 20480 len 4096 disk start 244319576064 offset 0
> gen 6049691 flags NONE var/lib/samba/wins.tdb
> ------------------------------------------------------------------------------
>
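
The find-new output quoted above could be reduced to a plain list of
changed paths with something like the following. This is only a sketch:
the line layout is inferred from the sample output in this message, not
from a format specification, and `changed_paths` is a made-up name. Paths
containing newlines would break it, and it inherits every limitation of
find-new described earlier (no deletions, no attribute changes, no
reflinks of old data).

```python
import re

# Assumed layout, inferred from the sample output quoted in this thread:
#   inode N file offset N len N disk start N offset N gen N flags F path
EXTENT_LINE = re.compile(
    r"^inode \d+ file offset \d+ len \d+ "
    r"disk start \d+ offset \d+ gen \d+ flags \S+ (?P<path>.+)$"
)


def changed_paths(find_new_output):
    """Return distinct paths that have at least one new extent, in first-seen order."""
    seen = {}  # dicts keep insertion order, so this doubles as an ordered set
    for line in find_new_output.splitlines():
        match = EXTENT_LINE.match(line.strip())
        if match:  # lines that do not look like extent records are skipped
            seen.setdefault(match.group("path"), None)
    return list(seen)


# Sample records copied from the quoted output, with wrapped lines rejoined.
sample = """\
inode 3054490 file offset 0 len 8192 disk start 239676399616 offset 0 gen 6048209 flags COMPRESS etc/passwd-
inode 9527306 file offset 0 len 4096 disk start 239792578560 offset 0 gen 6049979 flags COMPRESS var/lib/dhcp/dhclient.leases
inode 9527306 file offset 4096 len 4096 disk start 239437688832 offset 0 gen 6050179 flags COMPRESS var/lib/dhcp/dhclient.leases

inode 9527438 file offset 12288 len 8192 disk start 242191458304 offset 4096 gen 6048901 flags PREALLOC var/lib/samba/wins.tdb
"""

for path in changed_paths(sample):
    print(path)
```

Each file appears once no matter how many extents changed, which is the
form a file-level transfer tool would want.
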
> > 4. Use rsync to send the changed files listed in `my-diff-for-second.txt`.
> > 5. Verify by using a rolling hash, create a second snapshot and so on.
> >
> > That approach will use all advantages of rsync and adds the "change
> > detection" benefit from BTRFS. The problem is, I don't know how to
> > implement the `list-file-changes` tool.
> >
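
For step 4, a list of changed paths can be handed to rsync through its
--files-from option. The sketch below only builds the command and its
input; the function name, the snapshot directory and the remote target are
placeholders, not anything from this thread. Because a list derived from
find-new contains no deletions, files removed locally would remain on the
remote side with this approach.

```python
import os


def rsync_command(paths, snapshot_dir, remote):
    """Build an rsync argv plus NUL-separated stdin data listing `paths`.

    `paths` are relative to `snapshot_dir`, as rsync expects for --files-from.
    """
    argv = [
        "rsync",
        "-a",               # preserve permissions, times, ownership, etc.
        "--partial",        # keep partially transferred files across dropped links
        "--from0",          # file list entries are NUL-terminated
        "--files-from=-",   # read the file list from stdin
        snapshot_dir,
        remote,
    ]
    stdin_data = b"".join(os.fsencode(p) + b"\0" for p in paths)
    return argv, stdin_data


# Hypothetical source directory and remote target, for illustration only.
argv, stdin_data = rsync_command(
    ["etc/passwd-", "var/lib/samba/wins.tdb"],
    "/mnt/snapshots/second/",
    "backuphost:/backup/second/",
)
print(argv)
```

The pair can then be run with
subprocess.run(argv, input=stdin_data, check=True). NUL separators are used
so that file names containing spaces or other unusual characters survive
intact.
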
> > By the way, why wouldn't BTRFS keep a CHECKSUM field on readonly
> > subvolumes and simply use that field for diff and patch operations?
> > Calculating incremental checksums on every new readonly snapshot seems
> > like a computationally cheap operation. We could then transfer our
> > snapshots whatever method/tool we like (even we could create the
> > /home/foo/hello.txt file with "hello world" content manually and then
> > create another snapshot that will automatically match with our new
> > local snapshot).
> >
> [*]http://marc.merlins.org/perso/btrfs/post_2014-05-19_Btrfs-diff-Between-Snapshots.html