Linux Btrfs filesystem development
From: Forza <forza@tnonline.net>
To: Cerem Cem ASLAN <ceremcem@ceremcem.net>,
	Graham Cobb <g.btrfs@cobb.uk.net>
Cc: Cedric.dewijs@eclipso.eu, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: synchronize btrfs snapshots over a unreliable, slow connection
Date: Wed, 6 Jan 2021 09:18:30 +0100
Message-ID: <b9662cf1-e45f-5113-5b23-bf1aaa73cb97@tnonline.net>
In-Reply-To: <CAN4oSBcL7ae_qwKDDoP=sbjkR4gcweTO8otEQv1Zh0YhStWZsw@mail.gmail.com>



On 2021-01-05 13:24, Cerem Cem ASLAN wrote:
> I also thought about a different approach in the past:
> 
> 1. Take a snapshot and rsync it to the server.
> 2. When it succeeds, make it readonly and take a note on the remote
> site that indicates the Received_UUID and checksum of entire
> subvolume.
> 3. When you want to send your diff, run `btrfs send -p ./first
> ./second | list-file-changes -o my-diff-for-second.txt` if that
> Received_UUID on the remote site matches with ./first. (Otherwise, you
> should run rsync without taking advantage of
> `my-diff-for-second.txt`.)

You can use `btrbk diff old-snap new-snap` to list changes between 
snapshots.

Example:
------------------------------------------------------------------------------
# btrbk diff /mnt/systemRoot/snapshots/root.20210101T0001/ \
    /mnt/systemRoot/snapshots/root.20210102T0001/

Subvolume Diff (btrbk command line client, version 0.30.0)

     Date:   Wed Jan  6 09:06:37 2021

Showing changed files for subvolume:
   /mnt/systemRoot/snapshots/root.20210102T0001  (gen=6050233)

Starting at generation after subvolume:
   /mnt/systemRoot/snapshots/root.20210101T0001  (gen=6046626)

This will show all files modified within generation range: [6046627..6050233]
Newest file generation (transid marker) was: 6050233

Legend:
     +..     file accessed at offset 0 (at least once)
     .c.     flags COMPRESS or COMPRESS|INLINE set (at least once)
     ..i     flags INLINE or COMPRESS|INLINE set (at least once)
     <count> file was modified in <count> generations
     <size>  file was modified for a total of <size> bytes
------------------------------------------------------------------------------
+ci   1       1318  etc/csh.env
+ci   1       2116  etc/dispatch-conf.conf
+ci   1       1111  etc/environment.d/10-gentoo-env.conf
+ci   1       2000  etc/etc-update.conf
+c.   1      94208  etc/ld.so.cache
...
------------------------------------------------------------------------------
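
If you want to feed that list to another tool, the per-file lines are
easy to pick apart. Below is a minimal Python sketch, assuming the line
layout shown above (three flag characters, generation count, byte count,
path). The names `Change` and `parse_btrbk_diff` are made up for this
example and are not part of btrbk, and the layout may differ between
btrbk versions.

```python
import re
from typing import List, NamedTuple


class Change(NamedTuple):
    flags: str        # e.g. "+ci", per the legend printed by btrbk
    generations: int  # number of generations the file was modified in
    size: int         # total bytes modified
    path: str         # path relative to the subvolume


# Assumed layout: flags, count, size, path, separated by whitespace.
# Legend and header lines are indented or worded differently, so they
# do not match and are skipped.
_LINE = re.compile(r"^([+.][c.][i.])\s+(\d+)\s+(\d+)\s+(.+)$")


def parse_btrbk_diff(text: str) -> List[Change]:
    """Return one Change per file line found in `btrbk diff` output."""
    changes = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            changes.append(
                Change(m.group(1), int(m.group(2)), int(m.group(3)), m.group(4))
            )
    return changes
```

Calling `parse_btrbk_diff()` on the captured output and printing the
`path` of each entry, one per line, gives a plain list of changed files.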

You can also use `btrfs subvolume find-new` to list filesystem changes, 
but the output is much more verbose than that of btrbk, and you need to 
figure out the generation IDs first. I also think that some changes, 
such as deleted and renamed files, do not get listed. [*]

Example:
------------------------------------------------------------------------------
# btrfs subvolume find-new /mnt/systemRoot/snapshots/root.20210102T0001/ \
    6046626

inode 3054490 file offset 0 len 8192 disk start 239676399616 offset 0 gen 6048209 flags COMPRESS etc/passwd-
inode 9527306 file offset 0 len 4096 disk start 239792578560 offset 0 gen 6049979 flags COMPRESS var/lib/dhcp/dhclient.leases
inode 9527306 file offset 4096 len 4096 disk start 239437688832 offset 0 gen 6050179 flags COMPRESS var/lib/dhcp/dhclient.leases
inode 9527306 file offset 8192 len 4096 disk start 241226248192 offset 0 gen 6050220 flags NONE var/lib/dhcp/dhclient.leases
inode 9527438 file offset 0 len 4096 disk start 244439986176 offset 0 gen 6049681 flags NONE var/lib/samba/wins.tdb
inode 9527438 file offset 4096 len 4096 disk start 244569776128 offset 0 gen 6050217 flags NONE var/lib/samba/wins.tdb
inode 9527438 file offset 8192 len 4096 disk start 243901612032 offset 0 gen 6049543 flags NONE var/lib/samba/wins.tdb
inode 9527438 file offset 12288 len 8192 disk start 242191458304 offset 4096 gen 6048901 flags PREALLOC var/lib/samba/wins.tdb
inode 9527438 file offset 20480 len 4096 disk start 244319576064 offset 0 gen 6049691 flags NONE var/lib/samba/wins.tdb
------------------------------------------------------------------------------
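
Since find-new prints one record per extent, the same file shows up many
times. A rough Python sketch for collapsing that into one entry per
path, assuming each record sits on a single line (not wrapped by a mail
client) and has the field order shown above. The function name
`changed_paths` is made up for this example.

```python
import re
from typing import Dict

# Assumed record layout, taken from the example output above:
# inode N file offset N len N disk start N offset N gen N flags F path
_RECORD = re.compile(
    r"^inode (\d+) file offset (\d+) len (\d+) disk start (\d+) "
    r"offset (\d+) gen (\d+) flags (\S+) (.+)$"
)


def changed_paths(find_new_output: str) -> Dict[str, int]:
    """Map each changed path to the summed `len` of its extent records.

    Lines that are not extent records are ignored. Paths keep the order
    in which they first appear.
    """
    totals: Dict[str, int] = {}
    for line in find_new_output.splitlines():
        m = _RECORD.match(line)
        if m:
            path = m.group(8)
            totals[path] = totals.get(path, 0) + int(m.group(3))
    return totals
```

Writing the keys to a file, one per line, should give something usable
with rsync's `--files-from` option, with the snapshot directory as the
source. A list like this cannot tell rsync about anything that find-new
does not report.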

> 4. Use rsync to send the changed files listed in `my-diff-for-second.txt`.
> 5. Verify by using a rolling hash, create a second snapshot and so on.
> 
> That approach will use all advantages of rsync and adds the "change
> detection" benefit from BTRFS. The problem is, I don't know how to
> implement the `list-file-changes` tool.
> 
> By the way, why wouldn't BTRFS keep a CHECKSUM field on readonly
> subvolumes and simply use that field for diff and patch operations?
> Calculating incremental checksums on every new readonly snapshot seems
> like a computationally cheap operation. We could then transfer our
> snapshots whatever method/tool we like (even we could create the
> /home/foo/hello.txt file with "hello world" content manually and then
> create another snapshot that will automatically match with our new
> local snapshot).
> 
[*] http://marc.merlins.org/perso/btrfs/post_2014-05-19_Btrfs-diff-Between-Snapshots.html
