* Incremental send receive of snapshot fails
From: Rene Wolf @ 2016-12-28 11:50 UTC
  To: Btrfs BTRFS

Hi all


I have a problem with incremental snapshot send/receive in btrfs. Maybe
my google-fu is weak, but I couldn't find any pointers, so here goes.


A few words about my setup first:

I have multiple clients that back up to a central server. All clients
(and the server) run (K)Ubuntu 16.10 64-bit on btrfs. Backing up works
with btrfs send / receive, either full or incremental, depending on
what's available on the server side. All clients have the usual
(Ubuntu) btrfs layout: 2 subvolumes, one for / and one for /home;
explicit entries in fstab; root volume not mounted anywhere. For
further details see the P.s. at the end.
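
For illustration, the fstab entries look roughly like this
(reconstructed from the mount output in the P.s., not pasted verbatim):

UUID=122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a  /      btrfs  noatime,ssd,subvol=/@      0 0
UUID=122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a  /home  btrfs  noatime,ssd,subvol=/@home  0 0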


Here's what happens:

In general I stick to the example from
https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . Backing up 
is done daily by a script, and it works successfully on all of my 
clients except one (called "lab").

I start with the first snapshot on "lab" and do a full send to the 
server. This works as expected (sending takes some hours as it is done 
over wifi+ssh). After that is done I send an incremental snapshot based 
on the previous parent. This also works as expected (no error etc). 
Sending deltas then happens once a day, with the script always keeping
the last two snapshots on the client and many more on the server. Also,
after each run of the script I do a bit of "housekeeping" to prevent
"disk full" etc. (see the P.s. below for the commands).

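A minimal sketch of what one daily run boils down to (the paths and
snapshot names here are placeholders, not the literal script):

# btrfs subvolume snapshot -r / /.back/snap_new
# btrfs send -p /.back/snap_prev /.back/snap_new | \
      ssh server 'btrfs receive /media/bak/lab/root/'
# btrfs subvolume delete /.back/snap_old
(the delete keeps only the last two snapshots on the client)
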
I can't exactly say when, but after some time (possibly the next day) 
snapshot sending fails with an error on the receiving end:
ERROR: unlink some/file failed. No such file or directory

Some searching around led me to
https://bugzilla.kernel.org/show_bug.cgi?id=60673 . So I checked to
make sure my script doesn't use the wrong parent; it does not. But to
make really sure I tried a send / receive directly on "lab" without the
server:

# btrfs subvol snap -r / /.back/new_snap
> Create a readonly snapshot of '/' in '/.back/new_snap'

# btrfs subv show /.back/last_snap_by_script
> /.back/last_snap_by_script
>         Name:                   last_snap_by_script
>         UUID:                   b4634a8b-b74b-154a-9f17-1115f6d07524
>         Parent UUID:            b5f9a301-69f7-0646-8cf1-ba29e0c24fac
>         Received UUID:          196a0866-cd05-d24e-bac6-84e8e5eb037a
>         Creation time:          2016-12-27 17:55:10 +0100
>         Subvolume ID:           486
>         Generation:             52036
>         Gen at creation:        51524
>         Parent ID:              257
>         Top level ID:           257
>         Flags:                  readonly
>         Snapshot(s):

# btrfs subv show /.back/new_snap
> /.back/new_snap
>         Name:                   new_snap
>         UUID:                   fca51929-8101-db45-8df6-f25935c04f98
>         Parent UUID:            b5f9a301-69f7-0646-8cf1-ba29e0c24fac
>         Received UUID:          196a0866-cd05-d24e-bac6-84e8e5eb037a
>         Creation time:          2016-12-28 11:51:43 +0100
>         Subvolume ID:           506
>         Generation:             52271
>         Gen at creation:        52271
>         Parent ID:              257
>         Top level ID:           257
>         Flags:                  readonly
>         Snapshot(s):

# btrfs send -p /.back/last_snap_by_script /.back/new_snap > delta
> At subvol /.back/new_snap

# btrfs subvol del /.back/new_snap
> Delete subvolume (no-commit): '/.back/new_snap'

# cat delta | btrfs receive /.back/
> At snapshot new_snap
> ERROR: unlink some/file failed. No such file or directory

And the receive always fails with some ERROR similar to the above! What 
I find a bit odd is the identical "Received UUID", even before new_snap 
was sent / received ... but maybe that's normal?

If, instead of using "last_snap_by_script", I create a second new
read-only snapshot and send the delta between the two "new" ones,
everything works as expected. But then there are hardly any differences
between the two new snaps ...
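
Roughly what I did for that test (the snapshot names here are made up):

# btrfs subvol snap -r / /.back/new_a
# btrfs subvol snap -r / /.back/new_b
# btrfs send -p /.back/new_a /.back/new_b | btrfs receive /.back/tmp/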

I tried to look for differences between the "lab" client and another
one ("navi") where backing up works, but so far I couldn't really find
anything. I did create the two file systems at different points in time
(possibly with different kernels). Both were created as btrfs, not
"converted" from ext. "lab" has an SSD, "navi" a spinning disk. Both
systems run on 64-bit Intel CPUs ...


So now I have a snapshot on "lab" which I cannot use as a parent, but
why? What did I do wrong? The whole procedure works on my other clients
(with the exact same script), so why not on "lab"? And it is a
recurring problem: I tried deleting all of the snaps (on both ends) and
starting all over again ... eventually it ends up with a "broken"
snapshot again.


Up until now using btrfs has been a great experience and I could always
resolve my troubles quite quickly, but this time I don't know what to do.
Thanks in advance for any suggestions and feel free to ask for other / 
missing details :-)


Regards
Rene


P.s.: here's my system info from the failing client "lab"

$ uname -a
Linux lab 4.8.0-32-generic #34-Ubuntu SMP Tue Dec 13 14:30:43 UTC 2016 
x86_64 x86_64 x86_64 GNU/Linux

$ btrfs --version
btrfs-progs v4.7.3

# btrfs fi show
Label: 'SSD'  uuid: 122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a
         Total devices 1 FS bytes used 37.62GiB
         devid    1 size 55.90GiB used 41.03GiB path /dev/sdb1

# btrfs fi df /
Data, single: total=40.00GiB, used=37.09GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=1.00GiB, used=543.08MiB
GlobalReserve, single: total=112.38MiB, used=0.00B

$ mount | grep btrfs
/dev/sdb1 on / type btrfs 
(rw,noatime,ssd,space_cache,subvolid=257,subvol=/@)
/dev/sdb1 on /home type btrfs 
(rw,noatime,ssd,space_cache,subvolid=286,subvol=/@home)

# btrfs scrub start -B /
scrub done for 122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a
         scrub started at Wed Dec 28 12:05:53 2016 and finished after 
00:02:24
         total bytes scrubbed: 37.76GiB with 0 errors

"Housekeeping" is mostly based on suggestions from Marc's Blog
(http://marc.merlins.org/perso/btrfs/):
# /bin/btrfs balance start -v -dusage=0 /
# /bin/btrfs balance start -v -dusage=60 -musage=60 /

I can add dmesg output on request, but so far I haven't observed any
reaction there ...


* Re: Incremental send receive of snapshot fails
From: Giuseppe Della Bianca @ 2016-12-29 15:31 UTC
  To: igeligel; +Cc: linux-btrfs

Hi.

In such cases, I have run btrfs check (not repair mode!!!) on every
file system/partition involved in creating, sending and receiving
snapshots.


Regards.

Gdb


> Rene Wolf Wed, 28 Dec 2016 03:51:07 -0800
>
> Hi all
>
> I have a problem with incremental snapshot send/receive in btrfs.
> Maybe my google-fu is weak, but I couldn't find any pointers, so here
> goes.
>
> A few words about my setup first:
>
> I have multiple clients that back up to a central server. All clients
> (and the server) run (K)Ubuntu 16.10 64-bit on btrfs. Backing up
> works with btrfs send / receive, either full or incremental,
> depending on what's available on the server side. All clients have
> the usual (Ubuntu) btrfs layout: 2 subvolumes, one for / and one for
> /home; explicit entries in fstab; root volume not mounted anywhere.
> For further details see the P.s. at the end.
[snip]




* Re: Incremental send receive of snapshot fails
From: Rene Wolf @ 2016-12-29 19:31 UTC
  To: Giuseppe Della Bianca; +Cc: linux-btrfs

Hi


As the fs in question is my root, I ran btrfs check from a Xubuntu
16.10 live USB stick:

> Checking filesystem on /dev/sdb1
> UUID: 122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a
> checking extents [o]
> checking free space cache [.]
> checking fs roots [o]
> found 40577679360 bytes used err is 0
> total csum bytes: 39027548
> total tree bytes: 571277312
> total fs tree bytes: 453001216
> total extent tree bytes: 71745536
> btree space waste bytes: 116244847
> file data blocks allocated: 46952968192
>  referenced 44081487872

"err is 0" ... so I guess that means everything is fine?

Out of curiosity I retried the new_snap + send + receive on that same
fs from the live system: same result (ERROR: unlink ...). Though I
noticed that the exact file reported by the ERROR is somewhat random ...
For this test from the live USB I mounted the root volume directly
instead of the subvolumes via fstab, so that doesn't seem to have been
the problem either.


I did some further meditating on what happens here. From what I read
and understand of send/receive, the stream produced by send essentially
replays fs events. If I give send a parent, it only emits the
difference between the two snapshots, i.e. a stream containing just the
changes needed to "transform" the parent snap into the other one on the
receiving end. Now I'm not sure how the receiving end figures out what
the parent is, and whether it has it, but I guess that's where all
those UUIDs come into play.
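
One way to peek at what the stream actually references (assuming my
btrfs-progs is new enough to have receive's --dump option):

# btrfs receive --dump < delta | head

The first "snapshot" line should name the uuid/transid of the parent
that the stream was built against.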

There are three UUIDs. If I compare them on the sending ("lab") and the
receiving ("server") side, I see:

## sender
# btrfs subv show /.back/last_snap_by_script
> /.back/last_snap_by_script
>         UUID:                   b4634a8b-b74b-154a-9f17-1115f6d07524
>         Parent UUID:            b5f9a301-69f7-0646-8cf1-ba29e0c24fac
>         Received UUID:          196a0866-cd05-d24e-bac6-84e8e5eb037a

## receiver
# btrfs subv show /media/bak/lab/root/last_snap_by_script
>         UUID:                   89321ec1-2de6-0a4c-8f9f-cdd30fa3a7af
>         Parent UUID:            -
>         Received UUID:          196a0866-cd05-d24e-bac6-84e8e5eb037a

So that does make sense to me, as neither "Parent UUID" nor "UUID"
would fit our needs (both are kind of local to one system). Instead the
"Received UUID" seems to be the link that identifies snaps on both ends
as "equal". But then why do both snaps on the sending side have the
same "Received UUID" for me:

## from my original post, on sender side, this is the "new" delta snapshot
# btrfs subv show /.back/new_snap
> /.back/new_snap
>         Name:                   new_snap
>         UUID:                   fca51929-8101-db45-8df6-f25935c04f98
>         Parent UUID:            b5f9a301-69f7-0646-8cf1-ba29e0c24fac
>         Received UUID:          196a0866-cd05-d24e-bac6-84e8e5eb037a
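
A guess I can at least test: maybe a snapshot simply inherits the
Received UUID of the subvolume it is taken from. If so, the root
subvolume itself should carry that same UUID:

# btrfs subv show / | grep -i 'received uuid'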


It would be great if someone could clear this up ... could this point
to the reason why the "replay" stream is produced on a wrong basis?

Another thing I tried is the "--max-errors 0" option of receive. That
lets it continue after an error, but it just produced an endless slew
of more of the same errors. Is that another indicator that the parent
on the sending or receiving side is identified wrongly or not at all?
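
For reference, that test was roughly:

# cat delta | btrfs receive --max-errors 0 /.back/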

In any case, thanks for the tip Giuseppe :-)


Regards
Rene

On 29.12.2016 16:31, Giuseppe Della Bianca wrote:
> Hi.
>
> In such cases, I have run btrfs check (not repair mode!!!) on every
> file system/partition involved in creating, sending and receiving
> snapshots.
>
>
> Regards.
>
> Gdb


* Re: Incremental send receive of snapshot fails
From: Giuseppe Della Bianca @ 2016-12-30  9:34 UTC
  To: Rene Wolf; +Cc: linux-btrfs

Hi.

If btrfs check does not report any errors, the filesystem is OK.

I do not have enough knowledge to analyze your data.

But if you are sure that none of the filesystems has problems, then the
problem is the parent subvolume on the receiving filesystem.


Regards.

Gdb


Rene Wolf:
> Hi
> 
> 
> As the fs in question is my root, I ran btrfs check from a Xubuntu
> 16.10 live USB stick:
> 
> > Checking filesystem on /dev/sdb1
> > UUID: 122ecca7-9804-4c8a-b4ed-42fd6c6bbe7a
> > checking extents [o]
> > checking free space cache [.]
> > checking fs roots [o]
> > found 40577679360 bytes used err is 0
> > total csum bytes: 39027548
> > total tree bytes: 571277312
> > total fs tree bytes: 453001216
> > total extent tree bytes: 71745536
> > btree space waste bytes: 116244847
> > file data blocks allocated: 46952968192
> >  referenced 44081487872
> 
> "err is 0" ... so I guess that means everything is fine?
> 
[snip]


