* Questions on incremental backups
From: Sam Bull @ 2014-07-17 20:12 UTC (permalink / raw)
To: linux-btrfs
I've a couple of questions on incremental backups. I've read the wiki
page, and would like to confirm my understanding of some features, and
also see if other features are possible that are not mentioned. I'm
looking to replace my existing backup solution, and hoping to match the
features I currently use, and go a little beyond.
=== Daily snapshot ===
So, if I understand correctly, I can make a daily snapshot of my
filesystem with very little overhead. Then these can later be synced
efficiently to another system (only syncing the differences), so I can
back up regularly over the internet to my server, and also to an external
HDD. After syncing, I can delete the snapshots (other than the trailing
one needed for the next backup).
In this way I can keep a constant stream of daily backups even when
offline, and simply sync them next time I am online before deleting them
locally.
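The retention rule described above can be sketched in Python. A snapshot is deleted once it has reached every backup target, but the newest synced one is kept as the parent for the next incremental transfer. This is a minimal sketch that assumes date-based snapshot names that sort chronologically and a caller-supplied set of already-synced names. Neither assumption comes from the thread, and the function name is hypothetical.

```python
def deletable_after_sync(local_snapshots, synced):
    """Return local snapshots that are safe to delete after a sync.

    local_snapshots: names sorted oldest to newest (hypothetical
        date-based names such as "2014-07-17", which sort chronologically).
    synced: names already received by every backup target.

    The newest synced snapshot is kept, because the next incremental
    transfer needs it as its parent. Unsynced snapshots are always kept.
    """
    synced_local = [s for s in local_snapshots if s in synced]
    if not synced_local:
        return []
    parent = synced_local[-1]
    return [s for s in synced_local if s != parent]


local = ["2014-07-15", "2014-07-16", "2014-07-17", "2014-07-18"]
print(deletable_after_sync(local, {"2014-07-15", "2014-07-16", "2014-07-17"}))
# ['2014-07-15', '2014-07-16']
```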
=== Ignore directories ===
Due to storage limitations on my server, is it possible to ignore
certain directories? For example, ignoring the folder that stores all my
games, as this could be rather large, and the contents can easily be
re-downloaded. The instructions involve subvolumes, so maybe it's
possible to ignore a subvolume when syncing?
If that is possible, then is it also possible to have a separate backup
that does include the ignored directory? For example, having the smaller
sync to the storage-limited server, but having a full sync to an
external HDD.
=== Display backups ===
Is it possible to view the contents of all backups? So, the expected
interface would be something like a tree of all files from across all
snapshots. Any files that are not present in the latest snapshot would
be greyed out to show they have been deleted. Selecting a file would
show a list of versions of the file, with one version for each snapshot
the file has been modified in.
As long as I can get access to this information, maybe some kind of diff
between snapshots, I'm willing to write the actual software to display
this interface. (I suppose even if it's not supported, I could crawl
through the filesystems and generate some kind of database, but that
sounds like a painful process.)
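The "crawl and build a database" fallback can be sketched as follows. This minimal sketch assumes that each snapshot is reachable as an ordinary read-only directory tree. It records a new version of a file whenever the file's SHA-256 content hash differs from the last one seen, and it flags files missing from the newest snapshot as deleted. Hashing every file is slow on large trees. All names here are hypothetical.

```python
import hashlib
import os


def walk_files(root):
    """Yield paths (relative to root) of regular files, skipping symlinks."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            full = os.path.join(dirpath, fn)
            if os.path.isfile(full) and not os.path.islink(full):
                yield os.path.relpath(full, root)


def file_digest(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(snapshots):
    """snapshots: list of (name, root_dir), oldest first.

    Returns (versions, deleted): versions maps each relative path to the
    snapshot names in which its content differs from the last version
    seen (including its first appearance); deleted holds paths absent
    from the newest snapshot.
    """
    versions = {}
    last_seen = {}
    for name, root in snapshots:
        for rel in walk_files(root):
            digest = file_digest(os.path.join(root, rel))
            if last_seen.get(rel) != digest:
                versions.setdefault(rel, []).append(name)
                last_seen[rel] = digest
    newest = set(walk_files(snapshots[-1][1])) if snapshots else set()
    deleted = set(versions) - newest
    return versions, deleted
```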
=== Merge snapshots down ===
Is there some way to merge snapshots down? So, I could merge the last
week of daily snapshots into a single weekly snapshot. The new snapshot
should include all files across all the snapshots (even if deleted in
some of the snapshots), and include just the latest version of each
file.
This way, I'd like to maintain daily snapshots, which can be regularly
merged down into weekly snapshots, and then into monthly snapshots, and
then finally into yearly snapshots.
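The "merge down" semantics described above can be expressed as a plan followed by a copy step. The result holds every file that appears in any snapshot, taking the version from the newest snapshot that contains it. This is a hypothetical sketch over plain directory trees, not a btrfs feature. `shutil.copy2` writes full copies, so unlike a real snapshot the result shares no extents with its sources.

```python
import os
import shutil


def merge_plan(snapshot_roots):
    """snapshot_roots: directories, oldest first.

    Returns {relative path: root of the newest snapshot containing it}.
    """
    plan = {}
    for root in snapshot_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for fn in filenames:
                rel = os.path.relpath(os.path.join(dirpath, fn), root)
                plan[rel] = root  # newer snapshots overwrite older entries
    return plan


def apply_merge(plan, dest):
    """Copy each planned file into dest (full copies, no extent sharing)."""
    for rel, root in plan.items():
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(root, rel), target)
```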
And, finally, there's no problem in deleting old snapshots? I'm assuming
any data from these snapshots used by other snapshots will still be
referenced by the other snapshots, and thus be retained, so nothing will
break?
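The reasoning in this last question amounts to reference counting: shared data survives as long as at least one snapshot still references it. The toy model below illustrates that idea with integer extent IDs. It is not how btrfs actually tracks back-references; it only models the behaviour the question relies on.

```python
from collections import Counter


class ToyExtentStore:
    """Toy reference-counting model of data shared between snapshots."""

    def __init__(self):
        self.snapshots = {}    # snapshot name -> set of extent IDs
        self.refs = Counter()  # extent ID -> number of referencing snapshots

    def add_snapshot(self, name, extents):
        self.snapshots[name] = set(extents)
        self.refs.update(self.snapshots[name])

    def delete_snapshot(self, name):
        """Delete a snapshot; return the extents whose space is freed."""
        freed = []
        for extent in self.snapshots.pop(name):
            self.refs[extent] -= 1
            if self.refs[extent] == 0:
                del self.refs[extent]
                freed.append(extent)
        return sorted(freed)


store = ToyExtentStore()
store.add_snapshot("mon", {1, 2, 3})
store.add_snapshot("tue", {2, 3, 4})
store.add_snapshot("wed", {3, 4, 5})
print(store.delete_snapshot("tue"))  # [] : every extent is still referenced
print(store.delete_snapshot("mon"))  # [1, 2]
```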
* Re: Questions on incremental backups
From: Russell Coker @ 2014-07-18 4:35 UTC (permalink / raw)
To: Sam Bull, linux-btrfs
Daily snapshots work well with kernel 3.14 and above (I had problems with 3.13 and earlier). I have snapshots every 15 mins on some subvols.
Very large numbers of snapshots can cause performance problems. I suggest keeping below 1000 snapshots at this time.
You can use send/recv functionality for remote backups. So far I've used rsync; it works well, and send/recv has some limitations regarding filesystem structure etc. Rsync can transfer to an ext4 or ZFS filesystem if you wish.
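With send/recv, an incremental transfer names the previous snapshot as its parent with `btrfs send -p`. The sketch below only builds the shell pipeline and runs nothing. The host name, the paths, and the use of ssh as transport are assumptions for illustration. Both snapshots must be read-only, and a received copy of the parent must already exist on the receiving filesystem.

```python
import shlex


def incremental_send_cmd(new_snap, parent_snap, host, dest_dir):
    """Build (but do not run) a `btrfs send | ssh ... btrfs receive` pipeline.

    Pass None as parent_snap for the first, full transfer.
    """
    send = ["btrfs", "send"]
    if parent_snap is not None:
        send += ["-p", parent_snap]  # send only the differences from parent
    send.append(new_snap)
    # The remote command is handed to ssh as one quoted string.
    recv = ["ssh", host, "btrfs receive " + shlex.quote(dest_dir)]
    return " ".join(map(shlex.quote, send)) + " | " + " ".join(map(shlex.quote, recv))


print(incremental_send_cmd("/snap/2014-07-18", "/snap/2014-07-17", "server", "/backup"))
# btrfs send -p /snap/2014-07-17 /snap/2014-07-18 | ssh server 'btrfs receive /backup'
```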
Ignoring directories in send/recv is done by subvol. Even if you use rsync it's a good idea to have different subvols for directory trees with different backup requirements.
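Because send/recv works per subvolume, choosing what each target receives comes down to which subvolume owns a path. That is the innermost subvolume whose path is a prefix of the file's path. The small sketch below uses a hypothetical subvolume layout and hypothetical target names. In it, a games subvolume is skipped for the server but included for the external drive.

```python
def owning_subvol(path, subvols):
    """Return the innermost subvolume (longest matching prefix) of path."""
    best = None
    for sv in subvols:
        prefix = sv.rstrip("/") + "/"  # "/" stays "/", "/home" -> "/home/"
        if path == sv or path.startswith(prefix):
            if best is None or len(sv) > len(best):
                best = sv
    return best


def is_backed_up(path, subvols, sent_subvols):
    """A send of a subvolume stops at nested subvolumes, so a path is
    covered only if its own (innermost) subvolume is sent."""
    return owning_subvol(path, subvols) in sent_subvols


# Hypothetical layout: games live in their own subvolume.
SUBVOLS = ["/", "/home", "/home/sam/games"]
TARGETS = {
    "server": {"/", "/home"},                            # skip games
    "external-hdd": {"/", "/home", "/home/sam/games"},   # everything
}
```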
Displaying backups is an issue of backup software. It is above the level that BTRFS development touches. While people here can probably offer generic advice on backup software it's not the topic of the list.
I use date-based snapshots on my backup BTRFS filesystems and I can easily delete snapshots in the middle of the list.
--
Sent from my Samsung Galaxy Note 2 with K-9 Mail.
* Re: Questions on incremental backups
From: Bob Williams @ 2014-07-18 7:36 UTC (permalink / raw)
To: linux-btrfs
On 18/07/14 05:35, Russell Coker wrote:
> Daily snapshots work well with kernel 3.14 and above (I had
> problems with 3.13 and previous). I have snapshots every 15 mins on
> some subvols.
>
> Very large numbers of snapshots can cause performance problems. I
> suggest keeping below 1000 snapshots at this time.
>
> You can use send/recv functionality for remote backups. So far I've
> used rsync, it works well and send/recv has some limitations about
> filesystem structure etc. Rsync can transfer to an ext4 or ZFS
> filesystem if you wish.
>
> Ignoring directories in send/recv is done by subvol. Even if you
> use rsync it's a good idea to have different subvols for directory
> trees with different backup requirements.
>
> Displaying backups is an issue of backup software. It is above the
> level that BTRFS development touches. While people here can
> probably offer generic advice on backup software it's not the topic
> of the list.
>
> I use date based snapshots on my backup BTRFS filesystems and I can
> easily delete snapshots in the middle of the list.
>
I also back up to an externally attached drive using rsync followed by a
snapshot. I have written a small Python script that does this,
followed by deleting snapshots older than 90 days.
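The pruning step described above (delete snapshots older than 90 days) could look like the sketch below. It also applies Russell's advice to stay below roughly 1000 snapshots. This is a minimal sketch that assumes snapshot names encode their creation time as `YYYY-MM-DD_HHMM`. The actual script was not posted, so this is not it.

```python
from datetime import datetime, timedelta

NAME_FORMAT = "%Y-%m-%d_%H%M"  # hypothetical naming, e.g. "2014-07-18_0600"


def snapshots_to_delete(names, now, max_age_days=90, max_count=1000):
    """Pick snapshots to delete: those older than max_age_days, then the
    oldest survivors if more than max_count would remain."""
    dated = sorted((datetime.strptime(n, NAME_FORMAT), n) for n in names)
    cutoff = now - timedelta(days=max_age_days)
    doomed = [n for t, n in dated if t < cutoff]
    kept = [n for t, n in dated if t >= cutoff]
    excess = len(kept) - max_count
    if excess > 0:
        doomed += kept[:excess]  # oldest first
    return doomed
```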
Restoring backed up data is done using a file manager.
I'm happy to share my script.
Bob
--
Bob Williams
System: Linux 3.11.10-17-desktop
Distro: openSUSE 13.1 (x86_64) with KDE Development Platform: 4.13.3
Uptime: 06:00am up 1 day 22:01, 4 users, load average: 0.87, 0.50, 0.24
* Re: Questions on incremental backups
From: Duncan @ 2014-07-18 10:45 UTC (permalink / raw)
To: linux-btrfs
Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as excerpted:
> Daily snapshots work well with kernel 3.14 and above (I had problems
> with 3.13 and previous). I have snapshots every 15 mins on some subvols.
>
> Very large numbers of snapshots can cause performance problems. I
> suggest keeping below 1000 snapshots at this time.
The other caveat with btrfs snapshots is how they deal with NOCOW files,
the usual workaround recommended for large (Gig-ish-plus) internal-
rewrite-pattern files such as databases and VM images.
I'll avoid a detailed discussion here since I don't know whether it
applies to the OP's use-case and the problem and workarounds are well
discussed in other threads, but this is a heads-up for the OP to do a bit
of research on the topic if he /does/ deal with gig-plus sized VM images
or databases. Very briefly, putting such files on their own subvolume
and using more traditional backup techniques instead of snapshotting is
recommended. Another alternative is partitioning and choosing a
filesystem other than btrfs for those files, while still considering
btrfs for other files.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Questions on incremental backups
From: Roman Mamedov @ 2014-07-18 10:55 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On Fri, 18 Jul 2014 10:45:37 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:
> Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as excerpted:
>
> > Daily snapshots work well with kernel 3.14 and above (I had problems
> > with 3.13 and previous). I have snapshots every 15 mins on some subvols.
> >
> > Very large numbers of snapshots can cause performance problems. I
> > suggest keeping below 1000 snapshots at this time.
>
> The other caveat with btrfs snapshots is how they deal with NOCOW files,
> the usual workaround recommended for large (Gig-ish-plus) internal-
> rewrite-pattern files such as databases and VM images.
And "how" do they deal with them? To the best of my knowledge there is no
"caveat" whatsoever, NOCOW and snapshots interact perfectly, exactly as it
should be (snapshotted and then changed bits get COW'ed, but only once).
--
With respect,
Roman
* Re: Questions on incremental backups
From: Duncan @ 2014-07-18 12:34 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On Fri, 18 Jul 2014 16:55:26 +0600
Roman Mamedov <rm@romanrm.net> wrote:
> On Fri, 18 Jul 2014 10:45:37 +0000 (UTC)
> Duncan <1i5t5.duncan@cox.net> wrote:
>
> > Russell Coker posted on Fri, 18 Jul 2014 14:35:20 +1000 as
> > excerpted:
> >
> > > Daily snapshots work well with kernel 3.14 and above (I had
> > > problems with 3.13 and previous). I have snapshots every 15 mins
> > > on some subvols.
> > >
> > > Very large numbers of snapshots can cause performance problems. I
> > > suggest keeping below 1000 snapshots at this time.
> >
> > The other caveat with btrfs snapshots is how they deal with NOCOW
> > files, the usual workaround recommended for large (Gig-ish-plus)
> > internal- rewrite-pattern files such as databases and VM images.
>
> And "how" do they deal with them? To the best of my knowledge there
> is no "caveat" whatsoever, NOCOW and snapshots interact perfectly,
> exactly as it should be (snapshotted and then changed bits get
> COW'ed, but only once).
Yes, but the fact that NOCOW files must nevertheless be COWed
on the first write to a block after a snapshot isn't exactly intuitive
to many admins, or even to many list regulars until relatively
recently.
For some time the recommendation for active large database files and
VM images (the ones I mentioned) was to make them NOCOW in order to
avoid extreme fragmentation, and people were still reporting extreme
fragmentation and the related performance issues even when the files
were properly NOCOWed at creation. Turned out the reason was that they
had scripted auto-snapshotting enabled, sometimes snapshotting the
files as often as once a minute! With an active VM writing data more
or less randomly to its image equally often, NOCOW lost its
effectiveness as the snapshotting was forcing COW writes most of the
time anyway!
In the context of frequent snapshots, NOCOW is in practice broken
and doesn't do what the label would indicate, thus the caveat.
Effectively, admins can choose NOCOW XOR frequent-snapshotting, although
the fact that snapshots stop at subvolume borders can be used as a
partial workaround, by putting NOCOW files on a dedicated partition and
not snapshotting it, exactly as I mentioned.
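The point above can be made concrete with a toy simulation. After a snapshot, every block of the NOCOW file is shared, so the first write to each block is forced to COW. Only later writes to that block happen in place. This is a simplified model with one write per time step and whole-file snapshots, not btrfs code.

```python
def simulate_nocow(num_blocks, writes, snapshot_every=None):
    """Toy model of a NOCOW file under periodic snapshots.

    writes: block index written at each time step.
    snapshot_every: take a snapshot before writes 0, k, 2k, ...
        (None means never snapshot).
    Returns (in_place_writes, cow_writes).
    """
    shared = [False] * num_blocks
    in_place = cowed = 0
    for step, block in enumerate(writes):
        if snapshot_every and step % snapshot_every == 0:
            shared = [True] * num_blocks  # snapshot now references every block
        if shared[block]:
            cowed += 1              # first write after a snapshot: forced COW
            shared[block] = False   # the new copy belongs to the file alone
        else:
            in_place += 1           # unshared block: NOCOW overwrite in place
    return in_place, cowed


writes = [0, 1, 2, 3] * 6  # a VM rewriting the same 4 blocks repeatedly
print(simulate_nocow(4, writes))                     # (24, 0)
print(simulate_nocow(4, writes, snapshot_every=12))  # (16, 8)
print(simulate_nocow(4, writes, snapshot_every=1))   # (0, 24)
```

With a snapshot before every write, every write is COWed. That matches the once-a-minute auto-snapshot scenario described above.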
--
Duncan - No HTML messages please, as they are filtered as spam.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Questions on incremental backups
From: Sam Bull @ 2014-07-18 12:56 UTC (permalink / raw)
To: Russell Coker; +Cc: linux-btrfs
Thanks for the replies, I think that's most of the questions answered.
I'll not bother backing up any VMs, as they won't contain anything worth
backing up. Can anybody answer the last couple of remaining questions?
On Fri, 2014-07-18 at 14:35 +1000, Russell Coker wrote:
> Ignoring directories in send/recv is done by subvol. Even if you use
> rsync it's a good idea to have different subvols for directory trees
> with different backup requirements.
So, an inner subvol won't be backed up? If I wanted a full backup, I
would presumably get snapshots of each subvol separately, right?
> Displaying backups is an issue of backup software. It is above the
> level that BTRFS development touches. While people here can probably
> offer generic advice on backup software it's not the topic of the
> list.
As said, I don't mind developing the software. But, is the required
information easily available? Is there a way to get a diff, something
like a list of changed/added/removed files between snapshots?
If I want to create a backup view, I could start with just a file view
of the most recent snapshot, but is there a way I can quickly get a list
of additional files in the other snapshots that are not present in the
most recent one (files that have been deleted)?
And, finally, nobody has commented on the possibility of merging
multiple snapshots into a single snapshot. Would this be possible, to
create a snapshot that contains the most recent version of each file
present across all of the snapshots (including files which may be
present in only one of the snapshots)?
* Re: Questions on incremental backups
From: Roman Mamedov @ 2014-07-18 13:05 UTC (permalink / raw)
To: Duncan; +Cc: linux-btrfs
On Fri, 18 Jul 2014 05:34:22 -0700
Duncan <1i5t5.duncan@cox.net> wrote:
> Effectively, admins can choose NOCOW XOR frequent-snapshotting, although
> the fact that snapshots stop at subvolume borders can be used as a
> partial workaround, by putting NOCOW files on a dedicated partition and
> not snapshotting it, exactly as I mentioned.
You can't back up running VM images and datafiles of an active database using
"traditional" backup techniques such as file copy or rsync. The tail of a file
you're copying for a backup will be long-inconsistent with the overall state
or the head of the file when you started copying. Snapshots on the other hand
are atomic, and can very much be used to create a static copy of the files for
the purposes of compressing/copying away somewhere. And at worst, the
"restored from backup" state of such a backed up VM or DB will be equivalent
to it just having had a power-loss. Journalling FSes and databases can deal
with that with no major problems.
So just exercise moderation: snapshot, e.g., once an hour or even once a day. The
result will still be better than not using NOCOW, and will deliver most of the
benefits you get by snapshotting.
Another option is to snapshot->backup->delete snapshot.
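The snapshot->backup->delete option can be sketched as a command list. The read-only snapshot gives rsync an atomic, crash-consistent source. `check=True` makes `subprocess.run` raise on failure, so a failed copy stops the sequence before the snapshot is deleted. The paths are hypothetical, and the snapshot must live on the same btrfs filesystem as the subvolume. By default the commands are only printed.

```python
import subprocess


def snapshot_backup_delete(subvol, snap_path, dest):
    """Commands for: read-only snapshot -> rsync copy -> delete snapshot."""
    return [
        # Atomic, read-only point-in-time view of the subvolume.
        ["btrfs", "subvolume", "snapshot", "-r", subvol, snap_path],
        # Copy from the frozen snapshot, not from the live files.
        ["rsync", "-a", snap_path.rstrip("/") + "/", dest],
        # Drop the snapshot so blocks are no longer shared and later
        # writes to NOCOW files can go in place again.
        ["btrfs", "subvolume", "delete", snap_path],
    ]


def run_all(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError, so a failed rsync
            # stops the sequence before the snapshot is deleted.
            subprocess.run(cmd, check=True)


run_all(snapshot_backup_delete("/data/vm", "/data/vm-snap", "/mnt/backup/vm"))
```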
--
With respect,
Roman
* Re: Questions on incremental backups
From: Russell Coker @ 2014-07-18 13:40 UTC (permalink / raw)
To: Sam Bull; +Cc: linux-btrfs
On Fri, 18 Jul 2014 13:56:58 Sam Bull wrote:
> On Fri, 2014-07-18 at 14:35 +1000, Russell Coker wrote:
> > Ignoring directories in send/recv is done by subvol. Even if you use
> > rsync it's a good idea to have different subvols for directory trees
> > with different backup requirements.
>
> So, an inner subvol won't be backed up? If I wanted a full backup, I
> would presumably get snapshots of each subvol separately, right?
If you use btrfs send/recv then it won't get the inner subvol. If you use
rsync then by default it goes through the entire directory tree unless you use
the -x option.
> > Displaying backups is an issue of backup software. It is above the
> > level that BTRFS development touches. While people here can probably
> > offer generic advice on backup software it's not the topic of the
> > list.
>
> As said, I don't mind developing the software. But, is the required
> information easily available? Is there a way to get a diff, something
> like a list of changed/added/removed files between snapshots?
Your usual diff utility will do it. I guess you could parse the output of
btrfs send.
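Besides `diff -rq`, the added/removed/changed lists can be computed with Python's `filecmp.dircmp`. This sketch assumes both snapshots are reachable as directories. `dircmp` compares files shallowly: two files with identical type, size and modification time are treated as equal without being read. A directory that exists on only one side is reported once rather than file by file.

```python
import filecmp
import os


def snapshot_diff(old_root, new_root):
    """Return (added, removed, changed) relative paths between two trees.

    File-vs-directory type changes are not reported.
    """
    added, removed, changed = [], [], []

    def walk(cmp, prefix):
        for name in cmp.left_only:    # only in the old snapshot
            removed.append(os.path.join(prefix, name))
        for name in cmp.right_only:   # only in the new snapshot
            added.append(os.path.join(prefix, name))
        for name in cmp.diff_files:   # in both, contents differ
            changed.append(os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():  # recurse into common dirs
            walk(sub, os.path.join(prefix, name))

    # ignore=[] so names in filecmp.DEFAULT_IGNORES (.git etc.) are kept
    walk(filecmp.dircmp(old_root, new_root, ignore=[]), "")
    return sorted(added), sorted(removed), sorted(changed)
```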
> And, finally, nobody has commented on the possibility of merging
> multiple snapshots into a single snapshot. Would this be possible, to
> create a snapshot that contains the most recent version of each file
> present across all of the snapshots (including files which may be
> present in only one of the snapshots)?
There is no btrfs functionality for that. But I'm sure you could do something
with standard Unix utilities and copying files around.
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
* Re: Questions on incremental backups
From: Mike Hartman @ 2014-07-18 14:27 UTC (permalink / raw)
To: russell; +Cc: Sam Bull, linux-btrfs
>> And, finally, nobody has commented on the possibility of merging
>> multiple snapshots into a single snapshot. Would this be possible, to
>> create a snapshot that contains the most recent version of each file
>> present across all of the snapshots (including files which may be
>> present in only one of the snapshots)?
>
> There is no btrfs functionality for that. But I'm sure you could do something
> with standard Unix utilities and copying files around.
You could probably use UnionFS or one of the alternatives to get a
merged view of a group of snapshots. You could then copy that merged
view and delete the original snapshots. I'm not sure if there's
anything special you have to do metadata-wise to turn that merged view
into something btrfs would still recognize as a snapshot though.
* Re: Questions on incremental backups
From: Imran Geriskovan @ 2014-07-18 14:28 UTC (permalink / raw)
To: linux-btrfs
It's not about snapshots, but here is another incremental
backup recipe for optical media such as DVDs and Blu-ray discs:
Base Backup:
1) Create encrypted loopback devices of DVD or Blu-ray sizes.
2) Create a compressed multi-device Btrfs filesystem spanning these
loopback devices. (To save space, you may use single
metadata if this is not your only backup.)
3) Rsync your data into this fs.
4) Unmount it and make it a SEED fs (btrfstune -S 1..)
5) Burn the loopback device files to DVDs or Blu-rays.
Incremental Part:
a) Before your next backup, create additional encrypted
loopback devices as needed.
b) Mount your base backup. (It will mount as read-only.)
c) Add the devices created at (a) to your base backup fs.
d) Rsync into your fs. (Note that incremental data
will only go into the devices at (a).)
e) Unmount all.
f) Burn only the devices at (a) to DVDs or Blu-rays. These
are your incremental disks.
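Step 1 of the recipe needs a count of disc-sized images. The rough sketch below assumes single-layer raw capacities (DVD-5 about 4.7 GB, BD-R about 25 GB). It also reserves an arbitrary 10% headroom for filesystem overhead. Compression will usually reduce the real requirement, so treat the result as an estimate only.

```python
import math

# Assumed raw capacities of single-layer media, in bytes; check your discs.
DVD_5_BYTES = 4_700_372_992
BD_R_25_BYTES = 25_025_314_816


def images_needed(data_bytes, disc_bytes, headroom=0.10):
    """Estimate how many disc-sized loopback images hold data_bytes.

    headroom is a guessed fraction of each image reserved for btrfs
    metadata and allocation slack; compression is ignored.
    """
    usable = disc_bytes * (1 - headroom)
    return max(1, math.ceil(data_bytes / usable))


print(images_needed(10 * 10**9, DVD_5_BYTES))  # 3
```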
Regards,
Imran
* Re: Questions on incremental backups
From: Daniel Mizyrycki @ 2014-07-18 17:31 UTC (permalink / raw)
Cc: linux-btrfs
On 07/18/14 06:40, Russell Coker wrote:
>>> Displaying backups is an issue of backup software. It is above the
>>> level that BTRFS development touches. While people here can probably
>>> offer generic advice on backup software it's not the topic of the
>>> list.
>>
>> As said, I don't mind developing the software. But, is the required
>> information easily available? Is there a way to get a diff, something
>> like a list of changed/added/removed files between snapshots?
>
> Your usual diff utility will do it. I guess you could parse the output of
> btrfs send.
Following this thought, one step closer to getting a text diff is to
use fardump. It takes a btrfs send binary stream and outputs the send
instructions in plain text.
(https://kernel.googlesource.com/pub/scm/linux/kernel/git/arne/far-progs).
It certainly would be awesome if btrfs-progs could have an extra
parameter to just generate the list of changed/added/removed files
between snapshots as all the needed infrastructure is already in place.
>
>> And, finally, nobody has commented on the possibility of merging
>> multiple snapshots into a single snapshot. Would this be possible, to
>> create a snapshot that contains the most recent version of each file
>> present across all of the snapshots (including files which may be
>> present in only one of the snapshots)?
>
> There is no btrfs functionality for that. But I'm sure you could do something
> with standard Unix utilities and copying files around.
Sure, but the management of data deduplication is left to the user
(presumably using cp --reflink), which is not trivial.
Does anybody know how safe it is to use duperemove or bedup?
Any recommendations on how to effectively deduplicate btrfs at this point?
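Finding duplicate candidates is the easy half of this problem. The sketch below groups regular files by size and then by SHA-256 hash, and only reports candidates. Making the copies actually share extents would still need `cp --reflink` or a dedicated tool such as duperemove or bedup, and is not attempted here.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted groups of paths under root with identical contents."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            path = os.path.join(dirpath, fn)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```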
* Re: Questions on incremental backups
From: Sam Bull @ 2014-07-20 16:56 UTC (permalink / raw)
To: linux-btrfs
Thanks everyone for the responses. I'll start setting up my backup
strategy in 2 or 3 weeks. I'll give the diff and UnionFS tips a go, and
report back on any progress.