Re: btrbk question: Should I Prefer Fileserver-initiated Backups from Several Hosts (Instead of Each Host Sending to the Server)?

From: "Joshua" <joshua@mailmag.net>
To: "Dave T" <davestechshop@gmail.com>,
	"Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
Subject: Re: btrbk question: Should I Prefer Fileserver-initiated Backups from Several Hosts (Instead of Each Host Sending to the Server)?
Date: Tue, 14 Sep 2021 01:49:19 +0000
Message-ID: <fc2fbb950e825676988f425773c2bde5@mailmag.net>
In-Reply-To: <CAGdWbB5z6sGbDSJRygvWOiNNSP6hNhzFP-eMDfTm6nGoBpehKQ@mail.gmail.com>

September 12, 2021 10:42 AM, "Dave T" <davestechshop@gmail.com> wrote:

> Are btrbk-specific questions OK here?
> 
> I have a small LAN with a fileserver that should store backups from
> each attached host on the LAN. What is the most efficient (performant)
> way to do this with btrbk?
> 
> Each host (laptops, desktops and a few other devices) does hourly
> local snapshots with btrbk. Once per day, I would like to send backups
> of each volume on each device to the local fileserver. This has to be
> done via SSH (as NFS isn't supported by btrfs send|receive, afaik).
> 
> The options I'm aware of from the btrbk readme
> (https://digint.ch/btrbk/doc/readme.html) are:
> 
> 1. host-initiated backup to the fileserver from each host
> 
> 2. fileserver-initiated backups from all hosts
> 
> My guess is that the second option is preferred. Is that correct?

I personally prefer it, yes.

I can manage all my retention in one place, and my backups are isolated. If a client is
compromised, an attacker cannot delete the backups on the server: my clients have no access to
the server; only the server has access to the clients.

> Assuming I use the second option, do I need to be concerned about it
> initiating a backup on a host while that host is also performing a
> local hourly snapshot?

I don't think so. Hopefully someone will correct me if I'm wrong.

> What are the disadvantages of the fileserver-initiated approach?

If a client is offline, it will not be backed up at that time.

There are probably other disadvantages, but that's the main one I can think of.

> If one host is offline, will the backup procedure continue on with the
> other hosts it can reach at that time?

It should, but I'm not 100% sure.

> Since deleting snapshots can potentially be a costly operation (in
> terms of performance), should I split the process into two steps,
> where one step would pull the backups from each host without any
> deletions, and a second step would then prune the backups according to
> configured retention policies?

If it's important that the backup process complete as soon as possible, perhaps this would be a
good idea.

If that's not important, I don't see why it would matter.
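
For illustration, here is a rough sketch of what the two-step split could look like when driven
from a script on the fileserver. It assumes btrbk's `--preserve` option (skip all deletions) and
its `prune` command (apply retention only), as well as the default config path; please check
these against btrbk(1) for your version. The helper names are hypothetical, and `dry_run=True`
only prints the commands instead of executing them.

```python
import subprocess

# Assumption: default btrbk config location; adjust to your setup.
CONFIG = "/etc/btrbk/btrbk.conf"


def transfer_command(config=CONFIG):
    """Step 1: create/transfer backups without deleting anything."""
    return ["btrbk", "-c", config, "--preserve", "run"]


def prune_command(config=CONFIG):
    """Step 2: only apply the configured retention policy."""
    return ["btrbk", "-c", config, "prune"]


def run_step(cmd, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Transfer first (e.g. nightly), prune later at a quiet time.
    run_step(transfer_command())
    run_step(prune_command())
```

Scheduling the two steps from separate timers would let the transfer finish quickly and defer
the potentially slow snapshot deletions to an idle period.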

> How many backups (snapshots) can I safely retain for each host volume?
> I would like to keep as many as possible, but I know there is a
> threshold at which performance can become a problem.

I would think the limits would be relatively high, but I personally only run dailies for a week,
then weeklies for a month, then monthlies for a year.
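
To make that schedule concrete, here is a minimal sketch of a tiered retention check. It is not
btrbk's actual algorithm, only an illustration: the function name, the 7/30/365-day windows, and
the choice to keep the first snapshot of each ISO week and calendar month are all assumptions.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshots, today, daily_days=7, weekly_days=30, monthly_days=365):
    """Return the set of snapshot dates retained under a simple tiered policy."""
    keep = set()
    seen_weeks = set()
    seen_months = set()
    for snap in sorted(snapshots):  # oldest first: the first snapshot of a period wins
        age = (today - snap).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        # Dailies: everything from the last week.
        if age < daily_days:
            keep.add(snap)
        # Weeklies: first snapshot of each ISO week within the last month.
        week = tuple(snap.isocalendar())[:2]
        if age < weekly_days and week not in seen_weeks:
            seen_weeks.add(week)
            keep.add(snap)
        # Monthlies: first snapshot of each calendar month within the last year.
        month = (snap.year, snap.month)
        if age < monthly_days and month not in seen_months:
            seen_months.add(month)
            keep.add(snap)
    return keep


if __name__ == "__main__":
    today = date(2021, 9, 14)
    daily_snapshots = [today - timedelta(days=i) for i in range(400)]
    for kept in sorted(snapshots_to_keep(daily_snapshots, today)):
        print(kept.isoformat())
```

With one snapshot per day, a policy like this retains at most a few dozen snapshots per volume,
which is far below the counts where snapshot-related slowdowns are usually reported.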

> I mount btrfs volumes on the **hosts** with these mount options:
> 
> autodefrag,noatime,nodiratime,compress=lzo,space_cache=v2

Just FYI, noatime implies nodiratime. Source: https://lwn.net/Articles/245002
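
As a small illustration of that point, the following sketch drops the redundant `nodiratime`
from an option string whenever `noatime` is present. The function name is hypothetical, and it
only rewrites the string; it does not touch /etc/fstab or any mounted filesystem.

```python
def normalize_mount_options(options):
    """Remove 'nodiratime' when 'noatime' is already set, keeping option order."""
    opts = [o.strip() for o in options.split(",") if o.strip()]
    if "noatime" in opts:
        opts = [o for o in opts if o != "nodiratime"]
    return ",".join(opts)


if __name__ == "__main__":
    print(normalize_mount_options(
        "autodefrag,noatime,nodiratime,compress=lzo,space_cache=v2"))
    # prints: autodefrag,noatime,compress=lzo,space_cache=v2
```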

> And I have the systemd fstrim.service enabled.
> 
> The fileserver is a dedicated backup server, not a general-purpose
> fileserver. I plan to use most of those same mount options. Do I need
> the autodefrag option? Will autodefrag help or hurt performance in
> this use-case? The following message from this list caused me some
> confusion as I would have expected the opposite:

Sorry, I honestly don't know what impact this might have.
I personally run autodefrag on my clients, and not on my backup server.

> [freezes during snapshot creation/deletion -- to be expected? November
> 2019, 00:21:18 CET]
> 
>> So just to follow up on this, reducing the total number of snapshots and increasing the time
>> between their creation from hourly to once every six hours did help a *little* bit. However, about
>> a week ago I decided to try an experiment and added the "autodefrag" mount option (which I don't
>> usually do on SSDs), and that helped *massively*. Ever since, snapper-cleanup.service runs without
>> me noticing at all!.
> 
> Are there any other recommendations?
