From: Bill Davidsen <davidsen@tmr.com>
To: Jon Nelson <jnelson@jamponi.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: Backups w/ rsync
Date: Fri, 28 Sep 2007 12:25:23 -0400
Message-ID: <46FD2AF3.4040501@tmr.com>
In-Reply-To: <cccedfc60709280811j44d6b1c8w60e209d7c6defca9@mail.gmail.com>
Jon Nelson wrote:
> Please note: I'm having trouble w/gmail's formatting... so please
> forgive this if it looks horrible. :-|
>
> On 9/28/07, Bill Davidsen <davidsen@tmr.com> wrote:
>
>> Dean S. Messing wrote:
>>
>>> It has been some time since I read the rsync man page. I see that
>>> there is (among the bazillion and one switches) a "--link-dest=DIR"
>>> switch which I suppose does what you describe. I'll have to
>>> experiment with this and think things through. Thanks, Michal.
>>>
>>>
>> Be aware that rsync is useful for making a *copy* of your files, which
>> isn't always the best backup. If the goal is to preserve data and be
>> able to recover in time of disaster, it's probably not optimal, while if
>> you need frequent access to old or deleted files it's fine.
>>
>
>
> You are absolutely right when you say it isn't always the best backup. There
> IS no 'best' backup.
>
>> For example, full and incremental backup methods such as dump and
>> restore are usually faster to take and restore than a copy, and allow
>> easy incremental backups.
>>
>
>
> If "copy" meant "full data copy" and not "hard link where possible", I'd
> agree with you. However...
>
> I use a nightly rsync (with --link-dest) to back up more than 40 GiB to a
> drbd-backed drive. I'll explain why I use drbd in just a moment.
>
> Technically, I have a 3 disk raid5 (Linux Software Raid) which is the
> primary store for the data. Then I have a second drive (non-raid) that is
> used as a drbd backing store, which I rsync *to* from filesystems built off
> of the raid. I keep *30 days* of nightly backups on the drbd volume. The
> average difference between nightly backups is about 45MB, or a bit less than
> 10%. The total disk usage is (on average) about 10% more than a single
> backup. On an AMD x86-64 dual core (3600 de-clocked to run at 1GHz) the
> entire process takes between 1 and 2 minutes, from start to finish.
>
> Using hard links means I can snapshot ~175,000 files, about 40GiB, in under
> 2 minutes - something I'd have a hard time doing with dump+restore. I could
> easily make incremental or differential copies, and maybe even in that time
> frame, but I'm not sure I see much advantage in that. Furthermore, as you state,
> dump+restore does *not* include the removal of files which for some
> scenarios is a huge deal.
>
What I don't understand is how you use hard links for this. A hard link
has to be in the same filesystem as the file it points to, and it is just
another directory entry for the same inode; it doesn't make a physical
copy of the data on another device, or anywhere else, really.
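To make that concrete, here is a minimal Python sketch. It is only an
illustration of hard-link semantics, not of anything rsync does internally;
the file names and the demo() helper are made up for the example. It
hard-links a file, then checks that both names report the same device and
inode, that the link count is 2, and that the data is still readable
through the second name after the first is removed.

```python
import os
import tempfile


def link_info(path):
    """Return (device, inode, link count) for a path."""
    st = os.stat(path)
    return st.st_dev, st.st_ino, st.st_nlink


def demo():
    """Hard-link a file and report what the filesystem says about it."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical names, used only for this illustration.
        original = os.path.join(d, "original.txt")
        snapshot = os.path.join(d, "snapshot.txt")

        with open(original, "w") as f:
            f.write("payload\n")

        # Create a second directory entry for the same inode.
        # Across filesystems this call fails with OSError instead.
        os.link(original, snapshot)

        dev_a, ino_a, nlink_a = link_info(original)
        dev_b, ino_b, _ = link_info(snapshot)
        same_inode = (dev_a, ino_a) == (dev_b, ino_b)

        # Removing one name only drops the link count; the data stays
        # reachable through the other name.
        os.remove(original)
        with open(snapshot) as f:
            survived = f.read() == "payload\n"

        return same_inode, nlink_a, survived


if __name__ == "__main__":
    print(demo())
```

Both names refer to one set of data blocks on one device, so a hard link
protects against deleting a name, not against losing the disk.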
> The long and short of it is this: using hard links (via rsync or cp or
> whatever) to do snapshot backups can be really, really fast and have
> significant advantages but there are, as with all things, some downsides.
> Those downsides are fairly easily mitigated, however. In my case, I can lose
> 1 drive of the raid and I'm OK. If I lose 2, then the other drive (not part
> of the raid) has the data I care about. If I lose the entire machine, the
> *other* machine (the other end of the drbd, only woken up every other day or
> so) has the data. Going back 30 days. And a bare-metal "restore" is as fast
> as your I/O is. I back my /really/ important stuff up on DLT.
>
> Thanks again to drbd, when the secondary comes up it communicates with the
> primary, figures out which blocks have changed, and copies only those.
> On a nightly basis that is usually a couple of hundred megabytes, and at
> 12MiB/s that doesn't take terribly long to take care of.
>
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979