linux-raid.vger.kernel.org archive mirror
From: Bill Davidsen <davidsen@tmr.com>
To: Jon Nelson <jnelson@jamponi.net>
Cc: linux-raid@vger.kernel.org
Subject: Re: Backups w/ rsync
Date: Fri, 28 Sep 2007 12:25:23 -0400
Message-ID: <46FD2AF3.4040501@tmr.com>
In-Reply-To: <cccedfc60709280811j44d6b1c8w60e209d7c6defca9@mail.gmail.com>

Jon Nelson wrote:
> Please note: I'm having trouble w/gmail's formatting... so please
> forgive this if it looks horrible. :-|
>
> On 9/28/07, Bill Davidsen <davidsen@tmr.com> wrote:
>   
>> Dean S. Messing wrote:
>>     
>>> It has been some time since I read the rsync man page.  I see that
>>> there is (among the bazillion and one switches) a "--link-dest=DIR"
>>> switch which I suppose does what you describe.  I'll have to
>>> experiment with this and think things through.  Thanks, Michal.
>>>
>>>       
>> Be aware that rsync is useful for making a *copy* of your files, which
>> isn't always the best backup. If the goal is to preserve data and be
>> able to recover in time of disaster, it's probably not optimal, while if
>> you need frequent access to old or deleted files it's fine.
>>     
>
>
> You are absolutely right when you say it isn't always the best backup. There
> IS no 'best' backup.
>
>> For example, full and incremental backup methods such as dump and
>> restore are usually faster to take and restore than a copy, and allow
>> easy incremental backups.
>>     
>
>
> If "copy" meant "full data copy" and not "hard link where possible", I'd
> agree with you. However...
>
> I use a nightly rsync (with --link-dest) to back up more than 40 GiB to a
> drbd-backed drive. I'll explain why I use drbd in just a moment.
>
> Technically, I have a 3 disk raid5 (Linux Software Raid) which is the
> primary store for the data. Then I have a second drive (non-raid) that is
> used as a drbd backing store, which I rsync *to* from filesystems built off
> of the raid. I keep *30 days* of nightly backups on the drbd volume. The
> average difference between nightly backups is about 45MB, or a bit less than
> 10%. The total disk usage is (on average) about 10% more than a single
> backup. On an AMD x86-64 dual core (3600 de-clocked to run at 1GHz) the
> entire process takes between 1 and 2 minutes, from start to finish.
>
> Using hard links means I can snapshot ~175,000 files, about 40GiB, in under
> 2 minutes - something I'd have a hard time doing with dump+restore. I could
> easily make incremental or differential copies, and maybe even in that time
> frame, but I'm not sure I see much advantage in that. Furthermore, as you state,
> dump+restore does *not* include the removal of files which for some
> scenarios is a huge deal.
>   

What I don't understand is how you use hard links here. A hard link has
to be in the same filesystem as the file it points to, and it is just
another name for the same inode: it doesn't make a physical copy of the
data on another device, or anywhere else, really.

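
To make the point about hard links concrete, here is a minimal Python
sketch. The function name hard_link_demo and the file names are made up
for illustration. It creates a file, adds a hard link to it, and reports
that both names share one inode on one device. Trying os.link() across
two different filesystems fails with an OSError (EXDEV) instead.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file plus a hard link to it and report what the two share."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")  # hypothetical names
        linked = os.path.join(tmp, "linked.txt")
        with open(original, "w") as f:
            f.write("payload\n")

        # A hard link is a second directory entry for the same inode;
        # no data blocks are copied.
        os.link(original, linked)

        a, b = os.stat(original), os.stat(linked)
        result = {
            "same_inode": a.st_ino == b.st_ino,
            "same_device": a.st_dev == b.st_dev,
            "link_count": a.st_nlink,
        }

        # Removing one name only drops the link count; the data stays
        # reachable through the remaining name.
        os.remove(original)
        with open(linked) as f:
            result["survives_unlink"] = f.read() == "payload\n"
        return result
```

Calling hard_link_demo() returns a small dict of these checks. The
link alone protects against deleting one name, not against losing the
disk that holds the inode.
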
> The long and short of it is this: using hard links (via rsync or cp or
> whatever) to do snapshot backups can be really, really fast and have
> significant advantages but there are, as with all things, some downsides.
> Those downsides are fairly easily mitigated, however. In my case, I can lose
> 1 drive of the raid and I'm OK. If I lose 2, then the other drive (not part
> of the raid) has the data I care about. If I lose the entire machine, the
> *other* machine (the other end of the drbd, only woken up every other day or
> so) has the data. Going back 30 days. And a bare-metal "restore" is as fast
> as your I/O is.  I back my /really/ important stuff up on DLT.
>
> Thanks again to drbd, when the secondary comes up it communicates with the
> primary and is able to figure out only which blocks have changed and only
> copies those. On a nightly basis that is usually a couple of hundred
> megabytes, and at 12MiB/s that doesn't take terribly long to take care of.
>   
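
For readers who have not seen hard-link snapshots before, the idea
quoted above can be sketched in Python roughly as follows. This is a
simplified imitation of what rsync --link-dest achieves, not rsync's
actual algorithm: the names snapshot and _unchanged are hypothetical,
the "unchanged" test is assumed to be size plus whole-second mtime, and
symlinks, ownership, and deletions in the middle of a run are ignored.
The previous snapshot and the new one must live on the same filesystem.

```python
import os
import shutil


def _unchanged(a, b):
    """Assumed quick check: same size and same whole-second mtime."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source, previous, dest):
    """Build a full-looking copy of `source` in `dest`.

    Files that look unchanged since `previous` are hard-linked to the
    previous snapshot's copy; everything else is copied from `source`.
    `previous` may be None for the first run.
    Returns a (linked, copied) tuple of file counts.
    """
    linked = copied = 0
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dest_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(dest_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and _unchanged(src, old):
                os.link(old, new)       # costs one directory entry, no data
                linked += 1
            else:
                shutil.copy2(src, new)  # new or changed: real copy, keeps mtime
                copied += 1
    return linked, copied
```

Each snapshot directory looks like a complete tree, but only new or
changed files consume additional data blocks. Files removed from the
source simply do not appear in the next snapshot.
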

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


