Re: Snapshots a la NetApp?

From: Hans Reiser <reiser@namesys.com>
To: Heinz-Josef Claes <hjclaes@web.de>
Cc: kend@xanoptix.com, reiserfs-list@namesys.com,
	Alexander Lyamin <flx@namesys.com>
Subject: Re: Snapshots a la NetApp?
Date: Tue, 25 Mar 2003 05:35:01 +0300
Message-ID: <3E7FC055.2030903@namesys.com>
In-Reply-To: <1048542143.1332.31.camel@wusel.schnulli.de>

Heinz-Josef Claes wrote:

>On Mon, 2003-03-24 at 21:49, Hans Reiser wrote:
>
>>Heinz-Josef Claes wrote:
>>
>>>On Sun, 2003-03-23 at 04:38, kend@xanoptix.com wrote:
>>>
>>>>I'm just wondering if there's any development being done on NetApp-like
>>>>snapshots.  (This differs from LVM snapshots in that it's file-by-file,
>>>>instead of volume based.)  If not, has anyone given it any consideration? 
>>>>It would be a huge win for the RAID-in-a-box folks, and, speaking as a
>>>>sysadmin, is something about which I dream frequently.
>>>>
>>>>Just curious...
>>>>
>>>>Ken D'Ambrosio
>>>>Sr. SysAdmin,
>>>>Xanoptix, Inc.
>>>>
>>>Hi,
>>>you can look at the URL below. It's a kind of snapshot inspired by
>>>NetApp, but has nothing to do with LVM or the filesystem. It also has a
>>>different behaviour.
>>>
>>I didn't find the magic click that gave me the design doc for 
>>storebackup, could you post it to the list?
>
>There is no design doc, because its basic design is so simple :-).
>Here is an excerpt from the README file included in the tar file on
>SourceForge:
>
>GENERAL FUNCTIONALITY
>---------------------
>
>storeBackup is a disk-to-disk backup tool for Linux. It should also run
>on other Unix-like machines. You can browse the backed-up files
>directly (locally, via NFS, via SAMBA or whatever). This lets users
>restore files quickly and easily: they only have to copy (and possibly
>uncompress) the file. There is also a tool that lets the administrator
>easily restore (sub)trees. Each individual backup, taken at a specific
>time, can be deleted without affecting the other existing backups.
>
>
>HOW DOES IT WORK?
>-----------------
>
>storeBackup makes a backup of a directory to another directly reachable
>location. It does not care where this location is (same disk, another
>disk, via NFS over the network). You should use another disk or, better,
>another computer to store the backup. The target directory must be on a
>Unix virtual file system which supports hard links, so backing up to a
>SAMBA share is not possible. Naturally, you can also mount the source
>directory via NFS and back up to a local filesystem. In this case, it's
>good to have a fast network. Each backup appears in a directory that
>storeBackup creates, named in the form date_time (yyyy.mm.dd_hh.mm.ss).
>
>Several optimizations are implemented to reduce disk usage:
>
>- The files to back up are compressed (default bzip2) as discrete files
>  in the backup. Files with definable suffixes (like .gz, which is part
>  of the default value) will not be compressed. It is also possible to
>  turn off compression entirely.
>
>- If a file with the same contents exists in the previous backup, the
>  new backup will only be a hard link to the other one. (This
>  mechanism depends on the contents, not on a file name or path!) If you
>  rename a file or directory or move sub trees around, it will not cost
>  additional space in the backup.
>
>- You can also check backups older than the last one for files with
>  the same contents, but this is normally not worth the effort. You
>  can also check backups of *other* machines (or backup series)
>  for files with the same contents, which can be very efficient.
>
>- If a file with the same contents exists elsewhere in the tree being
>  backed up, it will be hard linked to that one (and naturally to the
>  older ones).
>
>As a result, only changes that produce different files are stored
>(compressed) and require disk space. Normally, the disk space required
>for one backup a day for 30 days is less than the space required for the
>original. But this depends on the number of changes.
>
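The content-based hard-link scheme described in the list above can be
sketched roughly as follows. This is a minimal Python illustration, not
storeBackup's own code: the names file_digest, backup_tree and index are
hypothetical, compression is left out, and the index is a plain in-memory
dict that maps a content digest to the path of an already stored copy.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_tree(src, dst, index):
    """Back up src into dst, hard-linking files whose contents are known.

    index (hypothetical structure) maps digest -> path of a stored copy.
    It is updated in place, so it can be reused for the next backup run.
    Returns a (copied, linked) tuple of file counts.
    """
    copied = linked = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            digest = file_digest(source)
            if digest in index:
                # Same contents already stored: costs only a directory entry.
                os.link(index[digest], target)
                linked += 1
            else:
                shutil.copy2(source, target)
                index[digest] = target
                copied += 1
    return copied, linked
```

Because the lookup key is the digest and not the file name, renaming or
moving a file between two runs still ends up as a hard link in the second
backup, provided the same index is passed to both calls.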
>
>There are several optimizations to improve performance. Normally, the
>first backup is much slower than the following ones, because all the
>data has to be compressed and/or copied. storeBackup is able to take
>advantage of multiprocessor machines. From the second run on, it should
>not be a problem to process more than 100 files/sec on a typical current
>machine. This does not depend on the file size.
>
>There are special files in the root of the created backup called
>.md5CheckSums.info and .md5CheckSums or .md5CheckSums.bz2
>(default). You must not delete these files. They contain all
>information about the original files. You can use this information to
>write your own tools, besides the existing ones, to restore or analyze
>the backups.
>
>When started, storeBackup reads .md5CheckSums and creates its own
>database (dbm file) in $TMPDIR or --tmpdir (default is /tmp). If you
>back up a large number of files (some millions), the required space can
>be several dozen megabytes. If you do not have enough memory to
>cache the dbm file, I recommend using a separate hard disk (if
>available) for better performance.
>
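The idea of loading the checksum list into an on-disk dbm file, so that
the lookup table does not have to fit in memory, might look like the
sketch below. The README excerpt does not describe the layout of
.md5CheckSums, so the (digest, path) records used here are an assumption,
and build_index and lookup are hypothetical names. Only Python's standard
dbm module is used.

```python
import dbm


def build_index(records, db_path):
    """Store digest -> path pairs in an on-disk dbm file.

    records is an iterable of (digest, path) string pairs (assumed format).
    The first path seen for a digest wins, so later duplicates are ignored.
    """
    with dbm.open(db_path, "c") as db:
        for digest, path in records:
            key = digest.encode("utf-8")
            if key not in db:
                db[key] = path.encode("utf-8")


def lookup(db_path, digest):
    """Return the stored path for digest, or None if it is unknown."""
    with dbm.open(db_path, "r") as db:
        value = db.get(digest.encode("utf-8"))
    return None if value is None else value.decode("utf-8")
```

Putting db_path on a separate disk, as recommended above, only requires
passing a path on that disk to both functions.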
>
>LIMITATIONS
>-----------
>
>- storeBackup can back up normal files, directories, symbolic links and
>  named pipes. Other file types are not supported so far and will
>  generate a warning.
>
>- The permissions in the backup tree(s) are the same as the permissions
>  in the original directory. Under rare conditions, it is possible
>  that a user cannot read one or more of his/her own files
>  in the backup. With the restore tool - storeBackupRecover.pl -
>  everything is restored with the original permissions.
>
>- storeBackup uses hard links to save disk space. On Linux, ext2
>  supports up to 32000 hard links and reiserfs up to 64535. If
>  storeBackup needs more hard links, it will write a warning and store
>  a new (compressed) copy of the file. If you use ext2 for the backup,
>  you have to reserve enough (static) inodes!
>
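The fallback for the hard-link limit in the last item can be sketched as
follows. This is an illustration under assumptions, not storeBackup's
code: link_or_copy is a hypothetical name, the copy is uncompressed, and
the sketch relies on the operating system reporting an exhausted link
count as errno EMLINK.

```python
import errno
import os
import shutil
import sys


def link_or_copy(existing, target):
    """Hard-link target to existing; store a fresh copy if the limit is hit.

    Returns "linked" or "copied" so the caller can account for disk usage.
    """
    try:
        os.link(existing, target)
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EMLINK:
            raise  # some other failure, e.g. a cross-device link
        print(f"warning: link limit reached for {existing}", file=sys.stderr)
        shutil.copy2(existing, target)
        return "copied"
```

A caller that keeps a digest index would then point the index at the new
copy, so that later duplicates link to a file that still has links to
spare.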
>
>
>Hope this helps. As I wrote, it does not depend on the filesystem. It
>works on files, while NetApp works on blocks. Making a
>snapshot of a big database file with NetApp is a good idea. Making
>backups of big database files with storebackup costs you a lot of space.
>
>On the other hand, if you change a normal text file, the block-based
>algorithm of NetApp will not save many blocks. Compressing the whole
>file is much more efficient. And duplicated files, which exist in great
>numbers in a "normal" file system containing the data of many users (I
>also didn't believe this :-)), are saved only once in the backup.
>
>If you want to back up big DB files on Linux, there's a better tool for
>this use case which uses the rsync algorithm and stores the original
>file and a series of deltas. If I remember correctly, it's called rdiff.
>(Sorry, but I have never used it.)
>
Flx, should we try this for our backups?

-- 
Hans
