From: Andreas Dilger <adilger@Sun.COM>
To: Zach Brown <zab@zabbo.net>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: bdar: efficiently backup allocated bytes in file systems
Date: Wed, 19 Mar 2008 10:58:43 +0800
Message-ID: <20080319025843.GE2971@webber.adilger.int>
In-Reply-To: <47DF1737.2050700@zabbo.net>

On Mar 17, 2008 18:13 -0700, Zach Brown wrote:
> So, I had a fun time throwing together a utility last weekend. I
> thought I'd share it sooner rather than later.
>
> I found myself wanting to back up a copy of an ancient ~75g ext3 file
> system. I got frustrated by our utilities, which don't saturate
> storage. I wanted dd line rates but I also only wanted to copy
> referenced data.
>
> So I threw something together which does that. I made it work roughly
> like tar so that people have some idea what to expect. So you can do
> something like:
>
> $ bdar -cf - /dev/sda3 | gzip -c > /tmp/sda3-backup.bdar.gz
> ...
> $ zcat /tmp/sda3-backup.bdar.gz | bdar -xf - /dev/sda3
>
> and it will do exactly what you would guess it would do after reading
> those command lines.
>
> The bdar file format is just a header and then a series of regions of
> bytes described by their length and offset. To create a bdar file from
> a file system bdar needs to know enough to figure out what extents are
> referenced. Restoring a bdar is generic, though, it just stamps bytes
> into the target file.

So the question is whether the ".bdar" file is specific to the filesystem
being backed up, and whether it only allows backing up the whole filesystem.
Does it create a dense output file or a sparse one? Does it store the
data as chunks of blocks in a full-device map or on a per-file basis?
If you can't restore a .bdar backup file to a device smaller than the
source device, that makes it less useful than most of the other tools.

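To make the format question concrete, here is a minimal sketch of a
"header plus regions described by length and offset" archive and a generic
restore that stamps bytes into a target. It is not bdar's actual on-disk
layout: the magic string, the field order and widths, and the function
names are all assumptions made up for illustration.

```python
import io
import struct

MAGIC = b"HYPOBDAR"             # hypothetical header, not bdar's real one
REGION = struct.Struct("<QQ")   # hypothetical record header: offset, length


def write_archive(out, source, extents):
    """Write the header, then each (offset, length) extent and its bytes."""
    out.write(MAGIC)
    for offset, length in extents:
        source.seek(offset)
        data = source.read(length)
        out.write(REGION.pack(offset, len(data)))
        out.write(data)


def restore_archive(archive, target):
    """Stamp each region's bytes into the target at its recorded offset."""
    if archive.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a region archive in this sketch's format")
    while True:
        head = archive.read(REGION.size)
        if not head:
            break  # clean end of archive
        if len(head) != REGION.size:
            raise ValueError("truncated region header")
        offset, length = REGION.unpack(head)
        data = archive.read(length)
        if len(data) != length:
            raise ValueError("truncated region data")
        target.seek(offset)
        target.write(data)


if __name__ == "__main__":
    # Two referenced extents with an unreferenced hole between them.
    src = io.BytesIO(b"A" * 16 + b"\0" * 16 + b"B" * 8)
    arc = io.BytesIO()
    write_archive(arc, src, [(0, 16), (32, 8)])
    arc.seek(0)
    dst = io.BytesIO(b"\0" * 40)
    restore_archive(arc, dst)
    print(dst.getvalue() == src.getvalue())
```

Because the offsets in this sketch are absolute positions on the source
device, a region recorded near the end of the source would not fit on a
smaller target, which is the restore concern raised above.
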
> I only taught it the most basic knowledge of ext[234]. Just enough to
> show that generating the bdar is ~4x faster than tar and ~2x faster than
> dump :). There's still some available disk bandwidth to consume with
> read-ahead, but it's pretty close. (single spindle, ~5g of kernel
> trees, beefy cpus.)

The question is whether the 2x speed improvement is worth the loss of
portability, even compared to dump.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.