From: Zach Brown <zab@zabbo.net>
To: linux-fsdevel@vger.kernel.org
Subject: bdar: efficiently backup allocated bytes in file systems
Date: Mon, 17 Mar 2008 18:13:27 -0700
Message-ID: <47DF1737.2050700@zabbo.net>

So, I had a fun time throwing together a utility last weekend.  I
thought I'd share it sooner rather than later.

I found myself wanting to back up a copy of an ancient ~75g ext3 file
system.  I got frustrated by our utilities, which don't saturate
storage.  I wanted dd line rates but I also only wanted to copy
referenced data.

So I threw something together which does that.  I made it work roughly
like tar so that people have some idea what to expect.  So you can do
something like:

 $ bdar -cf - /dev/sda3 | gzip -c > /tmp/sda3-backup.bdar.gz
...
 $ zcat /tmp/sda3-backup.bdar.gz | bdar -xf - /dev/sda3

and it will do exactly what you would guess it would do after reading
those command lines.

The bdar file format is just a header and then a series of regions of
bytes described by their length and offset.  To create a bdar file from
a file system, bdar needs to know enough about that file system to
figure out which extents are referenced.  Restoring a bdar is generic,
though: it just stamps bytes into the target file.
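
To make the "header, then regions" idea concrete, here is a minimal
sketch in Python.  It is not bdar's actual on-disk format or code: the
magic string, the 64-bit little-endian offset/length fields, and the
function names create() and restore() are all assumptions made for
illustration.  The caller supplies the list of referenced extents, which
is the part that needs file system knowledge.

```python
import io
import struct

MAGIC = b"BDARDEMO"            # hypothetical magic, not bdar's real header
REGION = struct.Struct("<QQ")  # assumed layout: 64-bit offset, 64-bit length


def create(src, extents, out):
    """Write a header, then each (offset, length) extent followed by its bytes."""
    out.write(MAGIC)
    for offset, length in extents:
        src.seek(offset)
        data = src.read(length)
        out.write(REGION.pack(offset, len(data)))
        out.write(data)


def restore(archive, target):
    """Generic restore: stamp each region's bytes at its offset in target."""
    if archive.read(len(MAGIC)) != MAGIC:
        raise ValueError("not a demo archive")
    while True:
        head = archive.read(REGION.size)
        if not head:
            break  # clean end of archive
        if len(head) != REGION.size:
            raise ValueError("truncated region descriptor")
        offset, length = REGION.unpack(head)
        target.seek(offset)
        target.write(archive.read(length))


if __name__ == "__main__":
    # A 16-byte "device" with two referenced extents and a hole between them.
    device = io.BytesIO(b"AAAA" + b"\0" * 8 + b"BBBB")
    archive = io.BytesIO()
    create(device, [(0, 4), (12, 4)], archive)

    archive.seek(0)
    copy = io.BytesIO(b"\0" * 16)
    restore(archive, copy)
    assert copy.getvalue() == device.getvalue()
```

Note that restore() never looks at file system structures, and it only
reads the archive sequentially, so a layout like this can be fed through
a pipe as in the zcat example above.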

I only taught it the most basic knowledge of ext[234].  Just enough to
show that generating the bdar is ~4x faster than tar and ~2x faster than
dump :).  There's still some available disk bandwidth to consume with
read-ahead, but it's pretty close.  (single spindle, ~5g of kernel
trees, beefy cpus.)

I'm going to continue hacking this into something which could be trusted
with data, but not on any rigorous schedule.  I thought I would put it up
for others to get a look at and, hopefully, contribute to.  There's a
lot of fun stuff we can do.

It's in a mercurial repo:

  http://www.zabbo.net/hg/bdar

  $ hg clone http://www.zabbo.net/hg/bdar ; ls ./bdar

Let me know if you give it a try; I'm interested in all feedback.

- z
