linux-fsdevel.vger.kernel.org archive mirror
* bdar: efficiently backup allocated bytes in file systems
@ 2008-03-18  1:13 Zach Brown
  2008-03-18  7:47 ` Sitsofe Wheeler
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Zach Brown @ 2008-03-18  1:13 UTC
  To: linux-fsdevel

So, I had a fun time throwing together a utility last weekend.  I
thought I'd share it sooner rather than later.

I found myself wanting to back up a copy of an ancient ~75g ext3 file
system.  I got frustrated by our utilities, which don't saturate
storage.  I wanted dd line rates but I also only wanted to copy
referenced data.

So I threw something together which does that.  I made it work roughly
like tar so that people have some idea what to expect.  So you can do
something like:

 $ bdar -cf - /dev/sda3 | gzip -c > /tmp/sda3-backup.bdar.gz
...
 $ zcat /tmp/sda3-backup.bdar.gz | bdar -xf - /dev/sda3

and it will do exactly what you would guess it would do after reading
those command lines.

The bdar file format is just a header and then a series of regions of
bytes described by their length and offset.  To create a bdar file from
a file system, bdar needs to know enough to figure out what extents are
referenced.  Restoring a bdar is generic, though: it just stamps bytes
into the target file.
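
To make the idea concrete, here is a minimal Python sketch of a
header-plus-regions stream and a generic restore.  The magic string, the
field widths and the function names are assumptions made up for
illustration; this is not bdar's actual format, and it leaves out any
checksumming.

```python
import io
import struct

MAGIC = b"TOYBDAR1"              # hypothetical header, not bdar's real one
REGION = struct.Struct("<QQ")    # (offset, length) as little-endian u64


def write_archive(out, regions):
    """Write the header, then one (offset, length, bytes) record per region."""
    out.write(MAGIC)
    for offset, data in regions:
        out.write(REGION.pack(offset, len(data)))
        out.write(data)


def restore_archive(src, target):
    """Stamp each region's bytes into target at its recorded offset."""
    if src.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an archive in this toy format")
    while True:
        head = src.read(REGION.size)
        if not head:
            break                # clean end of stream
        if len(head) != REGION.size:
            raise ValueError("truncated region header")
        offset, length = REGION.unpack(head)
        data = src.read(length)
        if len(data) != length:
            raise ValueError("truncated region data")
        target.seek(offset)      # only the target needs to be seekable
        target.write(data)
```

The archive side is read strictly front to back, so it can come from a
pipe.  The restore side needs no file system knowledge at all, which is
the sense in which restoring is generic.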

I only taught it the most basic knowledge of ext[234].  Just enough to
show that generating the bdar is ~4x faster than tar and ~2x faster than
dump :).  There's still some available disk bandwidth to consume with
read-ahead, but it's pretty close.  (single spindle, ~5g of kernel
trees, beefy cpus.)

I'm going to continue hacking this into something which could be trusted
with data but not on any rigorous schedule.  I thought I would put it up
for others to get a look at and, hopefully, contribute to.  There's a
lot of fun stuff we can do.

It's in a mercurial repo:

  http://www.zabbo.net/hg/bdar

  $ hg clone http://www.zabbo.net/hg/bdar ; ls ./bdar

Let me know if you give it a try, I'm interested in all feedback.

- z


* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18  1:13 bdar: efficiently backup allocated bytes in file systems Zach Brown
@ 2008-03-18  7:47 ` Sitsofe Wheeler
  2008-03-20 16:25   ` Zach Brown
  2008-03-18 21:35 ` David Chinner
  2008-03-19  2:58 ` Andreas Dilger
  2 siblings, 1 reply; 11+ messages in thread
From: Sitsofe Wheeler @ 2008-03-18  7:47 UTC
  To: linux-fsdevel

On Mon, 17 Mar 2008 18:13:27 -0700, Zach Brown wrote:

> The bdar file format is just a header and then a series of regions of
> bytes described by their length and offset.  To create a bdar file from
> a file system bdar needs to know enough to figure out what extents are
> referenced.  Restoring a bdar is generic, though, it just stamps bytes
> into the target file.

Is it possible to also make it write files out in partimage (
http://www.partimage.org/Main_Page ) format or will this slow it down too 
much?

-- 
Sitsofe | http://sucs.org/~sits/



* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18  1:13 bdar: efficiently backup allocated bytes in file systems Zach Brown
  2008-03-18  7:47 ` Sitsofe Wheeler
@ 2008-03-18 21:35 ` David Chinner
  2008-03-18 22:06   ` Zach Brown
  2008-03-19  2:58 ` Andreas Dilger
  2 siblings, 1 reply; 11+ messages in thread
From: David Chinner @ 2008-03-18 21:35 UTC
  To: Zach Brown; +Cc: linux-fsdevel

On Mon, Mar 17, 2008 at 06:13:27PM -0700, Zach Brown wrote:
> So, I had a fun time throwing together a utility last weekend.  I
> thought I'd share it sooner rather than later.
> 
> I found myself wanting to backup a copy of an ancient ~75g ext3 file
> system.  I got frustrated by our utilities, which don't saturate
> storage.  I wanted dd line rates but I also only wanted to copy
> referenced data.
> 
> So I threw something together which does that.  I made it work roughly
> like tar so that people have some idea what to expect.  So you can do
> something like:
> 
>  $ bdar -cf - /dev/sda3 | gzip -c > /tmp/sda3-backup.bdar.gz
> ...
>  $ zcat /tmp/sda3-backup.bdar.gz | bdar -xf - /dev/sda3
> 
> and it will do exactly what you would guess it would do after reading
> those command lines.
> 
> The bdar file format is just a header and then a series of regions of
> bytes described by their length and offset.  To create a bdar file from
> a file system bdar needs to know enough to figure out what extents are
> referenced.  Restoring a bdar is generic, though, it just stamps bytes
> into the target file.

Neat, Zach. You should look at xfs_copy - it does pretty much this for XFS
filesystems....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18 21:35 ` David Chinner
@ 2008-03-18 22:06   ` Zach Brown
  2008-03-18 23:52     ` David Chinner
  2008-03-20  0:32     ` Ric Wheeler
  0 siblings, 2 replies; 11+ messages in thread
From: Zach Brown @ 2008-03-18 22:06 UTC
  To: David Chinner; +Cc: linux-fsdevel


> Neat, Zach. You should look at xfs_copy - it does pretty much this for XFS
> filesystems....

haha, yet another round of the -fsdevel XFS drinking game :)

Does xfs_copy tend to assert the XFS file format in the backup files it
generates?  One of the things I was hoping for with bdar was to have the
resulting copy image be agnostic.  It's just a sparse map with some
checksumming, really.

That limits what we can do, of course.  The current trivial format only
has one address space which doesn't fit well with the plans file systems
have of working with multiple addressable block ranges.

But I think I'm fine with that.  The value:complexity ratio of this
trivial version is refreshingly large.

- z


* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18 22:06   ` Zach Brown
@ 2008-03-18 23:52     ` David Chinner
  2008-03-20  0:26       ` Szabolcs Szakacsits
  2008-03-20  1:13       ` Andreas Dilger
  2008-03-20  0:32     ` Ric Wheeler
  1 sibling, 2 replies; 11+ messages in thread
From: David Chinner @ 2008-03-18 23:52 UTC
  To: Zach Brown; +Cc: David Chinner, linux-fsdevel

On Tue, Mar 18, 2008 at 03:06:27PM -0700, Zach Brown wrote:
> 
> > Neat, Zach. You should look at xfs_copy - it does pretty much this for XFS
> > filesystems....
> 
> haha, yet another round of the -fsdevel XFS drinking game :)

/me grins

> Does xfs_copy tend to assert the XFS file format in the backup files it
> generates?

Yes. If the destination is a file, the resultant image is a sparse
file that is a mountable XFS filesystem.

> One of the things I was hoping for with bdar was to have the
> resulting copy image be agnostic.  It's just a sparse map with some
> checksumming, really.

Yup - the use of a sparse file avoids the need for an internal map.
But that doesn't really work for piping the output, though.
xfs_metadump is probably more similar to bdar in that respect, but it
doesn't copy data....
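
As a rough illustration of the sparse-file point, here is a small Python
sketch.  The function names are made up for this example and it is not
how xfs_copy is implemented.  The writer seeks to each extent and writes
only there, so the file's own offsets act as the map.  A pipe cannot
seek, which is why a piped stream has to carry offsets and lengths
itself.

```python
import os


def write_sparse_image(path, size, extents):
    """Create a file of `size` bytes and write only the given extents."""
    with open(path, "wb") as img:
        img.truncate(size)       # set the length without writing data
        for offset, data in extents:
            img.seek(offset)     # ranges never written can stay as holes
            img.write(data)


def can_seek(fd):
    """Return True if the descriptor supports seeking (pipes do not)."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        return False
```

Whether the skipped ranges really end up unallocated depends on the
file system holding the image.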

FWIW, xfs_copy is really for efficient duplication of one source
disk to many destination disks in manufacturing, not so much as
a filesystem backup tool...

> That limits what we can do, of course.  The current trivial format only
> has one address space which doesn't fit well with the plans file systems
> have of working with multiple addressable block ranges.

Can't do everything ;)

> But I think I'm fine with that.  The value:complexity ratio of this
> trivial version is refreshingly large.

Agreed. 

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18  1:13 bdar: efficiently backup allocated bytes in file systems Zach Brown
  2008-03-18  7:47 ` Sitsofe Wheeler
  2008-03-18 21:35 ` David Chinner
@ 2008-03-19  2:58 ` Andreas Dilger
  2008-03-19  3:10   ` Zach Brown
  2 siblings, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2008-03-19  2:58 UTC
  To: Zach Brown; +Cc: linux-fsdevel

On Mar 17, 2008  18:13 -0700, Zach Brown wrote:
> So, I had a fun time throwing together a utility last weekend.  I
> thought I'd share it sooner rather than later.
> 
> I found myself wanting to backup a copy of an ancient ~75g ext3 file
> system.  I got frustrated by our utilities, which don't saturate
> storage.  I wanted dd line rates but I also only wanted to copy
> referenced data.
> 
> So I threw something together which does that.  I made it work roughly
> like tar so that people have some idea what to expect.  So you can do
> something like:
> 
>  $ bdar -cf - /dev/sda3 | gzip -c > /tmp/sda3-backup.bdar.gz
> ...
>  $ zcat /tmp/sda3-backup.bdar.gz | bdar -xf - /dev/sda3
> 
> and it will do exactly what you would guess it would do after reading
> those command lines.
> 
> The bdar file format is just a header and then a series of regions of
> bytes described by their length and offset.  To create a bdar file from
> a file system bdar needs to know enough to figure out what extents are
> referenced.  Restoring a bdar is generic, though, it just stamps bytes
> into the target file.

So the question is whether the ".bdar" file is specific to the filesystem
being backed up, and if it only allows backing up the whole filesystem?
Does it create a dense output file or a sparse one?  Does it store the
data as chunks of blocks in a full-device map or on a per file basis?

If you can't restore a .bdar backup file to a smaller device than the
source device that makes it less useful than most of the other tools.

> I only taught it the most basic knowledge of ext[234].  Just enough to
> show that generating the bdar is ~4x faster than tar and ~2x faster than
> dump :).  There's still some available disk bandwidth to consume with
> read-ahead, but it's pretty close.  (single spindle, ~5g of kernel
> trees, beefy cpus.)

The question is whether the 2x speed improvement is worth the lack of
portability compared to even dump?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-19  2:58 ` Andreas Dilger
@ 2008-03-19  3:10   ` Zach Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Zach Brown @ 2008-03-19  3:10 UTC
  To: Andreas Dilger; +Cc: linux-fsdevel


> So the question is whether the ".bdar" file is specific to the filesystem
> being backed up, and if it only allows backing up the whole filesystem?
> Does it create a dense output file or a sparse one?  Does it store the
> data as chunks of blocks in a full-device map or on a per file basis?

Let's see.. yes, yes, dense, full-device.  It's a tiny little program,
you could read it in 10 minutes :).  You'll giggle at how incomplete its
knowledge of ext* is.

> If you can't restore a .bdar backup file to a smaller device than the
> source device that makes it less useful than most of the other tools.

It is different in a way that won't satisfy people who want to restore
to smaller devices, yes.

> The question is whether the 2x speed improvement is worth the lack of
> portability compared to even dump?

Indeed.  For me, in my current situation, it is.

- z


* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18 23:52     ` David Chinner
@ 2008-03-20  0:26       ` Szabolcs Szakacsits
  2008-03-20  1:13       ` Andreas Dilger
  1 sibling, 0 replies; 11+ messages in thread
From: Szabolcs Szakacsits @ 2008-03-20  0:26 UTC
  To: David Chinner; +Cc: Zach Brown, linux-fsdevel


On Wed, 19 Mar 2008, David Chinner wrote:

> Yup - the use of a sparse file avoids the need for an internal map.
> But that doesn't really work for piping the output, though.
> xfs_metadump is probably more similar to bdar in that respect, but it
> doesn't copy data....

Ntfsclone has also efficiently copied data or metadata to a dense image,
a mountable sparse file, or a device since 2003.

It was an important factor to get reliable NTFS write support sooner 
because NTFS is slightly bigger than the other file systems (the size of 
the Microsoft NTFS driver is almost like the sum of the 60 Linux file 
systems altogether) and we needed to handle the heavily used (meta)data 
images efficiently during development and quality assurance.

	Szaka

-- 
NTFS-3G:  http://ntfs-3g.org


* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18 22:06   ` Zach Brown
  2008-03-18 23:52     ` David Chinner
@ 2008-03-20  0:32     ` Ric Wheeler
  1 sibling, 0 replies; 11+ messages in thread
From: Ric Wheeler @ 2008-03-20  0:32 UTC
  To: Zach Brown; +Cc: David Chinner, linux-fsdevel

Zach Brown wrote:
>> Neat, Zach. You should look at xfs_copy - it does pretty much this for XFS
>> filesystems....
> 
> haha, yet another round of the -fsdevel XFS drinking game :)
> 
> Does xfs_copy tend to assert the XFS file format in the backup files it
> generates?  One of the things I was hoping for with bdar was to have the
> resulting copy image be agnostic.  It's just a sparse map with some
> checksumming, really.
> 
> That limits what we can do, of course.  The current trivial format only
> has one address space which doesn't fit well with the plans file systems
> have of working with multiple addressable block ranges.
> 
> But I think I'm fine with that.  The value:complexity ratio of this
> trivial version is refreshingly large.
> 
> - z

About a year back, I was trying various ways to read every file on a 
fairly massive (reiserfs v3) file system (order of tens of millions of 
files).

I don't recall how close I came to native dd speed, but I could get a
substantial win by grabbing a large chunk of files (say 5000),
sorting them by either inode number or creation time, and then reading
them in that order.

We had some good experience with this, but our use case has no sparse 
files and tends to have lots of little or medium sized files.  This only 
did the read phase, but the basic assumption is that the file system 
will tend to allocate disk sectors in sequential order over time and 
this gave a fairly close approximation of that ;-)
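
A minimal Python sketch of that batching idea, for anyone who wants to
try it.  The function names and the 1 MiB read size are assumptions for
illustration, and only the inode-number ordering is shown, not the
creation-time variant.

```python
import os


def inode_sorted_batches(paths, batch_size=5000):
    """Yield successive batches of paths, each sorted by inode number."""
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # lstat avoids following symlinks when fetching the inode number
        yield sorted(batch, key=lambda p: os.lstat(p).st_ino)


def read_all(paths, batch_size=5000, chunk_size=1 << 20):
    """Read every file in inode order within each batch; return bytes read."""
    total = 0
    for batch in inode_sorted_batches(paths, batch_size):
        for path in batch:
            with open(path, "rb") as f:
                while True:
                    block = f.read(chunk_size)
                    if not block:
                        break
                    total += len(block)
    return total
```

Sorting within a bounded batch keeps memory use flat on a tree with tens
of millions of files, at the cost of only approximating the on-disk
order.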

ric



* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18 23:52     ` David Chinner
  2008-03-20  0:26       ` Szabolcs Szakacsits
@ 2008-03-20  1:13       ` Andreas Dilger
  1 sibling, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2008-03-20  1:13 UTC
  To: David Chinner; +Cc: Zach Brown, linux-fsdevel

On Mar 19, 2008  10:52 +1100, David Chinner wrote:
> On Tue, Mar 18, 2008 at 03:06:27PM -0700, Zach Brown wrote:
> > Does xfs_copy tend to assert the XFS file format in the backup files it
> > generates?
> 
> Yes. If the destination is a file, the resultant image is a sparse
> file that is a mountable XFS filesystem.

The "e2image" program also works in that manner.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



* Re: bdar: efficiently backup allocated bytes in file systems
  2008-03-18  7:47 ` Sitsofe Wheeler
@ 2008-03-20 16:25   ` Zach Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Zach Brown @ 2008-03-20 16:25 UTC
  To: Sitsofe Wheeler; +Cc: linux-fsdevel


> Is it possible to also make it write files out in partimage (
> http://www.partimage.org/Main_Page ) format or will this slow it down too 
> much?

I'm not sure what the point would be.. why not just use partimage to
write out partimage files?

- z
