From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zach Brown
Subject: bdar: efficiently backup allocated bytes in file systems
Date: Mon, 17 Mar 2008 18:13:27 -0700
Message-ID: <47DF1737.2050700@zabbo.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
To: linux-fsdevel@vger.kernel.org
Return-path:
Received: from tetsuo.zabbo.net ([207.173.201.20]:44683 "EHLO tetsuo.zabbo.net" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1752444AbYCRBN2 (ORCPT );
    Mon, 17 Mar 2008 21:13:28 -0400
Received: from Macintosh.local (unknown [192.168.110.240])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by tetsuo.zabbo.net (Postfix) with ESMTP id 431B5D1049F
    for ; Mon, 17 Mar 2008 18:13:27 -0700 (PDT)
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

So, I had a fun time throwing together a utility last weekend. I
thought I'd share it sooner rather than later.

I found myself wanting to back up a copy of an ancient ~75g ext3 file
system. I got frustrated by our utilities, which don't saturate
storage. I wanted dd line rates but I also only wanted to copy
referenced data.

So I threw something together which does that. I made it work roughly
like tar so that people have some idea what to expect. So you can do
something like:

  $ bdar -cf - /dev/sda3 | gzip -c > /tmp/sda3-backup.bdar.gz
  ...
  $ zcat /tmp/sda3-backup.bdar.gz | bdar -xf - /dev/sda3

and it will do exactly what you would guess it would do after reading
those command lines.

The bdar file format is just a header and then a series of regions of
bytes described by their length and offset. To create a bdar file from
a file system, bdar needs to know enough to figure out what extents are
referenced. Restoring a bdar is generic, though; it just stamps bytes
into the target file.

I only taught it the most basic knowledge of ext[234]. Just enough to
show that generating the bdar is ~4x faster than tar and ~2x faster
than dump :).
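
To make the "header, then regions described by length and offset" idea
concrete, here is a minimal Python sketch. It is not bdar's code and not
its real on-disk layout: the magic string, the little-endian 64-bit
offset/length fields, and the names create, restore and read_exact are
all assumptions made up for illustration. It also leaves out the hard
part, which is asking the file system which extents are referenced; the
extent list is simply passed in.

```python
import io
import struct

# Hypothetical layout, NOT the real bdar format: an 8-byte magic header,
# then records of (offset, length) as little-endian u64s followed by data.
MAGIC = b"BDARDEMO"
REGION = struct.Struct("<QQ")


def create(source, extents, archive):
    """Copy only the given (offset, length) extents from source to archive."""
    archive.write(MAGIC)
    for offset, length in extents:
        source.seek(offset)
        data = source.read(length)
        # Record the number of bytes actually read, in case of a short read.
        archive.write(REGION.pack(offset, len(data)))
        archive.write(data)


def read_exact(stream, n):
    """Read exactly n bytes; pipes may return fewer bytes per read call."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise ValueError("truncated archive")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def restore(archive, target):
    """Generic restore: stamp each region's bytes at its offset in target."""
    if read_exact(archive, len(MAGIC)) != MAGIC:
        raise ValueError("not a demo archive")
    while True:
        first = archive.read(1)
        if not first:
            break  # clean end of archive
        header = first + read_exact(archive, REGION.size - 1)
        offset, length = REGION.unpack(header)
        target.seek(offset)
        # A real tool would copy large regions in bounded chunks.
        target.write(read_exact(archive, length))
```

The archive side only ever reads and writes sequentially, so it can sit
on a pipe as in the gzip example; only the source and target need to be
seekable. Note that restore knows nothing about ext[234], which is the
sense in which restoring is generic.
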
There's still some available disk bandwidth to consume with read-ahead,
but it's pretty close to saturating the disk. (Single spindle, ~5g of
kernel trees, beefy cpus.)

I'm going to continue hacking this into something which could be
trusted with data, but not on any rigorous schedule. I thought I would
put it up for others to get a look at and, hopefully, contribute to.
There's a lot of fun stuff we can do.

It's in a mercurial repo:

  http://www.zabbo.net/hg/bdar

  $ hg clone http://www.zabbo.net/hg/bdar ; ls ./bdar

Let me know if you give it a try, I'm interested in all feedback.

- z