From mboxrd@z Thu Jan  1 00:00:00 1970
From: Ric Wheeler <ric@emc.com>
Subject: Re: bdar: efficiently backup allocated bytes in file systems
Date: Wed, 19 Mar 2008 20:32:15 -0400
Message-ID: <47E1B08F.5090706@emc.com>
References: <47DF1737.2050700@zabbo.net> <20080318213543.GC155407@sgi.com> <47E03CE3.3080903@zabbo.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: David Chinner <dgc@sgi.com>, linux-fsdevel@vger.kernel.org
To: Zach Brown <zab@zabbo.net>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mexforward.lss.emc.com ([128.222.32.20]:16915 "EHLO
	mexforward.lss.emc.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S938595AbYCTAcW (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Wed, 19 Mar 2008 20:32:22 -0400
In-Reply-To: <47E03CE3.3080903@zabbo.net>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

Zach Brown wrote:
>> Neat, Zach. You should look at xfs_copy - it does pretty much this for XFS
>> filesystems....
> 
> haha, yet another round of the -fsdevel XFS drinking game :)
> 
> Does xfs_copy tend to assert the XFS file format in the backup files it
> generates?  One of the things I was hoping for with bdar was to have the
> resulting copy image be agnostic.  It's just a sparse map with some
> checksumming, really.
> 
> That limits what we can do, of course.  The current trivial format only
> has one address space which doesn't fit well with the plans file systems
> have of working with multiple addressable block ranges.
> 
> But I think I'm fine with that.  The value:complexity ratio of this
> trivial version is refreshingly large.
> 
> - z

About a year back, I was trying various ways to read every file on a 
fairly massive (reiserfs v3) file system (on the order of tens of 
millions of files).

I don't recall how close I came to native dd speed, but I could get a 
substantial win by grabbing a large chunk of files (say 5000), sorting 
them by either inode number or creation time, and then reading them in 
that order.

We had some good experience with this, but our use case has no sparse 
files and tends to have lots of little or medium-sized files.  This only 
did the read phase, but the basic assumption is that the file system 
will tend to allocate disk sectors in sequential order over time, and 
the sorted order gave a fairly close approximation of that ;-)
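
For illustration, the batch-and-sort read described above might look 
roughly like this in Python.  This is only a sketch, not the tool used 
at the time: the function names, the 1 MiB read buffer, and the choice 
of inode number (rather than creation time) as the sort key are all 
assumptions made for the example.

```python
import os
import stat
from itertools import islice


def batches(paths, size=5000):
    """Yield successive lists of at most `size` paths."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def inode_order(paths):
    """Return the regular files in `paths`, sorted by inode number."""
    keyed = []
    for p in paths:
        try:
            st = os.lstat(p)
        except OSError:
            continue  # vanished or unreadable; skip it
        if stat.S_ISREG(st.st_mode):
            keyed.append((st.st_ino, p))
    keyed.sort()
    return [p for _, p in keyed]


def read_in_inode_order(paths, batch_size=5000, bufsize=1 << 20):
    """Read every file, one batch at a time, in inode order per batch.

    Returns the total number of bytes read.  The data is discarded;
    like the experiment above, this only exercises the read phase.
    """
    total = 0
    for chunk in batches(paths, batch_size):
        for p in inode_order(chunk):
            try:
                with open(p, "rb") as f:
                    while True:
                        buf = f.read(bufsize)
                        if not buf:
                            break
                        total += len(buf)
            except OSError:
                continue
    return total
```

Sorting within a batch rather than over the whole file list keeps 
memory bounded with tens of millions of files; whether inode order 
actually tracks on-disk layout depends on the file system.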

ric