From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ric Wheeler
Subject: Re: bdar: efficiently backup allocated bytes in file systems
Date: Wed, 19 Mar 2008 20:32:15 -0400
Message-ID: <47E1B08F.5090706@emc.com>
References: <47DF1737.2050700@zabbo.net> <20080318213543.GC155407@sgi.com> <47E03CE3.3080903@zabbo.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: David Chinner, linux-fsdevel@vger.kernel.org
To: Zach Brown
Return-path:
Received: from mexforward.lss.emc.com ([128.222.32.20]:16915 "EHLO mexforward.lss.emc.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S938595AbYCTAcW (ORCPT ); Wed, 19 Mar 2008 20:32:22 -0400
In-Reply-To: <47E03CE3.3080903@zabbo.net>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Zach Brown wrote:
>> Neat, Zach. You should look at xfs_copy - it does pretty much this for XFS
>> filesystems....
>
> haha, yet another round of the -fsdevel XFS drinking game :)
>
> Does xfs_copy tend to assert the XFS file format in the backup files it
> generates? One of the things I was hoping for with bdar was to have the
> resulting copy image be agnostic. It's just a sparse map with some
> checksumming, really.
>
> That limits what we can do, of course. The current trivial format only
> has one address space which doesn't fit well with the plans file systems
> have of working with multiple addressable block ranges.
>
> But I think I'm fine with that. The value:complexity ratio of this
> trivial version is refreshingly large.
>
> - z

About a year back, I was trying various ways to read every file on a fairly massive (reiserfs v3) file system (order of tens of millions of files). I don't recall how close I came to native dd speed, but I could get a substantial win by grabbing a large chunk of files (say 5000), sorting them by either inode number or creation time, and then reading them in that order.
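
A rough sketch of that batching idea, in Python. This is not the tool described above, only an illustration under some assumptions: the names `iter_files`, `batches` and `read_tree_in_inode_order` are made up for this example, the batch size of 5000 is the "say 5000" figure from the text, and sorting uses inode number (the creation-time variant is not shown). It only covers the read phase.

```python
import os
import stat
from itertools import islice

BATCH_SIZE = 5000  # the "say 5000" figure from the text; tune for the workload


def iter_files(root):
    """Yield (inode, path) for each regular file under root, in directory-walk order."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is not accessible; skip it
            if stat.S_ISREG(st.st_mode):
                yield st.st_ino, path


def batches(iterable, size):
    """Split an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def read_tree_in_inode_order(root, batch_size=BATCH_SIZE, chunk_size=1 << 20):
    """Read every file under root, one batch at a time, each batch sorted by inode.

    Returns the total number of bytes read.
    """
    total = 0
    for batch in batches(iter_files(root), batch_size):
        batch.sort()  # tuples compare by inode number first
        for _ino, path in batch:
            try:
                with open(path, "rb") as f:
                    while True:
                        data = f.read(chunk_size)
                        if not data:
                            break
                        total += len(data)
            except OSError:
                continue  # unreadable file; skip it
    return total
```

Sorting within a bounded batch keeps memory use flat even with tens of millions of files, at the cost of only approximating a global inode ordering.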
We had some good experience with this, but our use case has no sparse files and tends to have lots of small or medium-sized files. This only did the read phase, but the basic assumption is that the file system will tend to allocate disk sectors in sequential order over time, and this gave a fairly close approximation of that ;-)

ric