From: David Chinner
Subject: Re: Proposal to improve filesystem/block snapshot interaction
Date: Wed, 31 Oct 2007 18:04:02 +1100
Message-ID: <20071031070402.GZ995458@sgi.com>
References: <20070927063113.GD2989@sgi.com>
	<20071030010453.GF27385@sgi.com>
	<18214.45062.754722.885137@notabene.brown>
	<20071030235652.GY995458@sgi.com>
	<20071031040158.GC9041@sgi.com>
In-Reply-To: <20071031040158.GC9041@sgi.com>
To: Greg Banks
Cc: David Chinner, Neil Brown, Linux Filesystem Mailing List,
	Donald Douwsma, Christoph Hellwig, Roger Strassburg,
	Mark Goodwin, Brett Jon Grandbois, Arnd Bergmann
List-Id: linux-fsdevel.vger.kernel.org

On Wed, Oct 31, 2007 at 03:01:58PM +1100, Greg Banks wrote:
> On Wed, Oct 31, 2007 at 10:56:52AM +1100, David Chinner wrote:
> > On Tue, Oct 30, 2007 at 03:16:06PM +1100, Neil Brown wrote:
> > > On Tuesday October 30, gnb@sgi.com wrote:
> > > > BIO_HINT_RELEASE
> > > > The bio's block extent is no longer in use by the filesystem
> > > > and will not be read in the future. Any storage used to back
> > > > the extent may be released without any threat to filesystem
> > > > or data integrity.
> > >
> > > If the allocation unit of the storage device (e.g. a few MB) does not
> > > match the allocation unit of the filesystem (e.g. a few KB) then for
> > > this to be useful either the storage device must start recording tiny
> > > allocations, or the filesystem should re-release areas as they grow.
> > > i.e. when releasing a range of a device, look in the filesystem's usage
> > > records for the largest surrounding free space, and release all of that.
> >
> > I figured that the easiest way around this is reporting free space
> > extents, not the amount actually freed. e.g.
> >
> > 	4k in file A @ block 10
> > 	4k in file B @ block 11
> > 	4k free space @ block 12
> > 	4k in file C @ block 13
> > 	1008k free space @ block 14
> >
> > If we free file A, we report that we've released an extent of 4k @ block 10.
> > If we then free file B, we report we've released an extent of 12k @ block 10.
> > If we then free file C, we report a release of 1024k @ block 10.
> >
> > Then the underlying device knows what the aggregated free space regions
> > are and can easily release large regions without needing to track tiny
> > allocations and frees done by the filesystem.
>
> If you could do that in the filesystem, it would certainly solve the
> problem. In which case I'll explicitly allow for the hint's extent to
> overlap previous extents thus hinted, and define the semantics for
> overlaps. I think I'll rename the hint to BIO_HINT_RELEASED; I think
> that will make the semantics a little clearer.

I think that can be done - I wouldn't have mentioned it if I didn't
think it was possible to implement ;). It will require a further btree
lookup once the free transaction hits the disk, but I think that's
pretty easy to do. I'd probably hook xfs_alloc_clear_busy() to do this.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
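
For illustration only, here is a minimal Python sketch of the aggregated-release reporting described in the quoted example. It is not XFS code: FreeSpaceMap, free() and BLOCK_KB are hypothetical names, a sorted list stands in for the filesystem's free space btree, and the 4k block size is assumed from the example. When an extent is freed, the sketch merges it with any adjacent free extents and returns the merged extent, which is what would be reported in the release hint.

```python
import bisect

BLOCK_KB = 4  # assumed filesystem block size in KB, taken from the example


class FreeSpaceMap:
    """Toy stand-in for a filesystem free-space index (hypothetical, not XFS)."""

    def __init__(self, free_extents=()):
        # Sorted, non-overlapping, non-adjacent (start_block, nblocks) pairs.
        self.extents = sorted(free_extents)

    def free(self, start, length):
        """Mark [start, start + length) free; return the merged extent to report."""
        end = start + length
        i = bisect.bisect_left(self.extents, (start, 0))
        # Merge with the preceding free extent if it ends where this one starts.
        if i > 0:
            p_start, p_len = self.extents[i - 1]
            if p_start + p_len == start:
                start = p_start
                i -= 1
                del self.extents[i]
        # Merge with the following free extent if it starts where this one ends.
        if i < len(self.extents):
            n_start, n_len = self.extents[i]
            if n_start == end:
                end = n_start + n_len
                del self.extents[i]
        self.extents.insert(i, (start, end - start))
        return start, end - start


# Layout from the example: block 12 (4k) and blocks 14.. (1008k) start out free.
fsmap = FreeSpaceMap([(12, 1), (14, 252)])
for name, block in (("A", 10), ("B", 11), ("C", 13)):
    start, nblocks = fsmap.free(block, 1)
    print(f"free file {name}: release {nblocks * BLOCK_KB}k @ block {start}")
```

Run against the layout above, this reports 4k, then 12k, then 1024k, all @ block 10, matching the sequence in the example. Each report overlaps the previous one, which is why the hint semantics need to allow overlapping extents.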