From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anton Altaparmakov
Subject: Re: Efficient handling of sparse files
Date: Tue, 1 Mar 2005 07:50:16 +0000 (GMT)
Message-ID:
References: <20050228174149.GA28741@parcelfarce.linux.theplanet.co.uk> <422384D7.4060002@zabbo.net>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: Matthew Wilcox, linux-fsdevel@vger.kernel.org
Received: from ppsw-8.csi.cam.ac.uk ([131.111.8.138]:14819 "EHLO ppsw-8.csi.cam.ac.uk") by vger.kernel.org with ESMTP id S261258AbVCAHuV (ORCPT ); Tue, 1 Mar 2005 02:50:21 -0500
To: Zach Brown
In-Reply-To: <422384D7.4060002@zabbo.net>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Mon, 28 Feb 2005, Zach Brown wrote:
> > Please keep one thing in mind and that is that there are file systems
> > where ->bmap actually makes no sense whatsoever
>
> Of course, so return -ESORRY.

Ah, but it gets even worse: ->bmap uses 0 to mean sparse, whereas in
NTFS 0 is a valid block, so it cannot mean sparse. Sparse needs its own
namespace outside 0-2^63-1. Internally in NTFS I use -1 for sparse, for
example.

> > This is one of the reasons why no one should be using ->bmap. It is a
> > stupid interface that only fits very particular sets of file systems and
> > cannot be applied generically.
>
> No, it's a reason to only ask about the details of block mapping in
> cases where it actually makes sense (like, wanting to find out if
> concurrent file extension is getting good batched contiguous allocation,
> etc). Just because file systems x, y, and z can't answer the question
> meaningfully doesn't mean it isn't a reasonable thing to ask of file
> systems m, n, and o.
>
> Now, I'm not at all opposed to an explicit sparse-testing interface that
> doesn't confuse that functionality with querying specific block mappings.

That's cool. Just an array of n zero-data [file offset, length] entries
would be sufficient, no?
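
To make the array idea concrete, here is a minimal user-space sketch of
what such a result could look like and how a caller might query it. The
names struct zero_region and pos_is_zero() are made up for illustration;
this is not an existing kernel interface. It assumes the regions come
back sorted by offset and do not overlap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one run of bytes that reads back as zeroes. */
struct zero_region {
	uint64_t offset;	/* byte offset of the run within the file */
	uint64_t length;	/* length of the run in bytes */
};

/*
 * Return true if byte position pos lies inside one of the n regions.
 * Assumption: the array is sorted by offset and regions do not overlap,
 * so a binary search is enough.
 */
static bool pos_is_zero(const struct zero_region *r, size_t n, uint64_t pos)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pos < r[mid].offset)
			hi = mid;		/* pos is before this region */
		else if (pos - r[mid].offset >= r[mid].length)
			lo = mid + 1;		/* pos is past this region */
		else
			return true;		/* pos is inside this region */
	}
	return false;
}
```

Because each entry carries a byte offset and a length rather than a
block number, no value has to be reserved as a "sparse" marker, which
sidesteps the 0-is-a-valid-block problem above.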

Note I think doing just sparse is not as good as doing zero-data,
because on NTFS you can have on-disk blocks allocated but the file
marked as empty beyond a certain offset. In that case the on-disk blocks
contain random garbage and the driver just knows to fill in with zeroes
on read (i.e. no disk access is needed at all on reads) and to do actual
writes to disk on a non-zero write. This is actually used in Windows by
applications (Office and Outlook, for example) that want to have
guaranteed storage allocations (e.g. for the mail INBOX) so deliveries
cannot fail, but that also want to efficiently clear the file contents
beyond a certain offset.

I suppose the ntfs driver could simply pretend that this non-initialized
but allocated space is sparse if a sys_get_sparse_regions() rather than
a sys_get_zero_regions() is implemented, so it wouldn't be such a big
problem.

Best regards,

	Anton
-- 
Anton Altaparmakov (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/