From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anton Altaparmakov
Subject: Re: Efficient handling of sparse files
Date: Mon, 28 Feb 2005 20:40:57 +0000 (GMT)
Message-ID:
References: <20050228174149.GA28741@parcelfarce.linux.theplanet.co.uk>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: linux-fsdevel@vger.kernel.org
Received: from ppsw-8.csi.cam.ac.uk ([131.111.8.138]:49891 "EHLO ppsw-8.csi.cam.ac.uk") by vger.kernel.org with ESMTP id S261728AbVB1UlC (ORCPT ); Mon, 28 Feb 2005 15:41:02 -0500
To: Matthew Wilcox
In-Reply-To: <20050228174149.GA28741@parcelfarce.linux.theplanet.co.uk>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Mon, 28 Feb 2005, Matthew Wilcox wrote:
> This problem came up with the systemimager program which uses rsync to
> install files from a master server to many clients. Red Hat has a system
> user with uid 2^32-1 which causes lastlog to grow to 1.2GB in size.
> rsync does understand the concept of sparse files (with the -S flag), but
> it has to read every block to discover that it is indeed empty. This sucks.
>
> I was wondering if we could introduce a new system call (or ioctl?) that,
> given an fd would find the next block with data in it. We could use the
> ->bmap method ... except that has dire warnings about adding new callers
> and viro may soon be in testicle-gouging range.
>
> One system interface hack would be to introduce lseek(fd, 0, SEEK_DATA)
> ... but without permission to reuse ->bmap for this purpose, it's
> pointless to discuss user interfaces.
>
> Suggestions?
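For illustration only, here is a rough userspace sketch of how a tool such as rsync might use an lseek()-style "find the next data" interface like the one floated in the quoted message. It is not something proposed in this thread: it assumes a platform and Python build that expose os.SEEK_DATA and os.SEEK_HOLE, it assumes the offset argument is an absolute file position, and data_extents is a hypothetical helper name. Note that it deals only in file offsets, never in on-disk block numbers.

```python
import errno
import os
import tempfile


def data_extents(fd):
    """Return a list of (start, end) byte ranges reported as data.

    Hypothetical helper. Assumes os.SEEK_DATA / os.SEEK_HOLE are
    available. A file system without hole support may simply report
    the whole file as one data extent.
    """
    extents = []
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # Next offset >= `offset` that the file system says holds data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                break  # only a hole remains up to end of file
            raise
        # Next hole after `start`; end of file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:
            break  # defensive: file changed underneath us
        extents.append((start, end))
        offset = end
    return extents


if __name__ == "__main__":
    # Build a sparse file: 4 bytes, a gap of about 1 MiB, then 4 more bytes.
    with tempfile.TemporaryFile() as f:
        f.write(b"head")
        f.seek(1 << 20)
        f.write(b"tail")
        f.flush()
        for start, end in data_extents(f.fileno()):
            print(start, end)
```

How many extents are reported, and where their boundaries fall, depends on the file system's allocation granularity, so the sketch makes no promise about the exact output. A copier would read only the reported ranges and seek over everything else.
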
Please keep one thing in mind: there are file systems where ->bmap makes
no sense whatsoever. NTFS is one example. It can have compressed or
encrypted files, and in both cases there are no blocks on disk where you
can read/write the actual data. In addition to those, it also has
resident files, where the file content itself is stored inside the
on-disk inode (at a variable and unaligned offset). Here again there is
no block a ->bmap could return that would contain only the file data: it
would also contain metadata, and the data would certainly not start at a
block boundary.

This is one of the reasons why no one should be using ->bmap. It is a
stupid interface that only fits very particular sets of file systems and
cannot be applied generically.

Best regards,

	Anton
-- 
Anton Altaparmakov (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/