From mboxrd@z Thu Jan 1 00:00:00 1970
From: Badari Pulavarty
Subject: Re: Lazy block allocation and block_prepare_write?
Date: Mon, 18 Apr 2005 20:01:24 -0700
Message-ID: <42647484.5040208@us.ibm.com>
References: <8e70aacf05041717546fdff3f@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-fsdevel@vger.kernel.org
Return-path:
Received: from e35.co.us.ibm.com ([32.97.110.133]:41366 "EHLO e35.co.us.ibm.com") by vger.kernel.org with ESMTP id S261295AbVDSDB0 (ORCPT ); Mon, 18 Apr 2005 23:01:26 -0400
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e35.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j3J31PLg472060 for ; Mon, 18 Apr 2005 23:01:25 -0400
Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j3J31PIA350794 for ; Mon, 18 Apr 2005 21:01:25 -0600
Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j3J31OBm017557 for ; Mon, 18 Apr 2005 21:01:24 -0600
To: Martin Jambor
In-Reply-To: <8e70aacf05041717546fdff3f@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Martin Jambor wrote:
> Hi all,
>
> I am a member of a group that implements a filesystem that allocates
> disk blocks to in-memory blocks lazily, that means, the decision is
> made just before the data are actually sent to disk. Moreover, when
> cached pages are modified, the data can be (and almost certainly will
> be) written to a different place to from where it was read.
>
> I was wondering, whether we could use the generic function
> block_prepare_write at all. The function checks every buffer of the
> page and if it is not mapped, it calls a fs supplied function that is
> supposed to map the buffer, i.e. assign it a block on the device and
> set its mapped flag.
>
> This is where we would like to give an error if there is not enough
> free disk space left but we cannot give a specific device block number
> yet. Can we make one up, such as -1? What would that do to such dark
> functions as unmap_underlying_metadata or any other? Would some other
> part of kernel break if there was a bunch of buffers assigned to the
> same spot on the disk?
>
> On the other hand, if I understand buffer flags correctly, I need to
> be able to emulate mapping of buffers to set them dirty, or em I
> wrong?
>
> Thanks for any insight or thoughts,

Yes. It's possible to do what you want. I am currently working on
adding "delayed allocation" support to ext3. As part of that, we are
modifying the generic helper routines to delay the allocation from
prepare time to actual writeout time (writepage).

Here is the basic idea:
=======================

The idea is to "reserve" a block at prepare/commit write time instead
of allocating it, and to do the actual allocation in writepage().
Sounds simple :)

Here are the issues:
====================

1) Currently none of the generic helper routines can handle this.
We need to add support for it, but still somehow keep the routines
generic enough for everyone's use.

2) There is no easy way to find out in writepage() whether we
"reserved" a block or not. There are 2 paths to writepage():

	sys_write() -> prepare/commit() and later sync() ----> writepage()

	mmap() -> touch a page and later --> writepage()

In order to do the correct accounting, we need to mark a page to
indicate whether we reserved a block for it or not. One way to do this
is to use page->private. But then all the generic routines will fail,
since they assume that page->private represents bufferheads. So we
need a better way to do this.

3) We need to add hooks from these generic routines into
filesystem-specific calls to handle "journaling mode" requirements
(for ext3 and maybe others).

So, what are your requirements?
I am looking for a common way to combine all the requirements and come
up with saner "generic" routines to handle these.

Thanks,
Badari
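
To make the reserve-then-allocate accounting above concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: struct toy_fs, struct toy_page, toy_reserve() and toy_writepage() are hypothetical names invented for illustration, and the "reserved" flag merely stands in for whatever per-page marker ends up being used. It assumes one block per page and a trivial allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_fs {
	long free_blocks;     /* blocks neither reserved nor allocated */
	long reserved_blocks; /* promised to dirty pages, not yet placed */
	long next_block;      /* trivial allocator cursor (assumption) */
};

struct toy_page {
	bool reserved; /* stand-in for the per-page "block reserved" marker */
	long block;    /* -1 until a real block number is assigned */
};

/*
 * Prepare/commit-time step: claim space so that "disk full" can be
 * reported early, but do not choose a block number yet.
 */
static int toy_reserve(struct toy_fs *fs, struct toy_page *pg)
{
	if (pg->reserved || pg->block >= 0)
		return 0; /* already covered by a reservation or a block */
	if (fs->free_blocks == 0)
		return -1; /* would be -ENOSPC in a real filesystem */
	fs->free_blocks--;
	fs->reserved_blocks++;
	pg->reserved = true;
	return 0;
}

/*
 * Writeout-time step: turn the reservation into a real block.
 * A page dirtied through mmap() never went through prepare/commit,
 * so it arrives here without a reservation and must take one now.
 */
static int toy_writepage(struct toy_fs *fs, struct toy_page *pg)
{
	if (pg->block >= 0)
		return 0; /* already allocated, nothing to account */
	if (!pg->reserved && toy_reserve(fs, pg) != 0)
		return -1; /* mmap path hit a full disk at writeout */

	assert(fs->reserved_blocks > 0); /* accounting must balance */
	fs->reserved_blocks--;
	pg->reserved = false;
	pg->block = fs->next_block++; /* the actual (delayed) allocation */
	return 0;
}
```

In this model a page that came through the write path calls toy_reserve() first and toy_writepage() later, while a page that was only touched through mmap() reaches toy_writepage() with reserved == false. That second case is why writepage() needs some reliable per-page marker: without it, it cannot tell whether to consume an existing reservation or to take a new one.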