From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 21 Jan 2011 12:59:40 +1100
From: Dave Chinner
Subject: Re: Issues with delalloc->real extent allocation
Message-ID: <20110121015940.GX16267@dastard>
References: <20110114002900.GF16267@dastard> <20110114214334.GN28274@sgi.com> <20110114235549.GI16267@dastard> <20110118204752.GB28791@infradead.org> <20110118231831.GZ28803@dastard> <20110119120321.GC12941@infradead.org> <20110119133147.GN16267@dastard> <20110119135548.GA11502@infradead.org> <20110120013346.GO16267@dastard> <20110120111612.GA14571@infradead.org>
In-Reply-To: <20110120111612.GA14571@infradead.org>
List-Id: XFS Filesystem from SGI
To: Christoph Hellwig
Cc: bpm@sgi.com, xfs@oss.sgi.com

On Thu, Jan 20, 2011 at 06:16:12AM -0500, Christoph Hellwig wrote:
> On Thu, Jan 20, 2011 at 12:33:46PM +1100, Dave Chinner wrote:
> > It's case b) that I'm mainly worried about, esp. w.r.t the 64k page
> > size on ia64/ppc. If we only track a single dirty bit in the page,
> > then every sub-page, non-appending write to an uncached region of a
> > file becomes a RMW cycle to initialise the areas around the write
> > correctly.
> > The question is whether we care about this enough given
> > that we return at least PAGE_SIZE in stat() to tell applications the
> > optimal IO size to avoid RMW cycles.
>
> Note that this generally is only true for the first write into the
> region - after that we'll have the rest read into the cache. But
> we also have the same issue for appending writes if they aren't
> page aligned.

True - I kind of implied that by saying RMW cycles are limited to
"uncached regions", but you've stated it in a much clearer and easier
to understand way. ;)

> > And if we only do IO on whole pages (i.e regardless of block size)
> > .writepage suddenly becomes a lot simpler, as well as being trivial
> > to implement our own .readpage/.readpages....
>
> I don't think it simplifies writepage a lot. All the buffer head
> handling goes away, but we'll still need to do xfs_bmapi calls at
> block size granularity. Why would you want to replace the
> readpage/readpages code? The generic mpage helpers for it do just fine.

When I went through the mpage code I found there were cases where it
would attach bufferheads to pages or assume PagePrivate() contains a
bufferhead list. e.g. if there are multiple holes in the page, it will
fall through to block_read_full_page(), which makes this assumption. If
we want/need to keep any of our own state on PagePrivate(), we cannot
use any function that assumes PagePrivate() is used to hold bufferheads
for the page.

Quite frankly, a simple extent mapping loop like we do for .writepage
is far simpler than what mpage_readpages does. This is what btrfs does
(extent_readpages/__extent_read_full_page), and that is far easier to
follow and understand than do_mpage_readpage()....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs