From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1755343AbXD0FQ3 (ORCPT );
    Fri, 27 Apr 2007 01:16:29 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
    id S1755344AbXD0FQ3 (ORCPT );
    Fri, 27 Apr 2007 01:16:29 -0400
Received: from smtp1.linux-foundation.org ([65.172.181.25]:39419
    "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
    by vger.kernel.org with ESMTP id S1755343AbXD0FQ1 (ORCPT );
    Fri, 27 Apr 2007 01:16:27 -0400
Date: Thu, 26 Apr 2007 22:15:28 -0700
From: Andrew Morton
To: David Chinner
Cc: clameter@sgi.com, linux-kernel@vger.kernel.org, Mel Gorman,
    William Lee Irwin III, Jens Axboe, Badari Pulavarty, Maxim Levitsky
Subject: Re: [00/17] Large Blocksize Support V3
Message-Id: <20070426221528.655d79cb.akpm@linux-foundation.org>
In-Reply-To: <20070427042046.GI65285596@melbourne.sgi.com>
References: <20070424222105.883597089@sgi.com>
    <20070426190438.3a856220.akpm@linux-foundation.org>
    <20070427022731.GF65285596@melbourne.sgi.com>
    <20070426195357.597ffd7e.akpm@linux-foundation.org>
    <20070427042046.GI65285596@melbourne.sgi.com>
X-Mailer: Sylpheed version 2.2.7 (GTK+ 2.8.17; x86_64-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 27 Apr 2007 14:20:46 +1000 David Chinner wrote:

> > blocksizes via this scheme - instantiate and lock four pages and go for
> > it.
>
> So now how do you get block aligned writeback?

in writeback and pageout:

    if (page->index & mapping->block_size_mask)
        continue;

> Or make sure that truncate
> doesn't race on a partial *block* truncate?

lock four pages

> You basically have to
> jump through nasty, nasty hoops, to handle corner cases that are introduced
> because the generic code can no longer reliably lock out access to a
> filesystem block.
>
> Eventually you end up with something like fs/xfs/linux-2.6/xfs_buf.c and
> doing everything inside the filesystem because it's the only sane
> way to serialise access to these aggregated structures. This is
> the way XFS used to work in its data path, and we all know how long
> and loud people complained about that.....
>
> A filesystem specific aggregation mechanism is not a palatable solution
> here because it drives filesystems away from being able to use generic
> code.

I would expect we could (should) implement this in generic code by
modifying the existing stuff.

I'm not saying it's especially simple, nor fast. But it has the advantage
that we're not forced to use larger pages with _their_ attendant
performance problems. And it will benefit all filesystems immediately. And
it doesn't introduce a rather nasty hack of pretending (in some places)
that pages are larger than they really are. And it has the very
significant advantage that it doesn't introduce brand new concepts and some
complexity into core MM.

And make no mistake: the latter disadvantage is huge. Because if we do the
PAGE_CACHE_SIZE hack (sorry, but it _is_), we have to do it *for ever*.
Maintaining and enhancing core MM and VFS becomes harder and more costly
and slower and more buggy *for ever*. The ramp for people to become
competent on core MM becomes longer. Our developer pool becomes smaller,
and proportionally less skilled.

And hardware gets better. If Intel & AMD come out with a 16k pagesize
option in a couple of years we'll look pretty dumb. If the problems which
you're presently having with that controller get sorted out in the next
generation of the hardware, we'll also look pretty dumb.

As always, there are tradeoffs. We can see the cons, and they are very
significant. We don't yet know the pros. Perhaps they will be similarly
significant. But I don't believe that the larger PAGE_CACHE_SIZE hack
(sorry) is the only way in which they can be realised.
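
A minimal sketch of the alignment check from the writeback and pageout
reply above, assuming a block spans a power-of-two number of pages. Only
the expression page->index & mapping->block_size_mask comes from the
message; struct mapping_sketch, mapping_sketch_init(), page_starts_block()
and collect_block_starts() are hypothetical names invented for
illustration, and this is plain userspace C, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for per-mapping state. Only the field name
 * block_size_mask is taken from the message; everything else is assumed.
 */
struct mapping_sketch {
    unsigned long block_size_mask; /* pages per block, minus one */
};

/* Assumption: a block covers a power-of-two number of pages. */
static void mapping_sketch_init(struct mapping_sketch *m,
                                unsigned long pages_per_block)
{
    assert(pages_per_block != 0 &&
           (pages_per_block & (pages_per_block - 1)) == 0);
    m->block_size_mask = pages_per_block - 1;
}

/* True when the page at this index is the first page of its block. */
static bool page_starts_block(const struct mapping_sketch *m,
                              unsigned long index)
{
    return (index & m->block_size_mask) == 0;
}

/*
 * Walk page indexes in [first, end) and record the indexes at which a
 * block-sized writeback would begin. Pages that are not block aligned
 * are skipped, mirroring the "continue" in the message. Returns the
 * number of block starts found; at most out_len of them are stored.
 */
static size_t collect_block_starts(const struct mapping_sketch *m,
                                   unsigned long first, unsigned long end,
                                   unsigned long *out, size_t out_len)
{
    size_t n = 0;

    for (unsigned long index = first; index < end; index++) {
        if (index & m->block_size_mask)
            continue; /* not the first page of a block */
        if (n < out_len)
            out[n] = index;
        n++;
    }
    return n;
}
```

With four pages per block the mask is 3, so a walk over page indexes 1
to 12 would begin block writeback only at indexes 4, 8 and 12. Locking
the four pages of each block and handling truncate are not modelled here.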