From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754705AbYESFqh (ORCPT ); Mon, 19 May 2008 01:46:37 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752686AbYESFq2 (ORCPT ); Mon, 19 May 2008 01:46:28 -0400
Received: from relay2.sgi.com ([192.48.171.30]:35124 "EHLO relay.sgi.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1752584AbYESFq0 (ORCPT ); Mon, 19 May 2008 01:46:26 -0400
Date: Mon, 19 May 2008 15:45:54 +1000
From: David Chinner
To: Christoph Lameter
Cc: David Chinner, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Mel Gorman, andi@firstfloor.org, Rik van Riel, Pekka Enberg,
	mpm@selenic.com
Subject: Re: [patch 10/21] buffer heads: Support slab defrag
Message-ID: <20080519054554.GY103491721@sgi.com>
References: <20080510030831.796641881@sgi.com>
	<20080510030916.935905242@sgi.com>
	<20080512002403.GP103491721@sgi.com>
	<20080515231045.GY155679365@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.1i
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 16, 2008 at 10:01:38AM -0700, Christoph Lameter wrote:
> On Fri, 16 May 2008, David Chinner wrote:
>
> > On Thu, May 15, 2008 at 10:42:15AM -0700, Christoph Lameter wrote:
> > > On Mon, 12 May 2008, David Chinner wrote:
> > >
> > > > If you are going to clean bufferheads (or pages), please clean entire
> > > > mappings via ->writepages as it leads to far superior I/O patterns
> > > > and a far higher aggregate rate of page cleaning.....
> > >
> > > That brings up another issue: Lets say I use writepages on a large file
> > > (couple of gig). How much do you want to write back?
> >
> > We're out of memory. I'd suggest write backing as much as you can
> > without blocking. e.g. treat it like pdflush and say 1024 pages, or
> > like balance_dirty_pages() and write a 'write_chunk' back from the
> > mapping (i.e. sync_writeback_pages()).
>
> Why are we out of memory?

Defragmentation is triggered as part of the usual memory reclaim process.
Which implies we've run out of free memory, correct?

> How do you trigger such a special writeout?

filemap_fdatawrite_range() perhaps?

> > Any of these are better from an I/O perspective than single page
> > writeback....
>
> But then filesystem can do tricks like writing out the surrounding areas
> as needed. The filesystem likely can estimate better how much writeout
> makes sense.

Pushing write-around into a method that is only supposed to write
the single page that is passed to it is a pretty bad abuse of the
API. Especially as we have many simple, ranged writeback methods
you could call. filemap_fdatawrite_range(), do_writepages(),
->writepages, etc.

FWIW, look at the mess of layering violations that write clustering
causes in XFS because we have to do this to keep allocation overhead
and fragmentation down to a minimum. It's a nasty hack to mitigate
the impact of the awful I/O patterns we see from the VM - suggesting
that all filesystems do this just so you don't have to call a
slightly smarter writeback primitive is insane....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
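
Editorial illustration, not part of the original message: the argument
above for ranged over single-page writeback can be sketched with a toy
userspace model. Everything in it is an assumption made for illustration.
The names ios_single_page() and writeback_ranged() are hypothetical and
are not kernel functions, a "mapping" is reduced to an array of dirty
flags, and block-layer merging, allocation and locking are all ignored.
The model only counts how many separate I/Os each strategy would submit,
with the ranged variant capped by an nr_to_write budget in the spirit of
the "say 1024 pages" suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, userspace only): dirty[i] is true when page i
 * of a file's mapping is dirty. No real kernel API is used here.
 */

/* Single-page writeback: each dirty page is submitted as its own I/O. */
size_t ios_single_page(const bool *dirty, size_t npages)
{
	size_t ios = 0;

	assert(dirty != NULL);
	for (size_t i = 0; i < npages; i++)
		if (dirty[i])
			ios++;
	return ios;
}

/*
 * Ranged writeback: walk the mapping in index order and merge each run of
 * contiguous dirty pages into one I/O. Stops once nr_to_write pages have
 * been cleaned. Returns the number of I/Os issued and stores the number
 * of pages cleaned in *pages_written (if non-NULL).
 */
size_t writeback_ranged(bool *dirty, size_t npages,
			size_t nr_to_write, size_t *pages_written)
{
	size_t ios = 0, written = 0;
	bool in_run = false;

	assert(dirty != NULL);
	for (size_t i = 0; i < npages && written < nr_to_write; i++) {
		if (!dirty[i]) {
			in_run = false;	/* a clean page ends the current run */
			continue;
		}
		if (!in_run) {
			ios++;		/* first page of a new contiguous run */
			in_run = true;
		}
		dirty[i] = false;	/* page is now clean */
		written++;
	}
	if (pages_written)
		*pages_written = written;
	return ios;
}
```

In this model, a mapping with dirty pages 0-2, 4-5 and 7 costs six I/Os
when written a page at a time, but three when written as ranges with a
large budget. A real kernel caller would reach for one of the primitives
named in the message (filemap_fdatawrite_range(), do_writepages(),
->writepages) rather than anything resembling this sketch.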