From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030778AbXD1TTv (ORCPT );
	Sat, 28 Apr 2007 15:19:51 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1031190AbXD1TTs (ORCPT );
	Sat, 28 Apr 2007 15:19:48 -0400
Received: from holomorphy.com ([66.93.40.71]:53304 "EHLO holomorphy.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1030778AbXD1TTk (ORCPT );
	Sat, 28 Apr 2007 15:19:40 -0400
Date: Sat, 28 Apr 2007 12:19:56 -0700
From: William Lee Irwin III
To: Andrew Morton
Cc: Peter Zijlstra, Nick Piggin, David Chinner, Christoph Lameter,
	linux-kernel@vger.kernel.org, Mel Gorman, Jens Axboe,
	Badari Pulavarty, Maxim Levitsky
Subject: Re: [00/17] Large Blocksize Support V3
Message-ID: <20070428191956.GY31925@holomorphy.com>
References: <20070427002640.22a71d06.akpm@linux-foundation.org>
	<20070427163620.GI32602149@melbourne.sgi.com>
	<20070427173432.GJ32602149@melbourne.sgi.com>
	<20070427121108.9ee05710.akpm@linux-foundation.org>
	<4632A6DF.7080301@yahoo.com.au>
	<1177747448.28223.26.camel@twins>
	<20070428012251.fae10a71.akpm@linux-foundation.org>
	<20070428140907.GU19966@holomorphy.com>
	<20070428112640.5b92b995.akpm@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20070428112640.5b92b995.akpm@linux-foundation.org>
Organization: The Domain of Holomorphy
User-Agent: Mutt/1.5.13 (2006-08-11)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 28 Apr 2007 07:09:07 -0700, William Lee Irwin III wrote:
>> The gang allocation affair may also want to make the calls into
>> the page allocator batched. For instance, grab enough compound pages
>> to build the gang under the lock, since we're going to blow the
>> per-cpu lists with so many pages, then break the compound pages up
>> outside the zone->lock.
On Sat, Apr 28, 2007 at 11:26:40AM -0700, Andrew Morton wrote:
> Sure, but...
> Allocating a single order-3 (say) page _is_ a form of batching

Sorry, I should clarify here. If we fall back, we may still want to get
all the pages together. For instance, if we can't get an order 3, grab
an order 2; then, if a second order 2 doesn't pan out, an order 1, and
so on, until as many pages as requested are allocated or an allocation
failure occurs. Also, passing around the results linked together into a
list, as opposed to e.g. filling an array, has the advantage of
allowing splice operations under the lock, though arrays can catch up
for the most part if their elements are allowed to vary in terms of the
orders of the pages.

On Sat, Apr 28, 2007 at 11:26:40AM -0700, Andrew Morton wrote:
> We don't want compound pages here: just higher-order ones
> Higher-order allocations bypass the per-cpu lists

Sorry again. I conflated the two, and failed to take the use of
higher-order pages as an assumption as I should have.

On Sat, 28 Apr 2007 07:09:07 -0700, William Lee Irwin III wrote:
>> I think it'd be good to have some corresponding tactics for freeing
>> as well.

On Sat, Apr 28, 2007 at 11:26:40AM -0700, Andrew Morton wrote:
> hm, hadn't thought about that - would need to peek at contiguous pages in
> the pagecache and see if we can gang-free them as higher-order pages.
> The place to do that is perhaps inside the per-cpu magazines: it's more
> general. Dunno if it would net advantageous though.

What I was hoping for was an interface to hand back groups of pages at
a time, one which would do contiguity detection where advantageous and,
where not, just assemble the pages into something that can be slung
around more quickly under the lock: essentially doing small bits of the
buddy system's work for it outside the lock. Arrays make more sense
here, as it's relatively easy to do contiguity detection by heapifying
them and dequeueing in order in preparation for work under the lock.
There is an issue in that reclaim is not organized in such a fashion as
to issue calls to such freeing functions. An implicit effect of this
sort could be achieved by maintaining the pcp lists as an array-based
deque, via duelling heap arrays with reversed comparators if an
appropriate deque structure for sets as small as the pcp arrays can't
be dredged up, or via an auxiliary adjacency detection structure. I'm
skeptical, however, that the contiguity gains would compensate for the
CPU time required to do this with the pcp lists. I think, rather, that
it would be better to arrange this for users of an interface for
likely-contiguous batched freeing, provided reclaim in such a manner
makes sense from the standpoint of IO.

Gang freeing in general could do adjacency detection without disturbing
the characteristics of the pcp lists, though it, too, may not be
productive without some specific notion of whether contiguity is
likely. For instance, quicklist_trim() could readily use gang freeing,
but it's not likely to have much in the way of contiguity.

These sorts of algorithmic concerns are probably not quite as pressing
as the general notion of trying to establish some sort of contiguity,
so I'm by no means insistent on any of this.

-- 
wli