From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S267864AbUHNBVs (ORCPT ); Fri, 13 Aug 2004 21:21:48 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S267867AbUHNBVs (ORCPT ); Fri, 13 Aug 2004 21:21:48 -0400 Received: from smtp104.mail.sc5.yahoo.com ([66.163.169.223]:46433 "HELO smtp104.mail.sc5.yahoo.com") by vger.kernel.org with SMTP id S267864AbUHNBVp (ORCPT ); Fri, 13 Aug 2004 21:21:45 -0400 Message-ID: <411D6920.9050007@yahoo.com.au> Date: Sat, 14 Aug 2004 11:21:36 +1000 From: Nick Piggin User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040707 Debian/1.7-5 X-Accept-Language: en MIME-Version: 1.0 To: "Martin J. Bligh" CC: Jesse Barnes , akpm@osdl.org, linux-kernel@vger.kernel.org, steiner@sgi.com Subject: Re: [PATCH] allocate page caches pages in round robin fasion References: <200408121646.50740.jbarnes@engr.sgi.com> <200408130859.24637.jbarnes@engr.sgi.com> <89760000.1092414010@[10.10.2.4]> <200408130934.20913.jbarnes@engr.sgi.com> <92140000.1092415652@[10.10.2.4]> <411CFB04.603@yahoo.com.au> <75260000.1092431774@flay> In-Reply-To: <75260000.1092431774@flay> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Martin J. Bligh wrote: >>>Well, either we're: >>> >>>1. Falling back and putting all our most recent accesses off-node. >>> >>>or. >>> >>>2. Not falling back and only able to use one node's memory for any one >>>(single threaded) app. >>> >>>Either situation is crap, though I'm not sure which turd we picked right >>>now ... I'd have to look at the code again ;-) I thought it was 2, but >>>I might be wrong. >>> >> >>I'm looking at this now. We are doing 1 currently. > > > In theory, yes. In practice, I have a feeling kswapd will keep us above > the level of free memory where we'd fall back to another zone to allocate, > won't it? > Nope. Take a look at the first loop-through-the-zones in alloc_pages (preferably in akpm's tree that is cleaned up a bit). We go through *all* zones first and allocate them down to pages_low before kicking kswapd. I have tried kicking kswapd before going off node, but it frees memory really aggressively - so you're nearly left with a local alloc policy. > >>There are a couple of issues. The first is that you need to minimise >>regressions for when working set size is bigger than the local node. > > > Good point ... that is, indeed, a total bitch to fix. > At the end of the day we'll possibly just have to have a sysctl. I don't think all regressions could be eliminated completely. We'll see. > >>I have a patch going now that just reclaims use-once file cache before >>going off node. Seems to help a bit for basic things that just push >>pagecache through the system. It definitely reduces remote allocations >>by several orders of magnitude for those cases. > > > Makes sense, but doesn't the same thing make sense on a global basis? > I don't feel NUMA is anything magical here ... > Didn't parse that. If you mean the transition from highmem->normal->dma zones, I don't think that should be treated in this way. Imagine small highmem zones for example. We have the lower zone protection in place for that case - and that is something that in turn isn't good for NUMA, because the SGI guys (I think) already ran into that and fixed it to be per-node only.