From: Andi Kleen
To: Ingo Molnar
Cc: steiner@sgi.com, Paul Jackson, clameter@engr.sgi.com, akpm@osdl.org, dgc@sgi.com, Simon.Derr@bull.net, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/5] cpuset memory spread basic implementation
Date: Tue, 7 Feb 2006 15:23:19 +0100
Message-Id: <200602071523.20174.ak@suse.de>
In-Reply-To: <20060207141141.GA14706@elte.hu>

On Tuesday 07 February 2006 15:11, Ingo Molnar wrote:
>
> * Andi Kleen wrote:
>
> > I meant it's not that big an issue if it's remote, but it's bad if it
> > fills up the local node.
>
> are you sure this is not some older VM issue?

Unless you implement page migration for all caches, it's still there.
The only way to get rid of caches on a node currently is to throw them
away. And refetching them from disk is quite costly.

> Unless it's a fundamental
> property of NUMA systems, it would be bad to factor in some VM artifact
> into the caching design.

Why not? It has to work with the real existing VM, not some imaginary
perfect one.

> > Basically you have to consider the frequency of access:
> >
> > Mapped memory is very frequently accessed. For it memory placement is
> > really important. Optimizing it at the cost of everything else is a
> > good default strategy
> >
> > File cache is much less frequently accessed (most programs buffer
> > read/write well) and when it is accessed it is using functions that
> > are relatively latency tolerant (kernel memcpy). So memory placement
> > is much less important here.
> >
> > And d/inode are also very infrequently accessed compared to local
> > memory, so the occasionally additional latency is better than
> > competing too much with local memory allocation.
>
> Most pagecache pages are clean,

... unless you've just written a lot of data.

> and it's easy and fast to zap a clean
> page when a new anonymous page needs space. So i dont really see why the
> pagecache is such a big issue - it should in essence be invisible to the
> rest of the VM. (barring the extreme case of lots of dirty pages in the
> pagecache) What am i missing?

The d/icaches, for one, don't work this way. Do a find / and watch the
results on your local node.

And in practice your assumption of everything being clean and nice in the
page cache is also often not true.
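For reference, the spread policy under discussion roughly boils down to
rotating a per-task cursor over the allowed nodes instead of always
taking the local one. Here is a toy userspace sketch of that idea only;
it is not the actual patch code, and the struct, function names and the
node limit are made up for illustration:

/*
 * Toy illustration (userspace, not the kernel patch) of round-robin
 * "memory spread": instead of always putting page cache and d/inode
 * cache pages on the local node, rotate a per-task cursor over the
 * nodes the task is allowed to use.
 */
#include <stdio.h>

#define MAX_NODES 8

struct task {
        unsigned long mems_allowed;     /* bitmask of allowed nodes */
        int spread_rotor;               /* last node handed out */
};

/* Pick the next allowed node after the rotor, wrapping around. */
static int mem_spread_node(struct task *t)
{
        int n = t->spread_rotor;
        int i;

        for (i = 0; i < MAX_NODES; i++) {
                n = (n + 1) % MAX_NODES;
                if (t->mems_allowed & (1UL << n)) {
                        t->spread_rotor = n;
                        return n;
                }
        }
        return -1;      /* no allowed nodes */
}

int main(void)
{
        /* task confined to nodes 0, 2 and 3 */
        struct task t = { .mems_allowed = 0x0d, .spread_rotor = 0 };
        int i;

        /* cache allocations end up spread 2, 3, 0, 2, 3, 0, ... */
        for (i = 0; i < 6; i++)
                printf("allocation %d -> node %d\n", i, mem_spread_node(&t));
        return 0;
}

With that kind of placement a big find / or a streaming read spreads its
d/icache and page cache pages over all allowed nodes instead of flooding
the local one.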
> > > i also mentioned software-based clusters in the previous mail, so it's
> > > not only about big systems. Caching attributes are very much relevant
> > > there. Tightly integrated clusters can be considered NUMA systems with a
> > > NUMA factor of 1000 or so (or worse).
> >
> > To be honest I don't think systems with such a NUMA factor will ever
> > work well in the general case. So I wouldn't recommend considering
> > them much if at all in your design thoughts. The result would likely
> > not be a good balanced design.
>
> loosely coupled clusters do seem to work quite well, since the
> overwhelming majority of computing jobs tend to deal with easily
> partitionable workloads.

Yes, but with message passing and without any kind of shared memory.

> Making clusters more seemless via software
> (a'ka OpenMosix) is still quite tempting i think.

OK, we agree on that then. Great.

-Andi