From: Nick Piggin
Date: Sat, 18 Mar 2006 16:38:42 +1100
To: Andrew Morton
CC: steiner@sgi.com, mingo@elte.hu, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] - Reduce overhead of calc_load
Message-ID: <441B9CE2.2050204@yahoo.com.au>
In-Reply-To: <20060317211315.55457f22.akpm@osdl.org>

Andrew Morton wrote:
> Nick Piggin wrote:
>
>>Andrew Morton wrote:
>>
>>>Nick Piggin wrote:
>>
>>>>Is there a need? Do they (except calc_load) use multiple values at
>>>>the same time?
>>>
>>>
>>>Don't know.
>>>It might happen in the future. And the additional cost is
>>>practically zero.
>>>
>>
>>Unless it happens to hit another cacheline (cachelines for all other
>>CPUs but our own will most likely be invalid on this cpu). In which
>>case the cost could double quite easily.
>>
>
>
> That would be an inefficient implementation. Let's not implement it
> inefficiently.
>

Unconditionally adding up n fields in the runqueue versus 1 field?
It is inevitable that they will cross cacheline boundaries on some
CPU architectures and with some per-cpu implementations, isn't it?

>
>>I think it might be better to leave it for the moment. If something comes
>>up we can always take a look at it then (it isn't particularly tricky code).
>
>
> What we're seeing here is a proliferation of little functions, all of which
> do the same thing, some of them in different ways.
>

Of course they should be made consistent where it makes sense.

> Take a look at (for example) nr_iowait. We forget to spill the count out
> of the departing CPU's runqueue and hence we have to sum it across all

I don't think a departing runqueue should have any iowaiters on it,
should it?

> possible CPUs and we don't bother accounting for the possibility of the sum
> going negative because we happen to dink with the runqueue of a
> now-possibly-downed CPU. It's inefficient and it's inconsistent and some
> of it is, or will become incorrect. The other counters there probably have
> various combinations of these problems but I can't be bothered checking
> them all because they're all implemented differently.
>

Maybe (they're also used in different ways, and with different races
to be careful of, so in some respects that is inevitable).

But that doesn't mean we should introduce this new get_sched_stats
thing.

> Better to do them all in the one place and do them all the same way. I'd
> suggest a cacheline-aligned struct of local_t's which can be queried into a
> struct of ulongs.
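
For concreteness, here is a rough userspace sketch of what that suggestion
could look like. It is an illustration only, not code from this thread or
from the scheduler: plain longs stand in for local_t, a bitmask stands in
for the online CPU map, and every name in it (struct cpu_stats,
struct stats_snapshot, stats_query, clamp_zero, NR_CPUS = 4, CACHELINE = 64)
is made up for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define NR_CPUS   4   /* hypothetical; small so the example stays readable */
#define CACHELINE 64  /* assumed cacheline size in bytes */

/* Per-CPU counters. Plain longs stand in for the kernel's local_t here. */
struct cpu_stats {
	alignas(CACHELINE) long nr_running;
	long nr_uninterruptible;
	long nr_iowait;
};

/* Each CPU's counters occupy whole cachelines of their own. */
static_assert(sizeof(struct cpu_stats) % CACHELINE == 0,
	      "cpu_stats must be padded to a cacheline multiple");

/* What a query hands back: the same counters as unsigned longs. */
struct stats_snapshot {
	unsigned long nr_running;
	unsigned long nr_uninterruptible;
	unsigned long nr_iowait;
};

struct cpu_stats stats[NR_CPUS];

/* A sum over a subset of CPUs can come out negative; report 0 then. */
static unsigned long clamp_zero(long v)
{
	return v < 0 ? 0UL : (unsigned long)v;
}

/* Sum the counters of every CPU whose bit is set in online_mask. */
struct stats_snapshot stats_query(unsigned int online_mask)
{
	long running = 0, unint = 0, iowait = 0;
	struct stats_snapshot snap;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(online_mask & (1u << cpu)))
			continue;	/* skip offline CPUs */
		running += stats[cpu].nr_running;
		unint += stats[cpu].nr_uninterruptible;
		iowait += stats[cpu].nr_iowait;
	}

	snap.nr_running = clamp_zero(running);
	snap.nr_uninterruptible = clamp_zero(unint);
	snap.nr_iowait = clamp_zero(iowait);
	return snap;
}
```

Note that a query like this reads every counter on every online CPU even
when the caller wants only one of them, and the counters live apart from
the runqueue they describe.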

Now common scheduler operations have to access the runqueue cachelines and
this disjoint "stats" structure cacheline, so basic operations will get
slower.

Not to mention that all but one are protected by the runqueue lock, so
local_t isn't needed for the rest of them.

> That query should only look at online CPUs, which becomes rather necessary
> if we're to allocate runqueues only for online CPUs (desirable - the thing
> is huge).
>

Sure, those consistency and efficiency changes should be made now, with
the current structure.

-- 
SUSE Labs, Novell Inc.
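
As a rough illustration of the cacheline point made earlier in this mail
(summing n fields versus 1), the sketch below counts how many distinct
cachelines a read of a remote CPU's counters would touch. Everything in it
is an assumption for the example: struct fake_rq is an invented layout, not
the real runqueue; the line size is taken to be 64 bytes; the struct is
assumed to start on a cacheline boundary; and lines_touched is a made-up
helper.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64  /* assumed cacheline size in bytes */

/*
 * Invented layout. The padding stands in for whatever other fields
 * sit between the counters in a real structure.
 */
struct fake_rq {
	unsigned long nr_running;
	char other_fields[100];
	unsigned long nr_uninterruptible;
	unsigned long nr_iowait;
};

/*
 * Count the distinct cachelines covered by the given field offsets,
 * assuming the struct starts on a cacheline boundary and no single
 * field straddles two lines.
 */
size_t lines_touched(const size_t *offsets, size_t n)
{
	size_t count = 0;

	assert(n == 0 || offsets != NULL);

	for (size_t i = 0; i < n; i++) {
		size_t line = offsets[i] / CACHELINE;
		int seen = 0;

		/* Has an earlier field already pulled in this line? */
		for (size_t j = 0; j < i; j++) {
			if (offsets[j] / CACHELINE == line) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			count++;
	}
	return count;
}
```

Feeding it the offset of one counter gives one line; feeding it all three
offsets of this invented layout gives two, and when the remote CPU's lines
are not valid locally each extra line is another miss per CPU summed.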