From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756337AbYA2Mg4 (ORCPT ); Tue, 29 Jan 2008 07:36:56 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752947AbYA2Mgq (ORCPT ); Tue, 29 Jan 2008 07:36:46 -0500
Received: from netops-testserver-3-out.sgi.com ([192.48.171.28]:44641
	"EHLO relay.sgi.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1751403AbYA2Mgq (ORCPT );
	Tue, 29 Jan 2008 07:36:46 -0500
Date: Tue, 29 Jan 2008 06:36:38 -0600
From: Paul Jackson
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, vatsa@linux.vnet.ibm.com,
	dhaval@linux.vnet.ibm.com, nickpiggin@yahoo.com.au,
	ebiederm@xmission.com, akpm@linux-foundation.org, sgrubb@redhat.com,
	rostedt@goodmis.org, ghaskins@novell.com, dmitry.adamushko@gmail.com,
	tong.n.li@intel.com, tglx@linutronix.de, menage@google.com,
	rientjes@google.com
Subject: Re: scheduler scalability - cgroups, cpusets and load-balancing
Message-Id: <20080129063638.874b38ef.pj@sgi.com>
In-Reply-To: <1201608457.28547.130.camel@lappy>
References: <1201600428.28547.87.camel@lappy>
	<20080129040130.7b2904b6.pj@sgi.com>
	<1201603816.28547.94.camel@lappy>
	<20080129051353.4628c9eb.pj@sgi.com>
	<1201606284.28547.114.camel@lappy>
	<20080129055318.5b669847.pj@sgi.com>
	<1201608457.28547.130.camel@lappy>
Organization: SGI
X-Mailer: Sylpheed version 2.2.4 (GTK+ 2.12.0; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Peter, responding to Paul:
> > I really doubt we'd want to have such systems triggering the hard RT
> > scheduler on whatever CPUs were in the batch schedulers big cpuset
> > that didn't happened to have an active job currently assigned to them.
>
> My turn to be confused..
>
> If SD_LOAD_BALANCE is only set on the smaller, per-job, sets, how will
> the RT balancer trigger on the large set?

What 'sched_load_balance' does now is help you set up a -partial-
covering of non-overlapping sched domains.

In the batch scheduler example, those CPUs that were:
 1) managed by the batch scheduler, but
 2) not assigned to any active job at the moment
would -not- be in any sched domain.

It's not a question of the SD_LOAD_BALANCE flag.  It's a question of
whether a given CPU is even included in any sched domain.

If we did as you are suggesting (if I understand) then instead of
leaving these CPUs out of any sched domain, we'd set up a new kind of
sched domain for them, marked for hard realtime load balancing rather
than the somewhat more scalable, but softer, normal load balancing.

We want no load balancing on those CPUs, not realtime load balancing.

Indeed, we especially do not want realtime load balancing on those
CPUs, as I suspect that kind of load balancing is more expensive and
less scalable than normal load balancing.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson 1.940.382.4214
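
To make the "partial covering" point concrete, here is a toy model in
Python.  It is only a sketch under stated assumptions, not the kernel's
partitioning code: the function names, the (cpus, flag) input format and
the 8-CPU layout are all made up for illustration.  It assumes that
cpusets with 'sched_load_balance' set and overlapping CPUs end up in one
sched domain, and that a CPU covered by no such cpuset is in no sched
domain at all.

```python
def build_sched_domains(cpusets):
    """Toy model (hypothetical helper, not kernel code).

    cpusets: iterable of (cpus, load_balance) pairs.
    Returns a list of pairwise disjoint frozensets, one per sched domain.
    """
    domains = []
    for cpus, load_balance in cpusets:
        cpus = set(cpus)
        if not load_balance or not cpus:
            continue  # flag clear: this cpuset contributes no domain
        merged = cpus
        remaining = []
        for dom in domains:
            if dom & merged:
                merged |= dom          # overlapping sets share one domain
            else:
                remaining.append(dom)  # disjoint domains stay separate
        domains = remaining + [merged]
    return [frozenset(d) for d in domains]


def cpus_without_domain(all_cpus, domains):
    """CPUs that are not a member of any sched domain."""
    return set(all_cpus) - set().union(*domains)


# Hypothetical batch scheduler layout: one big cpuset with the flag
# clear, and two active per-job cpusets with the flag set.
layout = [
    (range(8), False),  # batch scheduler's big cpuset, CPUs 0-7
    ({0, 1}, True),     # job A
    ({2, 3}, True),     # job B
]
domains = build_sched_domains(layout)
idle = cpus_without_domain(range(8), domains)
# domains holds {0, 1} and {2, 3}; idle is {4, 5, 6, 7}
```

In this model CPUs 4-7 are managed by the batch scheduler but belong to
no sched domain, so there is nothing for any balancer, normal or
realtime, to walk on them.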