From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751490AbWBLXK1 (ORCPT ); Sun, 12 Feb 2006 18:10:27 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751493AbWBLXK1 (ORCPT ); Sun, 12 Feb 2006 18:10:27 -0500 Received: from omta05ps.mx.bigpond.com ([144.140.83.195]:29265 "EHLO omta05ps.mx.bigpond.com") by vger.kernel.org with ESMTP id S1751490AbWBLXK0 (ORCPT ); Sun, 12 Feb 2006 18:10:26 -0500 Message-ID: <43EFC05F.6010204@bigpond.net.au> Date: Mon, 13 Feb 2006 10:10:23 +1100 From: Peter Williams User-Agent: Mozilla Thunderbird 1.0.7-1.1.fc4 (X11/20050929) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Peter Williams CC: Andrew Morton , suresh.b.siddha@intel.com, kernel@kolivas.org, npiggin@suse.de, mingo@elte.hu, rostedt@goodmis.org, linux-kernel@vger.kernel.org, torvalds@osdl.org Subject: Re: [rfc][patch] sched: remove smpnice References: <20060207142828.GA20930@wotan.suse.de> <200602080157.07823.kernel@kolivas.org> <20060207141525.19d2b1be.akpm@osdl.org> <200602081011.09749.kernel@kolivas.org> <20060207153617.6520f126.akpm@osdl.org> <20060209230145.A17405@unix-os.sc.intel.com> <20060209231703.4bd796bf.akpm@osdl.org> <43ED3D6A.8010300@bigpond.net.au> <20060210180010.16c9d20a.akpm@osdl.org> <43EE8BAD.9040601@bigpond.net.au> In-Reply-To: <43EE8BAD.9040601@bigpond.net.au> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH PLAIN at omta05ps.mx.bigpond.com from [147.10.133.38] using ID pwil3058@bigpond.net.au at Sun, 12 Feb 2006 23:10:23 +0000 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Peter Williams wrote: > Andrew Morton wrote: > >> Peter Williams wrote: >> >>> I don't think either of these issues warrant abandoning smpnice. The >>> first is highly unlikely to occur on real systems and the second is >>> just an example of the patch doing its job (maybe too officiously). >>> I don't think users would notice either on real systems. >>> >>> Even if you pull it from 2.6.16 rather than upgrading it with my >>> patch can you please leave both in -mm? >> >> >> >> Yes, I have done that. I currently have: >> >> sched-restore-smpnice.patch >> sched-modified-nice-support-for-smp-load-balancing.patch >> sched-cleanup_task_activated.patch >> sched-alter_uninterruptible_sleep_interactivity.patch >> sched-make_task_noninteractive_use_sleep_type.patch >> sched-dont_decrease_idle_sleep_avg.patch >> sched-include_noninteractive_sleep_in_idle_detect.patch >> sched-new-sched-domain-for-representing-multi-core.patch >> sched-fix-group-power-for-allnodes_domains.patch > > > OK. Having slept on these problems I am now of the opinion that the > problems are caused by the use of NICE_TO_BIAS_PRIO(0) to set *imbalance > inside the (*imbalance < SCHED_LOAD_SCALE) if statement in > find_busiest_group(). What is happening here is that even though the > imbalance is less than one (average) task sometimes the decision is made > to move a task anyway but with the current version this decision can be > subverted in two ways: 1) if the task to be moved has a nice value less > than zero the value of *imbalance that is set will be too small for > move_tasks() to move it; and 2) if there are a number of tasks with nice > values greater than zero on the "busiest" more than one of them may be > moved as the value of *imbalance that is set may be big enough to > include more than one of these tasks. > > The fix for this problem is to replace NICE_TO_BIAS_PRIO(0) with the > "average bias prio per runnable task" on "busiest". This will > (generally) result in a larger value for *imbalance in case 1. above and > a smaller one in case 2. and alleviate both problems. A patch to apply > this fix is attached. > > Signed-off-by: Peter Williams > > Could you please add this patch to -mm so that it can be tested? > > Thanks > Peter > > > ------------------------------------------------------------------------ > > Index: MM-2.6.X/kernel/sched.c > =================================================================== > --- MM-2.6.X.orig/kernel/sched.c 2006-02-12 11:24:48.000000000 +1100 > +++ MM-2.6.X/kernel/sched.c 2006-02-12 11:35:40.000000000 +1100 > @@ -735,6 +735,19 @@ static inline unsigned long biased_load( > { > return (wload * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE; > } > + > +/* get the average biased load per runnable task for a run queue */ > +static inline unsigned long avg_biased_load(runqueue_t *rq) > +{ > + /* > + * When I'm convinced that this won't be called with a zero nr_running > + * and that it can't change during the call this can be simplified. > + * For the time being and for proof of concept let's paly it safe. > + */ > + unsigned long n = rq->nr_running; > + > + return n ? rq->prio_bias / n : 0; > +} > #else > static inline void set_bias_prio(task_t *p) > { > @@ -2116,7 +2129,7 @@ find_busiest_group(struct sched_domain * > unsigned long tmp; > > if (max_load - this_load >= SCHED_LOAD_SCALE*2) { > - *imbalance = NICE_TO_BIAS_PRIO(0); > + *imbalance = avg_biased_load(busiest); > return busiest; > } > > @@ -2149,7 +2162,7 @@ find_busiest_group(struct sched_domain * > if (pwr_move <= pwr_now) > goto out_balanced; > > - *imbalance = NICE_TO_BIAS_PRIO(0); > + *imbalance = avg_biased_load(busiest); > return busiest; > } > Can we pull this one, please? I've mistakenly assumed that busiest was a run queue when it's actually a sched group. :-( Peter -- Peter Williams pwil3058@bigpond.net.au "Learning, n. The kind of ignorance distinguishing the studious." -- Ambrose Bierce