From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262040AbUBHEFt (ORCPT ); Sat, 7 Feb 2004 23:05:49 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262078AbUBHEFt (ORCPT ); Sat, 7 Feb 2004 23:05:49 -0500 Received: from mail-10.iinet.net.au ([203.59.3.42]:3290 "HELO mail.iinet.net.au") by vger.kernel.org with SMTP id S262040AbUBHEFp (ORCPT ); Sat, 7 Feb 2004 23:05:45 -0500 Message-ID: <4025B572.9040904@cyberone.com.au> Date: Sun, 08 Feb 2004 15:05:06 +1100 From: Nick Piggin User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040122 Debian/1.6-1 X-Accept-Language: en MIME-Version: 1.0 To: Anton Blanchard CC: Rick Lindsley , "Martin J. Bligh" , akpm@osdl.org, linux-kernel@vger.kernel.org, dvhltc@us.ibm.com Subject: Re: [PATCH] Load balancing problem in 2.6.2-mm1 References: <20040207095057.GS19011@krispykreme> <200402080040.i180eY811893@owlet.beaverton.ibm.com> <20040208011221.GV19011@krispykreme> <40258F21.30209@cyberone.com.au> <20040208014141.GX19011@krispykreme> <4025AB00.3030601@cyberone.com.au> <20040208035721.GY19011@krispykreme> In-Reply-To: <20040208035721.GY19011@krispykreme> Content-Type: multipart/mixed; boundary="------------020408040408050702060402" Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org This is a multi-part message in MIME format. --------------020408040408050702060402 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Anton Blanchard wrote: > >Hi, > > > >>Yeah its because you have a lot of cpus, so the average is still >>small. You also need something like >> >>if (*imbalance == 0 && max_load - this_load > SCHED_LOAD_SCALE) >> *imbalance = 1; >> >> > >OK I'll give that a try. > > > > Can you try this patch instead pretty please ;) Thanks --------------020408040408050702060402 Content-Type: text/plain; name="rollup.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="rollup.patch" linux-2.6-npiggin/kernel/sched.c | 52 ++++++++++++++++++++++++--------------- 1 files changed, 32 insertions(+), 20 deletions(-) diff -puN kernel/sched.c~rollup kernel/sched.c --- linux-2.6/kernel/sched.c~rollup 2004-02-08 15:03:53.000000000 +1100 +++ linux-2.6-npiggin/kernel/sched.c 2004-02-08 15:03:53.000000000 +1100 @@ -1405,16 +1405,28 @@ find_busiest_group(struct sched_domain * total_load += avg_load; total_nr_cpus += nr_cpus; - avg_load /= nr_cpus; + + /* + * Load is cumulative over SD_FLAG_IDLE domains, but + * spread over !SD_FLAG_IDLE domains. For example, 2 + * processes running on an SMT CPU puts a load of 2 on + * that CPU, however 2 processes running on 2 CPUs puts + * a load of 1 on that domain. + * + * This should be configurable so as SMT siblings become + * more powerful, they can "spread" more load - for example, + * the above case might only count as a load of 1.7. + */ + if (!(domain->flags & SD_FLAG_IDLE)) + avg_load /= nr_cpus; + + if (avg_load > max_load) + max_load = avg_load; if (local_group) { this_load = avg_load; - goto nextgroup; - } - - if (avg_load >= max_load) { + } else if (avg_load >= max_load) { busiest = group; - max_load = avg_load; busiest_nr_cpus = nr_cpus; } nextgroup: @@ -1424,8 +1436,10 @@ nextgroup: if (!busiest) goto out_balanced; - avg_load = total_load / total_nr_cpus; - if (idle == NOT_IDLE && this_load >= avg_load) + if (!(domain->flags & SD_FLAG_IDLE)) + avg_load = total_load / total_nr_cpus; + + if (this_load >= avg_load) goto out_balanced; if (idle == NOT_IDLE && 100*max_load <= domain->imbalance_pct*this_load) @@ -1437,20 +1451,18 @@ nextgroup: * reduce the max loaded cpu below the average load, as either of these * actions would just result in more rebalancing later, and ping-pong * tasks around. Thus we look for the minimum possible imbalance. + * Negative imbalances (*we* are more loaded than anyone else) will + * be counted as no imbalance for these purposes -- we can't fix that + * by pulling tasks to us. Be careful of negative numbers as they'll + * appear as very large values with unsigned longs. */ - *imbalance = min(max_load - avg_load, avg_load - this_load); - - /* Get rid of the scaling factor now, rounding *up* as we divide */ - *imbalance = (*imbalance + SCHED_LOAD_SCALE - 1) >> SCHED_LOAD_SHIFT; - - if (*imbalance == 0) { - if (package_idle != NOT_IDLE && domain->flags & SD_FLAG_IDLE - && max_load * busiest_nr_cpus > (3*SCHED_LOAD_SCALE/2)) - *imbalance = 1; - else - busiest = NULL; - } + *imbalance = min(max_load - avg_load, avg_load - this_load) / 2; + /* Get rid of the scaling factor, rounding *up* as we divide */ + *imbalance = (*imbalance + SCHED_LOAD_SCALE-1) + >> SCHED_LOAD_SHIFT; + if (*imbalance == 0 && (max_load - this_load) > SCHED_LOAD_SCALE) + *imbalance = 1; return busiest; out_balanced: _ --------------020408040408050702060402--