public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Peter Williams <pwil3058@bigpond.net.au>
To: Con Kolivas <kernel@kolivas.org>
Cc: Martin Bligh <mbligh@google.com>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Andy Whitcroft <apw@shadowen.org>
Subject: Re: -mm seems significanty slower than mainline on kernbench
Date: Fri, 13 Jan 2006 18:06:32 +1100	[thread overview]
Message-ID: <43C75178.80809@bigpond.net.au> (raw)
In-Reply-To: <43C6D636.8000105@bigpond.net.au>

[-- Attachment #1: Type: text/plain, Size: 4046 bytes --]

Peter Williams wrote:
> Peter Williams wrote:
> 
>> Martin Bligh wrote:
>>
>>>
>>>>>
>>>>> But I was thinking more about the code that (in the original) 
>>>>> handled the case where the number of tasks to be moved was less 
>>>>> than 1 but more than 0 (i.e. the cases where "imbalance" would have 
>>>>> been reduced to zero when divided by SCHED_LOAD_SCALE).  I think 
>>>>> that I got that part wrong and you can end up with a bias load to 
>>>>> be moved which is less than any of the bias_prio values for any 
>>>>> queued tasks (in circumstances where the original code would have 
>>>>> rounded up to 1 and caused a move).  I think that the way to handle 
>>>>> this problem is to replace 1 with "average bias prio" within that 
>>>>> logic.  This would guarantee at least one task with a bias_prio 
>>>>> small enough to be moved.
>>>>>
>>>>> I think that this analysis is a strong argument for my original 
>>>>> patch being the cause of the problem so I'll go ahead and generate 
>>>>> a fix. I'll try to have a patch available later this morning.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Attached is a patch that addresses this problem.  Unlike the 
>>>> description above it does not use "average bias prio" as that 
>>>> solution would be very complicated.  Instead it makes the assumption 
>>>> that NICE_TO_BIAS_PRIO(0) is a "good enough" for this purpose as 
>>>> this is highly likely to be the median bias prio and the median is 
>>>> probably better for this purpose than the average.
>>>>
>>>> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
>>>
>>>
>>>
>>>
>>> Doesn't fix the perf issue.
>>
>>
>>
>> OK, thanks.  I think there's a few more places where SCHED_LOAD_SCALE 
>> needs to be multiplied by NICE_TO_BIAS_PRIO(0).  Basically, anywhere 
>> that it's added to, subtracted from or compared to a load.  In those 
>> cases it's being used as a scaled version of 1 and we need a scaled 
> 
> 
> This would have been better said as "the load generated by 1 task" 
> rather than just "a scaled version of 1".  Numerically, they're the same 
> but one is clearer than the other and makes it more obvious why we need 
> NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE and where we need it.
> 
>> version of NICE_TO_BIAS_PRIO(0).  I'll have another patch later today.
> 
> 
> I'm just testing this at the moment.

Attached is a new patch to fix the excessive idle problem.  This patch 
takes a new approach to the problem as it was becoming obvious that 
trying to alter the load balancing code to cope with biased load was 
harder than it seemed.

This approach reverts to the old load values but weights them according 
to tasks' bias_prio values.  This means that any assumptions by the load 
balancing code that the load generated by a single task is 
SCHED_LOAD_SCALE will still hold.  Then, in find_busiest_group(), the 
imbalance is scaled back up to bias_prio scale so that move_tasks() can 
move biased load rather than tasks.

One advantage of this is that when there are no non zero niced tasks the 
processing will be mathematically the same as the original code. 
Kernbench results from a 2 CPU Celeron 550Mhz system are:

Average Optimal -j 8 Load Run:
Elapsed Time 1056.16 (0.831102)
User Time 1906.54 (1.38447)
System Time 182.086 (0.973386)
Percent CPU 197 (0)
Context Switches 48727.2 (249.351)
Sleeps 27623.4 (413.913)

This indicates that, on average, 98.9% of the total available CPU was 
used by the build.

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

BTW I think that we need to think about a slightly more complex nice to 
bias mapping function.  The current one gives a nice==19 1/20 of the 
bias of a nice=0 task but only gives nice=-20 tasks twice the bias of a 
nice=0 task.  I don't think this is a big problem as the majority of non 
nice==0 tasks will have positive nice but should be looked at for a 
future enhancement.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: different-approach-to-smp-nice-problem --]
[-- Type: text/plain, Size: 1832 bytes --]

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c	2006-01-13 14:53:34.000000000 +1100
+++ MM-2.6.X/kernel/sched.c	2006-01-13 15:11:19.000000000 +1100
@@ -1042,7 +1042,8 @@ void kick_process(task_t *p)
 static unsigned long source_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 
 	if (type == 0)
 		return load_now;
@@ -1056,7 +1057,8 @@ static unsigned long source_load(int cpu
 static inline unsigned long target_load(int cpu, int type)
 {
 	runqueue_t *rq = cpu_rq(cpu);
-	unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
+	unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 
 	if (type == 0)
 		return load_now;
@@ -1322,7 +1324,8 @@ static int try_to_wake_up(task_t *p, uns
 			 * of the current CPU:
 			 */
 			if (sync)
-				tl -= p->bias_prio * SCHED_LOAD_SCALE;
+				tl -= (p->bias_prio * SCHED_LOAD_SCALE) /
+					NICE_TO_BIAS_PRIO(0);
 
 			if ((tl <= load &&
 				tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
@@ -2159,7 +2162,7 @@ find_busiest_group(struct sched_domain *
 	}
 
 	/* Get rid of the scaling factor, rounding down as we divide */
-	*imbalance = *imbalance / SCHED_LOAD_SCALE;
+	*imbalance = (*imbalance * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
 	return busiest;
 
 out_balanced:
@@ -2472,7 +2475,8 @@ static void rebalance_tick(int this_cpu,
 	struct sched_domain *sd;
 	int i;
 
-	this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
+	this_load = (this_rq->prio_bias * SCHED_LOAD_SCALE) /
+		NICE_TO_BIAS_PRIO(0);
 	/* Update our load */
 	for (i = 0; i < 3; i++) {
 		unsigned long new_load = this_load;

  reply	other threads:[~2006-01-13  7:06 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-11  1:14 -mm seems significanty slower than mainline on kernbench Martin Bligh
2006-01-11  1:31 ` Andrew Morton
2006-01-11  1:41   ` Martin Bligh
2006-01-11  1:48     ` Andrew Morton
2006-01-11  1:49     ` Con Kolivas
2006-01-11  2:38       ` Peter Williams
2006-01-11  3:07         ` Con Kolivas
2006-01-11  3:12           ` Martin Bligh
2006-01-11  3:40           ` Peter Williams
2006-01-11  3:49             ` Con Kolivas
2006-01-11  4:33               ` Peter Williams
2006-01-11  5:14             ` Peter Williams
2006-01-11  6:21               ` Martin J. Bligh
2006-01-11 12:24                 ` Peter Williams
2006-01-11 14:29                   ` Con Kolivas
2006-01-11 22:05                     ` Peter Williams
2006-01-12  0:54                       ` Peter Williams
2006-01-12  1:18                         ` Con Kolivas
2006-01-12  1:29                           ` Peter Williams
2006-01-12  1:36                             ` Con Kolivas
2006-01-12  2:23                               ` Peter Williams
2006-01-12  2:26                                 ` Martin Bligh
2006-01-12  6:39                                   ` Con Kolivas
2006-01-23 19:28                                     ` Martin Bligh
2006-01-24  1:25                                       ` Peter Williams
2006-01-24  3:50                                         ` Peter Williams
2006-01-24  4:41                                           ` Martin J. Bligh
2006-01-24  6:22                                             ` Peter Williams
2006-01-24  6:42                                               ` Martin J. Bligh
2006-01-28 23:20                                                 ` Peter Williams
2006-01-29  0:52                                                   ` Martin J. Bligh
2006-01-12  2:27                                 ` Con Kolivas
2006-01-12  2:04                           ` Martin Bligh
2006-01-12  6:35                             ` Martin J. Bligh
2006-01-12  6:41                               ` Con Kolivas
2006-01-12  6:54                                 ` Peter Williams
2006-01-12 18:39                         ` Martin Bligh
2006-01-12 20:03                           ` Peter Williams
2006-01-12 22:20                             ` Peter Williams
2006-01-13  7:06                               ` Peter Williams [this message]
2006-01-13 12:00                                 ` Peter Williams
2006-01-13 16:15                                 ` Martin J. Bligh
2006-01-13 16:26                                 ` Andy Whitcroft
2006-01-13 17:54                                   ` Andy Whitcroft
2006-01-13 20:41                                     ` Martin Bligh
2006-01-14  0:23                                       ` Peter Williams
2006-01-14  5:03                                         ` Nick Piggin
2006-01-14  5:40                                           ` Con Kolivas
2006-01-14  6:05                                             ` Nick Piggin
2006-01-14  5:53                                           ` Peter Williams
2006-01-14  6:13                                             ` Nick Piggin
2006-01-13 22:59                                     ` Peter Williams
2006-01-14 18:48                                 ` Martin J. Bligh
2006-01-15  0:05                                   ` Peter Williams
2006-01-15  2:04                                     ` Con Kolivas
2006-01-15  2:09                                     ` [PATCH] sched - remove unnecessary smpnice ifdefs Con Kolivas
2006-01-15  3:50                                     ` -mm seems significanty slower than mainline on kernbench Ingo Molnar
2006-01-12  1:25                       ` Peter Williams
2006-01-11  1:52     ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=43C75178.80809@bigpond.net.au \
    --to=pwil3058@bigpond.net.au \
    --cc=akpm@osdl.org \
    --cc=apw@shadowen.org \
    --cc=kernel@kolivas.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbligh@google.com \
    --cc=mingo@elte.hu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox