From: Peter Williams <pwil3058@bigpond.net.au>
To: Peter Williams <pwil3058@bigpond.net.au>
Cc: Con Kolivas <kernel@kolivas.org>,
Martin Bligh <mbligh@google.com>, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Andy Whitcroft <apw@shadowen.org>
Subject: Re: -mm seems significanty slower than mainline on kernbench
Date: Fri, 13 Jan 2006 23:00:51 +1100 [thread overview]
Message-ID: <43C79673.8040507@bigpond.net.au> (raw)
In-Reply-To: <43C75178.80809@bigpond.net.au>
Peter Williams wrote:
> Peter Williams wrote:
>
>> Peter Williams wrote:
>>
>>> Martin Bligh wrote:
>>>
>>>>
>>>>>>
>>>>>> But I was thinking more about the code that (in the original)
>>>>>> handled the case where the number of tasks to be moved was less
>>>>>> than 1 but more than 0 (i.e. the cases where "imbalance" would
>>>>>> have been reduced to zero when divided by SCHED_LOAD_SCALE). I
>>>>>> think that I got that part wrong and you can end up with a bias
>>>>>> load to be moved which is less than any of the bias_prio values
>>>>>> for any queued tasks (in circumstances where the original code
>>>>>> would have rounded up to 1 and caused a move). I think that the
>>>>>> way to handle this problem is to replace 1 with "average bias
>>>>>> prio" within that logic. This would guarantee at least one task
>>>>>> with a bias_prio small enough to be moved.
>>>>>>
>>>>>> I think that this analysis is a strong argument for my original
>>>>>> patch being the cause of the problem so I'll go ahead and generate
>>>>>> a fix. I'll try to have a patch available later this morning.
>>>>>
>>>>> Attached is a patch that addresses this problem.  Unlike the
>>>>> description above, it does not use "average bias prio", as that
>>>>> solution would be very complicated.  Instead it assumes that
>>>>> NICE_TO_BIAS_PRIO(0) is "good enough" for this purpose, as it is
>>>>> highly likely to be the median bias prio and the median is
>>>>> probably better for this purpose than the average.
>>>>>
>>>>> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
>>>>
>>>> Doesn't fix the perf issue.
>>>
>>> OK, thanks.  I think there are a few more places where
>>> SCHED_LOAD_SCALE needs to be multiplied by NICE_TO_BIAS_PRIO(0).
>>> Basically, anywhere that it's added to, subtracted from or compared
>>> to a load.  In those cases it's being used as a scaled version of 1
>>> and we need a scaled
>>
>> This would have been better said as "the load generated by 1 task"
>> rather than just "a scaled version of 1". Numerically, they're the
>> same but one is clearer than the other and makes it more obvious why
>> we need NICE_TO_BIAS_PRIO(0) * SCHED_LOAD_SCALE and where we need it.
>>
>>> version of NICE_TO_BIAS_PRIO(0). I'll have another patch later today.
>>
>> I'm just testing this at the moment.
>
> Attached is a new patch to fix the excessive idle problem. This patch
> takes a new approach to the problem as it was becoming obvious that
> trying to alter the load balancing code to cope with biased load was
> harder than it seemed.
>
> This approach reverts to the old load values but weights them according
> to tasks' bias_prio values. This means that any assumptions by the load
> balancing code that the load generated by a single task is
> SCHED_LOAD_SCALE will still hold. Then, in find_busiest_group(), the
> imbalance is scaled back up to bias_prio scale so that move_tasks() can
> move biased load rather than tasks.
>
> One advantage of this is that, when no tasks have a non-zero nice
> value, the processing is mathematically the same as the original code.
> Kernbench results from a 2 CPU Celeron 550MHz system are:
>
> Average Optimal -j 8 Load Run:
> Elapsed Time 1056.16 (0.831102)
> User Time 1906.54 (1.38447)
> System Time 182.086 (0.973386)
> Percent CPU 197 (0)
> Context Switches 48727.2 (249.351)
> Sleeps 27623.4 (413.913)
>
> This indicates that, on average, 98.9% of the total available CPU was
> used by the build.
Here are the numbers for the same machine with the "improved smp nice
handling" completely removed, i.e. back to the 2.6.15 version.
Average Optimal -j 8 Load Run:
Elapsed Time 1059.95 (1.19324)
User Time 1914.94 (1.11102)
System Time 181.846 (0.916695)
Percent CPU 197.4 (0.547723)
Context Switches 40917.4 (469.184)
Sleeps 26654 (320.824)
>
> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
>
> BTW I think that we need to think about a slightly more complex nice
> to bias mapping function.  The current one gives a nice==19 task 1/20
> of the bias of a nice==0 task, but gives a nice==-20 task only twice
> the bias of a nice==0 task.  I don't think this is a big problem, as
> the majority of non nice==0 tasks will have positive nice, but it
> should be looked at as a future enhancement.
>
> Peter
>
>
> ------------------------------------------------------------------------
>
> Index: MM-2.6.X/kernel/sched.c
> ===================================================================
> --- MM-2.6.X.orig/kernel/sched.c 2006-01-13 14:53:34.000000000 +1100
> +++ MM-2.6.X/kernel/sched.c 2006-01-13 15:11:19.000000000 +1100
> @@ -1042,7 +1042,8 @@ void kick_process(task_t *p)
> static unsigned long source_load(int cpu, int type)
> {
> runqueue_t *rq = cpu_rq(cpu);
> - unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
> + unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
> + NICE_TO_BIAS_PRIO(0);
>
> if (type == 0)
> return load_now;
> @@ -1056,7 +1057,8 @@ static unsigned long source_load(int cpu
> static inline unsigned long target_load(int cpu, int type)
> {
> runqueue_t *rq = cpu_rq(cpu);
> - unsigned long load_now = rq->prio_bias * SCHED_LOAD_SCALE;
> + unsigned long load_now = (rq->prio_bias * SCHED_LOAD_SCALE) /
> + NICE_TO_BIAS_PRIO(0);
>
> if (type == 0)
> return load_now;
> @@ -1322,7 +1324,8 @@ static int try_to_wake_up(task_t *p, uns
> * of the current CPU:
> */
> if (sync)
> - tl -= p->bias_prio * SCHED_LOAD_SCALE;
> + tl -= (p->bias_prio * SCHED_LOAD_SCALE) /
> + NICE_TO_BIAS_PRIO(0);
>
> if ((tl <= load &&
> tl + target_load(cpu, idx) <= SCHED_LOAD_SCALE) ||
> @@ -2159,7 +2162,7 @@ find_busiest_group(struct sched_domain *
> }
>
> /* Get rid of the scaling factor, rounding down as we divide */
> - *imbalance = *imbalance / SCHED_LOAD_SCALE;
> + *imbalance = (*imbalance * NICE_TO_BIAS_PRIO(0)) / SCHED_LOAD_SCALE;
> return busiest;
>
> out_balanced:
> @@ -2472,7 +2475,8 @@ static void rebalance_tick(int this_cpu,
> struct sched_domain *sd;
> int i;
>
> - this_load = this_rq->prio_bias * SCHED_LOAD_SCALE;
> + this_load = (this_rq->prio_bias * SCHED_LOAD_SCALE) /
> + NICE_TO_BIAS_PRIO(0);
> /* Update our load */
> for (i = 0; i < 3; i++) {
> unsigned long new_load = this_load;
--
Peter Williams pwil3058@bigpond.net.au
"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce