* [patch] sched: avoid div in rebalance_tick
@ 2007-01-12 6:02 Nick Piggin
2007-01-12 9:59 ` Alan
0 siblings, 1 reply; 4+ messages in thread
From: Nick Piggin @ 2007-01-12 6:02 UTC
To: Andrew Morton, Ingo Molnar, Linux Kernel Mailing List
Just noticed this while looking at a bug.
--
Avoid an expensive integer divide 3 times per CPU per tick.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2887,13 +2887,14 @@ static void active_load_balance(struct r
 static void update_load(struct rq *this_rq)
 {
 	unsigned long this_load;
-	int i, scale;
+	int i;

 	this_load = this_rq->raw_weighted_load;

 	/* Update our load: */
-	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
+	for (i = 0; i < 3; i++) {
 		unsigned long old_load, new_load;
+		int scale;

 		old_load = this_rq->cpu_load[i];
 		new_load = this_load;
@@ -2902,9 +2903,11 @@ static void update_load(struct rq *this_
 		 * prevents us from getting stuck on 9 if the load is 10, for
 		 * example.
 		 */
+		scale = 1 << i;
 		if (new_load > old_load)
 			new_load += scale-1;
-		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) / scale;
+		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load)
+					>> i; /* (divide by 'scale') */
 	}
 }
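
For readers following along, here is a minimal userspace sketch of why the substitution is safe. It assumes unsigned operands and scale == 1 << i, as in the patch. The helper names (avg_div, avg_shift, forms_agree) are made up for illustration and are not part of kernel/sched.c.

```c
/* Hypothetical helpers for illustration only; not kernel code. */

/* Decayed average using the original integer divide. */
static unsigned long avg_div(unsigned long old_load, unsigned long new_load,
			     unsigned int i)
{
	unsigned long scale = 1UL << i;

	return (old_load * (scale - 1) + new_load) / scale;
}

/* Same computation with the divide replaced by a right shift. */
static unsigned long avg_shift(unsigned long old_load, unsigned long new_load,
			       unsigned int i)
{
	unsigned long scale = 1UL << i;

	return (old_load * (scale - 1) + new_load) >> i;
}

/* Returns 1 if both forms agree for small loads and i in 0..2, else 0. */
static int forms_agree(void)
{
	unsigned long old_load, new_load;
	unsigned int i;

	for (i = 0; i < 3; i++)
		for (old_load = 0; old_load < 64; old_load++)
			for (new_load = 0; new_load < 64; new_load++)
				if (avg_div(old_load, new_load, i) !=
				    avg_shift(old_load, new_load, i))
					return 0;
	return 1;
}
```

The equivalence relies on the operands being unsigned. For negative signed values a right shift and a divide can round differently, so the same substitution would not be safe there.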
* Re: [patch] sched: avoid div in rebalance_tick
2007-01-12 6:02 [patch] sched: avoid div in rebalance_tick Nick Piggin
@ 2007-01-12 9:59 ` Alan
2007-01-12 10:27 ` Nick Piggin
2007-01-13 6:46 ` Nick Piggin
0 siblings, 2 replies; 4+ messages in thread
From: Alan @ 2007-01-12 9:59 UTC
To: Nick Piggin; +Cc: Andrew Morton, Ingo Molnar, Linux Kernel Mailing List
On Fri, 12 Jan 2007 07:02:13 +0100
Nick Piggin <npiggin@suse.de> wrote:
> Just noticed this while looking at a bug.
> Avoid an expensive integer divide 3 times per CPU per tick.
Integer divide is cheap on some modern processors, and multibit shift
isn't on all embedded ones.
How about putting back scale = 1 and using
scale += scale;
instead of the shift and getting what ought to be even better results
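
A small sketch of the idea, as I read it: doubling scale by addition produces the same 1, 2, 4 sequence as 1 << i, without a variable shift count. The function names are hypothetical and only serve the illustration.

```c
/* Hypothetical helpers for illustration only; not kernel code. */

/* Computes 1, 2, 4, ... with a variable shift per iteration (n < 32). */
static void scales_by_shift(unsigned int *out, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		out[i] = 1u << i;
}

/* Computes the same sequence with a single addition per iteration. */
static void scales_by_addition(unsigned int *out, unsigned int n)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < n; i++, scale += scale)
		out[i] = scale;
}
```

Whether the addition is actually faster than the shift depends on the CPU and on what the compiler does with the loop, so this only shows that the two forms compute the same values.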
* Re: [patch] sched: avoid div in rebalance_tick
2007-01-12 9:59 ` Alan
@ 2007-01-12 10:27 ` Nick Piggin
2007-01-13 6:46 ` Nick Piggin
1 sibling, 0 replies; 4+ messages in thread
From: Nick Piggin @ 2007-01-12 10:27 UTC
To: Alan; +Cc: Andrew Morton, Ingo Molnar, Linux Kernel Mailing List
On Fri, Jan 12, 2007 at 09:59:40AM +0000, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <npiggin@suse.de> wrote:
>
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
>
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.
Well, the integer divide unit is non-pipelined on P4, K8, Core2 and probably
most processors, AFAIK. So the 3 divs would take perhaps 240 cycles on a
P4.
> How about putting back scale = 1 and using
>
> scale += scale;
>
> instead of the shift and getting what ought to be even better results
Yes, I guess we can do this as well, good idea. I'll make a
quick userspace benchmark and post some numbers with my next
submission.
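
The benchmark itself is not posted in this thread. What follows is only a rough sketch of what such a userspace test might look like, assuming a plain array stands in for this_rq->cpu_load[] and using the standard clock() call for timing. All names here (update_div, update_shift, ns_per_call, NR_IDX) are invented for the example.

```c
#include <time.h>

#define NR_IDX 3	/* three cpu_load[] slots, as in the patch */

/* Stand-in for the original loop body, using the divide. */
static void update_div(volatile unsigned long *cpu_load,
		       unsigned long this_load)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < NR_IDX; i++, scale <<= 1) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		if (new_load > old_load)
			new_load += scale - 1;
		cpu_load[i] = (old_load * (scale - 1) + new_load) / scale;
	}
}

/* Stand-in for the proposed loop body: add to double, shift to divide. */
static void update_shift(volatile unsigned long *cpu_load,
			 unsigned long this_load)
{
	unsigned int i, scale;

	for (i = 0, scale = 1; i < NR_IDX; i++, scale += scale) {
		unsigned long old_load = cpu_load[i];
		unsigned long new_load = this_load;

		if (new_load > old_load)
			new_load += scale - 1;
		cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}
}

typedef void (*update_fn)(volatile unsigned long *, unsigned long);

/* Average CPU time per call in nanoseconds over 'iters' calls. */
static double ns_per_call(update_fn fn, unsigned long iters)
{
	volatile unsigned long cpu_load[NR_IDX] = { 0, 0, 0 };
	unsigned long n;
	clock_t start, end;

	start = clock();
	for (n = 0; n < iters; n++)
		fn(cpu_load, n & 1023);	/* vary the load a little */
	end = clock();

	return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iters;
}
```

Comparing ns_per_call(update_div, N) with ns_per_call(update_shift, N) for a large N gives a per-call estimate. An optimizing compiler may unroll the loop and turn the divide into shifts on its own, so the numbers depend heavily on compiler and flags.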
* Re: [patch] sched: avoid div in rebalance_tick
2007-01-12 9:59 ` Alan
2007-01-12 10:27 ` Nick Piggin
@ 2007-01-13 6:46 ` Nick Piggin
1 sibling, 0 replies; 4+ messages in thread
From: Nick Piggin @ 2007-01-13 6:46 UTC
To: Alan; +Cc: Andrew Morton, Ingo Molnar, Linux Kernel Mailing List
On Fri, Jan 12, 2007 at 09:59:40AM +0000, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <npiggin@suse.de> wrote:
>
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
>
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.
>
> How about putting back scale = 1 and using
>
> scale += scale;
>
> instead of the shift and getting what ought to be even better results
OK, how about this? The saving only works out to around 0.01% of my P3's CPU
time at 1000HZ, but it also made the x86 code 16 bytes smaller.
--
Avoid expensive integer divide 3 times per CPU per tick.
A userspace test of this loop went from 26ns down to 19ns on a G5, and
from 123ns down to 28ns on a P3.
(Also avoid a variable bit shift, as suggested by Alan. The effect
of this wasn't noticeable on the CPUs I tested with.)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2887,14 +2887,16 @@ static void active_load_balance(struct r
 static void update_load(struct rq *this_rq)
 {
 	unsigned long this_load;
-	int i, scale;
+	unsigned int i, scale;

 	this_load = this_rq->raw_weighted_load;

 	/* Update our load: */
-	for (i = 0, scale = 1; i < 3; i++, scale <<= 1) {
+	for (i = 0, scale = 1; i < 3; i++, scale += scale) {
 		unsigned long old_load, new_load;

+		/* scale is effectively 1 << i now, and >> i divides by scale */
+
 		old_load = this_rq->cpu_load[i];
 		new_load = this_load;
 		/*
@@ -2904,7 +2906,7 @@ static void update_load(struct rq *this_
 		 */
 		if (new_load > old_load)
 			new_load += scale-1;
-		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) / scale;
+		this_rq->cpu_load[i] = (old_load*(scale-1) + new_load) >> i;
 	}
 }
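
The comment in the context lines above says the round-up keeps the average from getting stuck on 9 when the load is 10. A small userspace sketch can make that concrete. The helpers step() and settle() are hypothetical, and the round_up flag exists only so the two behaviours can be compared; the kernel code always rounds up.

```c
/* Hypothetical helpers for illustration only; not kernel code. */

/* One averaging step for slot i, with or without the round-up. */
static unsigned long step(unsigned long old_load, unsigned long this_load,
			  unsigned int i, int round_up)
{
	unsigned long scale = 1UL << i;
	unsigned long new_load = this_load;

	if (round_up && new_load > old_load)
		new_load += scale - 1;
	return (old_load * (scale - 1) + new_load) >> i;
}

/* Applies step() 'ticks' times starting from 'start'; returns the result. */
static unsigned long settle(unsigned long start, unsigned long this_load,
			    unsigned int i, int round_up, unsigned int ticks)
{
	unsigned long load = start;
	unsigned int t;

	for (t = 0; t < ticks; t++)
		load = step(load, this_load, i, round_up);
	return load;
}
```

With a constant load of 10 and a start of 0, settle(0, 10, 1, 0, 100) stalls at 9 because (9 + 10) >> 1 truncates back to 9, while settle(0, 10, 1, 1, 100) reaches 10. For i = 2 the unrounded version stalls even earlier, at 7.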