The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [patch] scheduler: rebalance_tick interval update
@ 2004-11-15 22:38 Darren Hart
  2004-11-16  1:17 ` Nick Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Darren Hart @ 2004-11-15 22:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nick Piggin, Matt Dobson, Martin J Bligh, Rick Lindsley,
	Andrew Morton, Ingo Molnar

The current rebalance_tick() code assigns each sched_domain's
last_balance field to += interval after performing a load_balance.  If
interval is 10, this has the effect of saying:  we want to run
load_balance at time = 10, 20, 30, 40, etc...  If for example
last_balance=10 and for some reason rebalance_tick can't be run until
30, load_balance will be called and last_balance will be updated to 20,
causing it to call load_balance again immediately the next time it is
called since the interval is 10 and we are already at >30.  It seems to
me that it would make much more sense for last_balance to be assigned
jiffies after a load_balance, then the meaning of last_balance is more
exact: "this domain was last balanced at jiffies" rather than "we last
handled the balance we were supposed to do at 20, at some indeterminate
time".  The following patch makes this change.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---

diff -purN -X /home/mbligh/.diff.exclude /home/linux/views/linux-2.6.10-rc1-mm5/kernel/sched.c linux-2.6.10-rc1-mm5-jni/kernel/sched.c
--- /home/linux/views/linux-2.6.10-rc1-mm5/kernel/sched.c	2004-11-11 05:33:53.000000000 -0800
+++ linux-2.6.10-rc1-mm5-jni/kernel/sched.c	2004-11-15 11:28:16.000000000 -0800
@@ -2201,7 +2201,7 @@ static void rebalance_tick(int this_cpu,
 				/* We've pulled tasks over so no longer idle */
 				idle = NOT_IDLE;
 			}
-			sd->last_balance += interval;
+			sd->last_balance = jiffies;
 		}
 	}
 }



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-15 22:38 [patch] scheduler: rebalance_tick interval update Darren Hart
@ 2004-11-16  1:17 ` Nick Piggin
  2004-11-16  1:53   ` Matthew Dobson
  0 siblings, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-11-16  1:17 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Matt Dobson, Martin J Bligh, Rick Lindsley,
	Andrew Morton, Ingo Molnar



Darren Hart wrote:

>The current rebalance_tick() code assigns each sched_domain's
>last_balance field to += interval after performing a load_balance.  If
>interval is 10, this has the effect of saying:  we want to run
>load_balance at time = 10, 20, 30, 40, etc...  If for example
>last_balance=10 and for some reason rebalance_tick can't be run until
>30, load_balance will be called and last_balance will be updated to 20,
>causing it to call load_balance again immediately the next time it is
>called since the interval is 10 and we are already at >30.  It seems to
>me that it would make much more sense for last_balance to be assigned
>jiffies after a load_balance, then the meaning of last_balance is more
>exact: "this domain was last balanced at jiffies" rather than "we last
>handled the balance we were supposed to do at 20, at some indeterminate
>time".  The following patch makes this change.
>
>

Hi Darren,

This is how I first implemented it... but I think this will cause
rebalance points of each processor to tend to become synchronised
(rather than staggered) as ticks get lost.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-16  1:17 ` Nick Piggin
@ 2004-11-16  1:53   ` Matthew Dobson
  2004-11-16  2:17     ` Nick Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Matthew Dobson @ 2004-11-16  1:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Darren Hart, LKML, Martin J. Bligh, Rick Lindsley, Andrew Morton,
	Ingo Molnar

On Mon, 2004-11-15 at 17:17, Nick Piggin wrote:
> Darren Hart wrote:
> 
> >The current rebalance_tick() code assigns each sched_domain's
> >last_balance field to += interval after performing a load_balance.  If
> >interval is 10, this has the effect of saying:  we want to run
> >load_balance at time = 10, 20, 30, 40, etc...  If for example
> >last_balance=10 and for some reason rebalance_tick can't be run until
> >30, load_balance will be called and last_balance will be updated to 20,
> >causing it to call load_balance again immediately the next time it is
> >called since the interval is 10 and we are already at >30.  It seems to
> >me that it would make much more sense for last_balance to be assigned
> >jiffies after a load_balance, then the meaning of last_balance is more
> >exact: "this domain was last balanced at jiffies" rather than "we last
> >handled the balance we were supposed to do at 20, at some indeterminate
> >time".  The following patch makes this change.
> >
> >
> 
> Hi Darren,
> 
> This is how I first implemented it... but I think this will cause
> rebalance points of each processor to tend to become synchronised
> (rather than staggered) as ticks get lost.


But isn't that what this is supposed to stop:

        unsigned long j = jiffies + CPU_OFFSET(this_cpu);
....
                if (j - sd->last_balance >= interval) {
                        if (load_balance(this_cpu, this_rq, sd, idle)) {
                                /* We've pulled tasks over so no longer idle */
                                idle = NOT_IDLE;
                        }
                        sd->last_balance += interval;
                }

The CPU_OFFSET() macro is designed to spread out the balancing so they
don't all occur at the same time, no?

-Matt


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-16  1:53   ` Matthew Dobson
@ 2004-11-16  2:17     ` Nick Piggin
  2004-11-16  2:27       ` Nick Piggin
                         ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Nick Piggin @ 2004-11-16  2:17 UTC (permalink / raw)
  To: colpatch
  Cc: Darren Hart, LKML, Martin J. Bligh, Rick Lindsley, Andrew Morton,
	Ingo Molnar



Matthew Dobson wrote:

>On Mon, 2004-11-15 at 17:17, Nick Piggin wrote:
>
>>Darren Hart wrote:
>>
>>
>>>The current rebalance_tick() code assigns each sched_domain's
>>>last_balance field to += interval after performing a load_balance.  If
>>>interval is 10, this has the effect of saying:  we want to run
>>>load_balance at time = 10, 20, 30, 40, etc...  If for example
>>>last_balance=10 and for some reason rebalance_tick can't be run until
>>>30, load_balance will be called and last_balance will be updated to 20,
>>>causing it to call load_balance again immediately the next time it is
>>>called since the interval is 10 and we are already at >30.  It seems to
>>>me that it would make much more sense for last_balance to be assigned
>>>jiffies after a load_balance, then the meaning of last_balance is more
>>>exact: "this domain was last balanced at jiffies" rather than "we last
>>>handled the balance we were supposed to do at 20, at some indeterminate
>>>time".  The following patch makes this change.
>>>
>>>
>>>
>>Hi Darren,
>>
>>This is how I first implemented it... but I think this will cause
>>rebalance points of each processor to tend to become synchronised
>>(rather than staggered) as ticks get lost.
>>
>
>
>But isn't that what this is supposed to stop:
>
>        unsigned long j = jiffies + CPU_OFFSET(this_cpu);
>....
>                if (j - sd->last_balance >= interval) {
>                        if (load_balance(this_cpu, this_rq, sd, idle)) {
>                                /* We've pulled tasks over so no longer idle */
>                                idle = NOT_IDLE;
>                        }
>                        sd->last_balance += interval;
>                }
>
>The CPU_OFFSET() macro is designed to spread out the balancing so they
>don't all occur at the same time, no?
>
>

Yes, but if you balance n ticks since the last _rebalance_, then things will
be able to drift. Let's say 2 CPUs, they balance at 10 jiffies intervals,
5 jiffies apart:

jiffy   CPU0                              CPU1
0       rebalance (next, 10)
5                                         rebalance (next, 15)
10      rebalance (next, 20)
15                                        rebalance can't be run until
                                          30 as per Darren's example.
20      rebalance (next, 30)
30      rebalance (next, 40)              rebalance (next, 40)

So CPU0 and CPU1 are now synchronised. Having the next balance be calculated
from the _current_ time leaves you open to all sorts of these drift issues.

Another example, in some ticks, a CPU won't see the updated 'jiffies', other
times it will (at least on Altix systems, this can happen).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-16  2:17     ` Nick Piggin
@ 2004-11-16  2:27       ` Nick Piggin
  2004-11-16  3:50         ` Darren Hart
  2004-11-16  3:51       ` Darren Hart
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-11-16  2:27 UTC (permalink / raw)
  To: Nick Piggin
  Cc: colpatch, Darren Hart, LKML, Martin J. Bligh, Rick Lindsley,
	Andrew Morton, Ingo Molnar



Nick Piggin wrote:

>
> Another example, in some ticks, a CPU won't see the updated 'jiffies', 
> other
> times it will (at least on Altix systems, this can happen).
>
>

Note that if you didn't want to have this rash of balancing attempted after
a CPU wasn't able to run the rebalance for a long time, the solution would
be to keep adding the balance interval until it becomes greater than the
current jiffies.

I actually prefer it to try to make up the lost balances, just from the
perspective of gathering scheduler statistics. I don't suspect it happens
enough to justify adding the extra logic - Darren, are you actually seeing
problems?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-16  2:27       ` Nick Piggin
@ 2004-11-16  3:50         ` Darren Hart
  0 siblings, 0 replies; 10+ messages in thread
From: Darren Hart @ 2004-11-16  3:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matt Dobson, lkml, Martin J Bligh, Rick Lindsley, Andrew Morton,
	Ingo Molnar

On Tue, 2004-11-16 at 13:27 +1100, Nick Piggin wrote: 
> 
> Nick Piggin wrote:
> 
> >
> > Another example, in some ticks, a CPU won't see the updated 'jiffies', 
> > other
> > times it will (at least on Altix systems, this can happen).
> >
> >
> 
> Note that if you didn't want to have this rash of balancing attempted after
> a CPU wasn't able to run the rebalance for a long time, the solution would
> be to keep adding the balance interval until it becomes greater than the
> current jiffies.

As I mentioned in my last post, I don't think the "synchronized
rebalancing" is a real concern since the interval isn't likely to be the
same and the CPU_OFFSET macro is already in place to prevent this "rash
of balancing" (nice term :-).

> I actually prefer it to try to make up the lost balances, just from the
> perspective of gathering scheduler statistics. 

IMO, scheduler statistics are not worth running load_balance() for no
reason.  (And running it two or three times in a row is clearly not
accomplishing anything)

> I don't suspect it happens
> enough to justify adding the extra logic - Darren, are you actually seeing
> problems?

Not seeing in obvious problems, but the existing logic seems incorrect
to me (and the term last_balanced is currently misleading).  Running
load_balance() multiple times in order to catch up seems wasteful to me
as well.  The current code says something like: run load_balance() 10
times in a second.  If the second is almost up and you have only run it
6 times, it will run it 4 times in a row, that just seems wrong to me.


-- 
Darren Hart <darren@dvhart.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-16  2:17     ` Nick Piggin
  2004-11-16  2:27       ` Nick Piggin
@ 2004-11-16  3:51       ` Darren Hart
  2004-11-16  7:00       ` Rick Lindsley
       [not found]       ` <1100576400.14742.12.camel@farah.beaverton.ibm.com>
  3 siblings, 0 replies; 10+ messages in thread
From: Darren Hart @ 2004-11-16  3:51 UTC (permalink / raw)
  To: Matt Dobson, lkml, Martin J Bligh, Rick Lindsley, Andrew Morton,
	Ingo Molnar

On Tue, 2004-11-16 at 13:17 +1100, Nick Piggin wrote:
> 
> Matthew Dobson wrote:
> 
> >On Mon, 2004-11-15 at 17:17, Nick Piggin wrote:
> >
> >>Darren Hart wrote:
> >>
> >>
> >>>The current rebalance_tick() code assigns each sched_domain's
> >>>last_balance field to += interval after performing a load_balance.  If
> >>>interval is 10, this has the effect of saying:  we want to run
> >>>load_balance at time = 10, 20, 30, 40, etc...  If for example
> >>>last_balance=10 and for some reason rebalance_tick can't be run until
> >>>30, load_balance will be called and last_balance will be updated to 20,
> >>>causing it to call load_balance again immediately the next time it is
> >>>called since the interval is 10 and we are already at >30.  It seems to
> >>>me that it would make much more sense for last_balance to be assigned
> >>>jiffies after a load_balance, then the meaning of last_balance is more
> >>>exact: "this domain was last balanced at jiffies" rather than "we last
> >>>handled the balance we were supposed to do at 20, at some indeterminate
> >>>time".  The following patch makes this change.
> >>>
> >>>
> >>>
> >>Hi Darren,
> >>
> >>This is how I first implemented it... but I think this will cause
> >>rebalance points of each processor to tend to become synchronised
> >>(rather than staggered) as ticks get lost.
> >>
> >
> >
> >But isn't that what this is supposed to stop:
> >
> >        unsigned long j = jiffies + CPU_OFFSET(this_cpu);
> >....
> >                if (j - sd->last_balance >= interval) {
> >                        if (load_balance(this_cpu, this_rq, sd, idle)) {
> >                                /* We've pulled tasks over so no longer idle */
> >                                idle = NOT_IDLE;
> >                        }
> >                        sd->last_balance += interval;
> >                }
> >
> >The CPU_OFFSET() macro is designed to spread out the balancing so they
> >don't all occur at the same time, no?
> >
> >
> 
> Yes, but if you balance n ticks since the last _rebalance_, then things will
> be able to drift. Let's say 2 CPUs, they balance at 10 jiffies intervals,
> 5 jiffies apart:
> 
> jiffy   CPU0                              CPU1
> 0       rebalance (next, 10)
> 5                                         rebalance (next, 15)
> 10      rebalance (next, 20)
> 15                                        rebalance can't be run until
>                                           30 as per Darren's example.
> 20      rebalance (next, 30)
> 30      rebalance (next, 40)              rebalance (next, 40)
> 

This example didn't take into account:
	unsigned long j = jiffies + CPU_OFFSET(this_cpu);
Which, even if the last_balance's were equal and intervals were the same
(unlikely since each CPU has it's own domain and the intervals are
adjusted independently?), would prevent them from both running on the
same timer tick.  So I don't think this example is accurate.  On the
other hand, even if it was valid, I would prefer this to running the
load_balance code on the same CPU and domain several ticks in a row
(which obviously accomplishes nothing).


-- 
Darren Hart <darren@dvhart.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-16  2:17     ` Nick Piggin
  2004-11-16  2:27       ` Nick Piggin
  2004-11-16  3:51       ` Darren Hart
@ 2004-11-16  7:00       ` Rick Lindsley
       [not found]       ` <1100576400.14742.12.camel@farah.beaverton.ibm.com>
  3 siblings, 0 replies; 10+ messages in thread
From: Rick Lindsley @ 2004-11-16  7:00 UTC (permalink / raw)
  To: Nick Piggin
  Cc: colpatch, Darren Hart, LKML, Martin J. Bligh, Andrew Morton,
	Ingo Molnar

    Yes, but if you balance n ticks since the last _rebalance_, then things will
    be able to drift. Let's say 2 CPUs, they balance at 10 jiffies intervals,
    5 jiffies apart:

[example deleted]

First, why wasn't the rebalance run when it was supposed to be run?

The answer isn't really all that important.  Regardless of the answer,
if this can happen once, I presume it can happen more than once.  Despite
our best efforts, rebalancing ran in the "wrong timeslot". Aside from the
CPU_OFFSET math, it would seem to me that this inexplicable delay, itself,
introduces enough variance to make any hope of keeping the cpus strictly
unsynchronized (!!) something of a pipe dream.  We start them out of sync, and
know that they'll drift.  We can correct them, but they'll drift again.
Why not just acknowledge that behavior, instead of putting effort into
(unsuccessfully) countering it?

Second, it's of marginal usefulness anyway when you consider that
rebalance_tick() is only called with HZ granularity, except for
unpredictable times from fork().  You're going to be doubling up no matter
what you do once you get more than 16 siblings (see SD_SIBLING_INIT:
busy_factor * max_interval).

Lastly, why do we care?  To turn the question around, were we seeing load
balancing colliding without this attempt at forced spacing?  And if so,
what was the effect (and patterns) of those collisions?  Having a variable
named last_balance which represents when we last *would have liked to
have* balanced seems far less useful than knowing when we really did.

Rick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
       [not found]         ` <4199957C.1020804@cyberone.com.au>
@ 2004-11-16 15:56           ` Darren Hart
  2004-11-18 16:22             ` Darren Hart
  0 siblings, 1 reply; 10+ messages in thread
From: Darren Hart @ 2004-11-16 15:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: lkml

> >This example didn't take into account:
> >	unsigned long j = jiffies + CPU_OFFSET(this_cpu);
> >Which, even if the last_balance's were equal and intervals were the same
> >(unlikely since each CPU has it's own domain and the intervals are
> >adjusted independently?),
> >
> 
> That is correct.
> 
> > would prevent them from both running on the
> >same timer tick.  So I don't think this example is accurate.  On the
> >other hand, even if it was valid, I would prefer this to running the
> >load_balance code on the same CPU and domain several ticks in a row
> >(which obviously accomplishes nothing).
> >
> 
> It is valid because the CPU_OFFSET basically just adds a constant amount
> to each CPU's 'jiffies'. My example took that into account already and
> used the absolute jiffies value. If that doesn't convince you, pretend
> that the delay were to occur to CPU0, whos CPU_OFFSET == 0.


Heh, I thought of exactly that case when I rolled out of bed this
morning.  Funny how that happens sometimes, you can be sitting in front
of the code and make what you think is a perfectly valid analysis and
then wake up the next morning with no context at all and realize you
were wrong.  That happens to me anyway :-), I can't be the only one...

Nevertheless, that is a pretty unlikely corner case.  They double up
only when last_balance+interval is the same and one of the CPUs involved
is CPU 0.

> 
> 
-- 
Darren Hart <darren@dvhart.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] scheduler: rebalance_tick interval update
  2004-11-16 15:56           ` Darren Hart
@ 2004-11-18 16:22             ` Darren Hart
  0 siblings, 0 replies; 10+ messages in thread
From: Darren Hart @ 2004-11-18 16:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: lkml, Andrew Morton, Ingo Molnar, Matt Dobson, Martin J Bligh,
	Rick Lindsley

On Tue, 2004-11-16 at 07:56 -0800, Darren Hart wrote:
> > >This example didn't take into account:
> > >	unsigned long j = jiffies + CPU_OFFSET(this_cpu);
> > >Which, even if the last_balance's were equal and intervals were the same
> > >(unlikely since each CPU has it's own domain and the intervals are
> > >adjusted independently?),
> > >
> > 
> > That is correct.
> > 
> > > would prevent them from both running on the
> > >same timer tick.  So I don't think this example is accurate.  On the
> > >other hand, even if it was valid, I would prefer this to running the
> > >load_balance code on the same CPU and domain several ticks in a row
> > >(which obviously accomplishes nothing).
> > >
> > 
> > It is valid because the CPU_OFFSET basically just adds a constant amount
> > to each CPU's 'jiffies'. My example took that into account already and
> > used the absolute jiffies value. If that doesn't convince you, pretend
> > that the delay were to occur to CPU0, whos CPU_OFFSET == 0.
> 
> 
> Heh, I thought of exactly that case when I rolled out of bed this
> morning.  Funny how that happens sometimes, you can be sitting in front
> of the code and make what you think is a perfectly valid analysis and
> then wake up the next morning with no context at all and realize you
> were wrong.  That happens to me anyway :-), I can't be the only one...
> 
> Nevertheless, that is a pretty unlikely corner case.  They double up
> only when last_balance+interval is the same and one of the CPUs involved
> is CPU 0.
> 


Hey Nick,

I haven't heard anything on this thread in a couple of days, do you
agree we should me the change?  I feel there are legitimate reasons to
use =jiffies instead of +=interval for the ->last_balanced assignment.
I have already elaborated on these and provided counter arguments to the
ones you posted that I think make a strong case.  If you still oppose
the change, what would I need to provide to convince you?


-- 
Darren Hart <darren@dvhart.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2004-11-18 16:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-15 22:38 [patch] scheduler: rebalance_tick interval update Darren Hart
2004-11-16  1:17 ` Nick Piggin
2004-11-16  1:53   ` Matthew Dobson
2004-11-16  2:17     ` Nick Piggin
2004-11-16  2:27       ` Nick Piggin
2004-11-16  3:50         ` Darren Hart
2004-11-16  3:51       ` Darren Hart
2004-11-16  7:00       ` Rick Lindsley
     [not found]       ` <1100576400.14742.12.camel@farah.beaverton.ibm.com>
     [not found]         ` <4199957C.1020804@cyberone.com.au>
2004-11-16 15:56           ` Darren Hart
2004-11-18 16:22             ` Darren Hart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox