* [patch] scheduler: rebalance_tick interval update @ 2004-11-15 22:38 Darren Hart 2004-11-16 1:17 ` Nick Piggin 0 siblings, 1 reply; 10+ messages in thread From: Darren Hart @ 2004-11-15 22:38 UTC (permalink / raw) To: linux-kernel Cc: Nick Piggin, Matt Dobson, Martin J Bligh, Rick Lindsley, Andrew Morton, Ingo Molnar The current rebalance_tick() code assigns each sched_domain's last_balance field to += interval after performing a load_balance. If interval is 10, this has the effect of saying: we want to run load_balance at time = 10, 20, 30, 40, etc... If for example last_balance=10 and for some reason rebalance_tick can't be run until 30, load_balance will be called and last_balance will be updated to 20, causing it to call load_balance again immediately the next time it is called since the interval is 10 and we are already at >30. It seems to me that it would make much more sense for last_balance to be assigned jiffies after a load_balance, then the meaning of last_balance is more exact: "this domain was last balanced at jiffies" rather than "we last handled the balance we were supposed to do at 20, at some indeterminate time". The following patch makes this change. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> --- diff -purN -X /home/mbligh/.diff.exclude /home/linux/views/linux-2.6.10-rc1-mm5/kernel/sched.c linux-2.6.10-rc1-mm5-jni/kernel/sched.c --- /home/linux/views/linux-2.6.10-rc1-mm5/kernel/sched.c 2004-11-11 05:33:53.000000000 -0800 +++ linux-2.6.10-rc1-mm5-jni/kernel/sched.c 2004-11-15 11:28:16.000000000 -0800 @@ -2201,7 +2201,7 @@ static void rebalance_tick(int this_cpu, /* We've pulled tasks over so no longer idle */ idle = NOT_IDLE; } - sd->last_balance += interval; + sd->last_balance = jiffies; } } } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-15 22:38 [patch] scheduler: rebalance_tick interval update Darren Hart @ 2004-11-16 1:17 ` Nick Piggin 2004-11-16 1:53 ` Matthew Dobson 0 siblings, 1 reply; 10+ messages in thread From: Nick Piggin @ 2004-11-16 1:17 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Matt Dobson, Martin J Bligh, Rick Lindsley, Andrew Morton, Ingo Molnar Darren Hart wrote: >The current rebalance_tick() code assigns each sched_domain's >last_balance field to += interval after performing a load_balance. If >interval is 10, this has the effect of saying: we want to run >load_balance at time = 10, 20, 30, 40, etc... If for example >last_balance=10 and for some reason rebalance_tick can't be run until >30, load_balance will be called and last_balance will be updated to 20, >causing it to call load_balance again immediately the next time it is >called since the interval is 10 and we are already at >30. It seems to >me that it would make much more sense for last_balance to be assigned >jiffies after a load_balance, then the meaning of last_balance is more >exact: "this domain was last balanced at jiffies" rather than "we last >handled the balance we were supposed to do at 20, at some indeterminate >time". The following patch makes this change. > > Hi Darren, This is how I first implemented it... but I think this will cause rebalance points of each processor to tend to become synchronised (rather than staggered) as ticks get lost. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-16 1:17 ` Nick Piggin @ 2004-11-16 1:53 ` Matthew Dobson 2004-11-16 2:17 ` Nick Piggin 0 siblings, 1 reply; 10+ messages in thread From: Matthew Dobson @ 2004-11-16 1:53 UTC (permalink / raw) To: Nick Piggin Cc: Darren Hart, LKML, Martin J. Bligh, Rick Lindsley, Andrew Morton, Ingo Molnar On Mon, 2004-11-15 at 17:17, Nick Piggin wrote: > Darren Hart wrote: > > >The current rebalance_tick() code assigns each sched_domain's > >last_balance field to += interval after performing a load_balance. If > >interval is 10, this has the effect of saying: we want to run > >load_balance at time = 10, 20, 30, 40, etc... If for example > >last_balance=10 and for some reason rebalance_tick can't be run until > >30, load_balance will be called and last_balance will be updated to 20, > >causing it to call load_balance again immediately the next time it is > >called since the interval is 10 and we are already at >30. It seems to > >me that it would make much more sense for last_balance to be assigned > >jiffies after a load_balance, then the meaning of last_balance is more > >exact: "this domain was last balanced at jiffies" rather than "we last > >handled the balance we were supposed to do at 20, at some indeterminate > >time". The following patch makes this change. > > > > > > Hi Darren, > > This is how I first implemented it... but I think this will cause > rebalance points of each processor to tend to become synchronised > (rather than staggered) as ticks get lost. But isn't that what this is supposed to stop: unsigned long j = jiffies + CPU_OFFSET(this_cpu); .... if (j - sd->last_balance >= interval) { if (load_balance(this_cpu, this_rq, sd, idle)) { /* We've pulled tasks over so no longer idle */ idle = NOT_IDLE; } sd->last_balance += interval; } The CPU_OFFSET() macro is designed to spread out the balancing so they don't all occur at the same time, no? -Matt ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-16 1:53 ` Matthew Dobson @ 2004-11-16 2:17 ` Nick Piggin 2004-11-16 2:27 ` Nick Piggin ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Nick Piggin @ 2004-11-16 2:17 UTC (permalink / raw) To: colpatch Cc: Darren Hart, LKML, Martin J. Bligh, Rick Lindsley, Andrew Morton, Ingo Molnar Matthew Dobson wrote: >On Mon, 2004-11-15 at 17:17, Nick Piggin wrote: > >>Darren Hart wrote: >> >> >>>The current rebalance_tick() code assigns each sched_domain's >>>last_balance field to += interval after performing a load_balance. If >>>interval is 10, this has the effect of saying: we want to run >>>load_balance at time = 10, 20, 30, 40, etc... If for example >>>last_balance=10 and for some reason rebalance_tick can't be run until >>>30, load_balance will be called and last_balance will be updated to 20, >>>causing it to call load_balance again immediately the next time it is >>>called since the interval is 10 and we are already at >30. It seems to >>>me that it would make much more sense for last_balance to be assigned >>>jiffies after a load_balance, then the meaning of last_balance is more >>>exact: "this domain was last balanced at jiffies" rather than "we last >>>handled the balance we were supposed to do at 20, at some indeterminate >>>time". The following patch makes this change. >>> >>> >>> >>Hi Darren, >> >>This is how I first implemented it... but I think this will cause >>rebalance points of each processor to tend to become synchronised >>(rather than staggered) as ticks get lost. >> > > >But isn't that what this is supposed to stop: > > unsigned long j = jiffies + CPU_OFFSET(this_cpu); >.... > if (j - sd->last_balance >= interval) { > if (load_balance(this_cpu, this_rq, sd, idle)) { > /* We've pulled tasks over so no longer idle */ > idle = NOT_IDLE; > } > sd->last_balance += interval; > } > >The CPU_OFFSET() macro is designed to spread out the balancing so they >don't all occur at the same time, no? > > Yes, but if you balance n ticks since the last _rebalance_, then things will be able to drift. Let's say 2 CPUs, they balance at 10 jiffies intervals, 5 jiffies apart: jiffy CPU0 CPU1 0 rebalance (next, 10) 5 rebalance (next, 15) 10 rebalance (next, 20) 15 rebalance can't be run until 30 as per Darren's example. 20 rebalance (next, 30) 30 rebalance (next, 40) rebalance (next, 40) So CPU0 and CPU1 are now synchronised. Having the next balance be calculated from the _current_ time leaves you open to all sorts of these drift issues. Another example, in some ticks, a CPU won't see the updated 'jiffies', other times it will (at least on Altix systems, this can happen). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-16 2:17 ` Nick Piggin @ 2004-11-16 2:27 ` Nick Piggin 2004-11-16 3:50 ` Darren Hart 2004-11-16 3:51 ` Darren Hart ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Nick Piggin @ 2004-11-16 2:27 UTC (permalink / raw) To: Nick Piggin Cc: colpatch, Darren Hart, LKML, Martin J. Bligh, Rick Lindsley, Andrew Morton, Ingo Molnar Nick Piggin wrote: > > Another example, in some ticks, a CPU won't see the updated 'jiffies', > other > times it will (at least on Altix systems, this can happen). > > Note that if you didn't want to have this rash of balancing attempted after a CPU wasn't able to run the rebalance for a long time, the solution would be to keep adding the balance interval until it becomes greater than the current jiffies. I actually prefer it to try to make up the lost balances, just from the perspective of gathering scheduler statistics. I don't suspect it happens enough to justify adding the extra logic - Darren, are you actually seeing problems? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-16 2:27 ` Nick Piggin @ 2004-11-16 3:50 ` Darren Hart 0 siblings, 0 replies; 10+ messages in thread From: Darren Hart @ 2004-11-16 3:50 UTC (permalink / raw) To: Nick Piggin Cc: Matt Dobson, lkml, Martin J Bligh, Rick Lindsley, Andrew Morton, Ingo Molnar On Tue, 2004-11-16 at 13:27 +1100, Nick Piggin wrote: > > Nick Piggin wrote: > > > > > Another example, in some ticks, a CPU won't see the updated 'jiffies', > > other > > times it will (at least on Altix systems, this can happen). > > > > > > Note that if you didn't want to have this rash of balancing attempted after > a CPU wasn't able to run the rebalance for a long time, the solution would > be to keep adding the balance interval until it becomes greater than the > current jiffies. As I mentioned in my last post, I don't think the "synchronized rebalancing" is a real concern since the interval isn't likely to be the same and the CPU_OFFSET macro is already in place to prevent this "rash of balancing" (nice term :-). > I actually prefer it to try to make up the lost balances, just from the > perspective of gathering scheduler statistics. IMO, scheduler statistics are not worth running load_balance() for no reason. (And running it two or three times in a row is clearly not accomplishing anything) > I don't suspect it happens > enough to justify adding the extra logic - Darren, are you actually seeing > problems? Not seeing in obvious problems, but the existing logic seems incorrect to me (and the term last_balanced is currently misleading). Running load_balance() multiple times in order to catch up seems wasteful to me as well. The current code says something like: run load_balance() 10 times in a second. If the second is almost up and you have only run it 6 times, it will run it 4 times in a row, that just seems wrong to me. -- Darren Hart <darren@dvhart.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-16 2:17 ` Nick Piggin 2004-11-16 2:27 ` Nick Piggin @ 2004-11-16 3:51 ` Darren Hart 2004-11-16 7:00 ` Rick Lindsley [not found] ` <1100576400.14742.12.camel@farah.beaverton.ibm.com> 3 siblings, 0 replies; 10+ messages in thread From: Darren Hart @ 2004-11-16 3:51 UTC (permalink / raw) To: Matt Dobson, lkml, Martin J Bligh, Rick Lindsley, Andrew Morton, Ingo Molnar On Tue, 2004-11-16 at 13:17 +1100, Nick Piggin wrote: > > Matthew Dobson wrote: > > >On Mon, 2004-11-15 at 17:17, Nick Piggin wrote: > > > >>Darren Hart wrote: > >> > >> > >>>The current rebalance_tick() code assigns each sched_domain's > >>>last_balance field to += interval after performing a load_balance. If > >>>interval is 10, this has the effect of saying: we want to run > >>>load_balance at time = 10, 20, 30, 40, etc... If for example > >>>last_balance=10 and for some reason rebalance_tick can't be run until > >>>30, load_balance will be called and last_balance will be updated to 20, > >>>causing it to call load_balance again immediately the next time it is > >>>called since the interval is 10 and we are already at >30. It seems to > >>>me that it would make much more sense for last_balance to be assigned > >>>jiffies after a load_balance, then the meaning of last_balance is more > >>>exact: "this domain was last balanced at jiffies" rather than "we last > >>>handled the balance we were supposed to do at 20, at some indeterminate > >>>time". The following patch makes this change. > >>> > >>> > >>> > >>Hi Darren, > >> > >>This is how I first implemented it... but I think this will cause > >>rebalance points of each processor to tend to become synchronised > >>(rather than staggered) as ticks get lost. > >> > > > > > >But isn't that what this is supposed to stop: > > > > unsigned long j = jiffies + CPU_OFFSET(this_cpu); > >.... > > if (j - sd->last_balance >= interval) { > > if (load_balance(this_cpu, this_rq, sd, idle)) { > > /* We've pulled tasks over so no longer idle */ > > idle = NOT_IDLE; > > } > > sd->last_balance += interval; > > } > > > >The CPU_OFFSET() macro is designed to spread out the balancing so they > >don't all occur at the same time, no? > > > > > > Yes, but if you balance n ticks since the last _rebalance_, then things will > be able to drift. Let's say 2 CPUs, they balance at 10 jiffies intervals, > 5 jiffies apart: > > jiffy CPU0 CPU1 > 0 rebalance (next, 10) > 5 rebalance (next, 15) > 10 rebalance (next, 20) > 15 rebalance can't be run until > 30 as per Darren's example. > 20 rebalance (next, 30) > 30 rebalance (next, 40) rebalance (next, 40) > This example didn't take into account: unsigned long j = jiffies + CPU_OFFSET(this_cpu); Which, even if the last_balance's were equal and intervals were the same (unlikely since each CPU has it's own domain and the intervals are adjusted independently?), would prevent them from both running on the same timer tick. So I don't think this example is accurate. On the other hand, even if it was valid, I would prefer this to running the load_balance code on the same CPU and domain several ticks in a row (which obviously accomplishes nothing). -- Darren Hart <darren@dvhart.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-16 2:17 ` Nick Piggin 2004-11-16 2:27 ` Nick Piggin 2004-11-16 3:51 ` Darren Hart @ 2004-11-16 7:00 ` Rick Lindsley [not found] ` <1100576400.14742.12.camel@farah.beaverton.ibm.com> 3 siblings, 0 replies; 10+ messages in thread From: Rick Lindsley @ 2004-11-16 7:00 UTC (permalink / raw) To: Nick Piggin Cc: colpatch, Darren Hart, LKML, Martin J. Bligh, Andrew Morton, Ingo Molnar Yes, but if you balance n ticks since the last _rebalance_, then things will be able to drift. Let's say 2 CPUs, they balance at 10 jiffies intervals, 5 jiffies apart: [example deleted] First, why wasn't the rebalance run when it was supposed to be run? The answer isn't really all that important. Regardless of the answer, if this can happen once, I presume it can happen more than once. Despite our best efforts, rebalancing ran in the "wrong timeslot". Aside from the CPU_OFFSET math, it would seem to me that this inexplicable delay, itself, introduces enough variance to make any hope of keeping the cpus strictly unsynchronized (!!) something of a pipe dream. We start them out of sync, and know that they'll drift. We can correct them, but they'll drift again. Why not just acknowledge that behavior, instead of putting effort into (unsuccessfully) countering it? Second, it's of marginal usefulness anyway when you consider that rebalance_tick() is only called with HZ granularity, except for unpredictable times from fork(). You're going to be doubling up no matter what you do once you get more than 16 siblings (see SD_SIBLING_INIT: busy_factor * max_interval). Lastly, why do we care? To turn the question around, were we seeing load balancing colliding without this attempt at forced spacing? And if so, what was the effect (and patterns) of those collisions? Having a variable named last_balance which represents when we last *would have liked to have* balanced seems far less useful than knowing when we really did. Rick ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <1100576400.14742.12.camel@farah.beaverton.ibm.com>]
[parent not found: <4199957C.1020804@cyberone.com.au>]
* Re: [patch] scheduler: rebalance_tick interval update [not found] ` <4199957C.1020804@cyberone.com.au> @ 2004-11-16 15:56 ` Darren Hart 2004-11-18 16:22 ` Darren Hart 0 siblings, 1 reply; 10+ messages in thread From: Darren Hart @ 2004-11-16 15:56 UTC (permalink / raw) To: Nick Piggin; +Cc: lkml > >This example didn't take into account: > > unsigned long j = jiffies + CPU_OFFSET(this_cpu); > >Which, even if the last_balance's were equal and intervals were the same > >(unlikely since each CPU has it's own domain and the intervals are > >adjusted independently?), > > > > That is correct. > > > would prevent them from both running on the > >same timer tick. So I don't think this example is accurate. On the > >other hand, even if it was valid, I would prefer this to running the > >load_balance code on the same CPU and domain several ticks in a row > >(which obviously accomplishes nothing). > > > > It is valid because the CPU_OFFSET basically just adds a constant amount > to each CPU's 'jiffies'. My example took that into account already and > used the absolute jiffies value. If that doesn't convince you, pretend > that the delay were to occur to CPU0, whos CPU_OFFSET == 0. Heh, I thought of exactly that case when I rolled out of bed this morning. Funny how that happens sometimes, you can be sitting in front of the code and make what you think is a perfectly valid analysis and then wake up the next morning with no context at all and realize you were wrong. That happens to me anyway :-), I can't be the only one... Nevertheless, that is a pretty unlikely corner case. They double up only when last_balance+interval is the same and one of the CPUs involved is CPU 0. > > -- Darren Hart <darren@dvhart.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] scheduler: rebalance_tick interval update 2004-11-16 15:56 ` Darren Hart @ 2004-11-18 16:22 ` Darren Hart 0 siblings, 0 replies; 10+ messages in thread From: Darren Hart @ 2004-11-18 16:22 UTC (permalink / raw) To: Nick Piggin Cc: lkml, Andrew Morton, Ingo Molnar, Matt Dobson, Martin J Bligh, Rick Lindsley On Tue, 2004-11-16 at 07:56 -0800, Darren Hart wrote: > > >This example didn't take into account: > > > unsigned long j = jiffies + CPU_OFFSET(this_cpu); > > >Which, even if the last_balance's were equal and intervals were the same > > >(unlikely since each CPU has it's own domain and the intervals are > > >adjusted independently?), > > > > > > > That is correct. > > > > > would prevent them from both running on the > > >same timer tick. So I don't think this example is accurate. On the > > >other hand, even if it was valid, I would prefer this to running the > > >load_balance code on the same CPU and domain several ticks in a row > > >(which obviously accomplishes nothing). > > > > > > > It is valid because the CPU_OFFSET basically just adds a constant amount > > to each CPU's 'jiffies'. My example took that into account already and > > used the absolute jiffies value. If that doesn't convince you, pretend > > that the delay were to occur to CPU0, whos CPU_OFFSET == 0. > > > Heh, I thought of exactly that case when I rolled out of bed this > morning. Funny how that happens sometimes, you can be sitting in front > of the code and make what you think is a perfectly valid analysis and > then wake up the next morning with no context at all and realize you > were wrong. That happens to me anyway :-), I can't be the only one... > > Nevertheless, that is a pretty unlikely corner case. They double up > only when last_balance+interval is the same and one of the CPUs involved > is CPU 0. > Hey Nick, I haven't heard anything on this thread in a couple of days, do you agree we should me the change? I feel there are legitimate reasons to use =jiffies instead of +=interval for the ->last_balanced assignment. I have already elaborated on these and provided counter arguments to the ones you posted that I think make a strong case. If you still oppose the change, what would I need to provide to convince you? -- Darren Hart <darren@dvhart.com> ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2004-11-18 16:23 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-11-15 22:38 [patch] scheduler: rebalance_tick interval update Darren Hart
2004-11-16 1:17 ` Nick Piggin
2004-11-16 1:53 ` Matthew Dobson
2004-11-16 2:17 ` Nick Piggin
2004-11-16 2:27 ` Nick Piggin
2004-11-16 3:50 ` Darren Hart
2004-11-16 3:51 ` Darren Hart
2004-11-16 7:00 ` Rick Lindsley
[not found] ` <1100576400.14742.12.camel@farah.beaverton.ibm.com>
[not found] ` <4199957C.1020804@cyberone.com.au>
2004-11-16 15:56 ` Darren Hart
2004-11-18 16:22 ` Darren Hart
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.