public inbox for linux-kernel@vger.kernel.org
* [PATCH] sched: move enough load to balance average load per task
@ 2006-04-10  6:45 Peter Williams
  2006-04-11  1:12 ` Siddha, Suresh B
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Williams @ 2006-04-10  6:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Chen, Kenneth W, Con Kolivas, Ingo Molnar, Mike Galbraith,
	Nick Piggin, Siddha, Suresh B, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1277 bytes --]

Problem:

The current implementation of find_busiest_group() recognizes that 
approximately equal average loads per task for each group/queue are 
desirable (e.g. this condition will increase the probability that the 
top N highest priority tasks on an N CPU system will be on different 
CPUs) by being slightly more aggressive when *imbalance is small but the 
average load per task in "busiest" group is more than that in "this" 
group.  Unfortunately, the amount moved from "busiest" to "this" is too 
small to reduce the average load per task on "busiest" (at best there 
will be no change and at worst it will get bigger).

Solution:

Increase the amount of load moved from "busiest" to "this" in these 
circumstances while making sure that the amount of load moved won't 
increase the (absolute) difference in the two groups' total weighted 
loads.  A task with a weighted load greater than the average needs to be 
moved to cause the average to be reduced.

NB This makes no difference to load balancing for the case where all 
tasks have nice==0.

Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>

-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

[-- Attachment #2: smpnice-help-balance-avg-loads --]
[-- Type: text/plain, Size: 1100 bytes --]

Index: MM-2.6.17-rc1-mm2/kernel/sched.c
===================================================================
--- MM-2.6.17-rc1-mm2.orig/kernel/sched.c	2006-04-10 10:46:53.000000000 +1000
+++ MM-2.6.17-rc1-mm2/kernel/sched.c	2006-04-10 14:16:32.000000000 +1000
@@ -2258,16 +2258,20 @@ find_busiest_group(struct sched_domain *
 	if (*imbalance < busiest_load_per_task) {
 		unsigned long pwr_now = 0, pwr_move = 0;
 		unsigned long tmp;
-		unsigned int imbn = 2;
 
-		if (this_nr_running) {
+		if (this_nr_running)
 			this_load_per_task /= this_nr_running;
-			if (busiest_load_per_task > this_load_per_task)
-				imbn = 1;
-		} else
+		else
 			this_load_per_task = SCHED_LOAD_SCALE;
 
-		if (max_load - this_load >= busiest_load_per_task * imbn) {
+		if (busiest_load_per_task > this_load_per_task) {
+			unsigned long dld = max_load - this_load;
+
+			if (dld > busiest_load_per_task) {
+				*imbalance = (dld + busiest_load_per_task) / 2;
+				return busiest;
+			}
+		} else if (max_load - this_load >= busiest_load_per_task * 2) {
 			*imbalance = busiest_load_per_task;
 			return busiest;
 		}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-10  6:45 [PATCH] sched: move enough load to balance average load per task Peter Williams
@ 2006-04-11  1:12 ` Siddha, Suresh B
  2006-04-11  1:57   ` Peter Williams
  2006-04-11 23:46   ` Peter Williams
  0 siblings, 2 replies; 11+ messages in thread
From: Siddha, Suresh B @ 2006-04-11  1:12 UTC (permalink / raw)
  To: Peter Williams
  Cc: Andrew Morton, Chen, Kenneth W, Con Kolivas, Ingo Molnar,
	Mike Galbraith, Nick Piggin, Siddha, Suresh B,
	Linux Kernel Mailing List

On Mon, Apr 10, 2006 at 04:45:32PM +1000, Peter Williams wrote:
> Problem:
> 
> The current implementation of find_busiest_group() recognizes that 
> approximately equal average loads per task for each group/queue are 
> desirable (e.g. this condition will increase the probability that the 
> top N highest priority tasks on an N CPU system will be on different 
> CPUs) by being slightly more aggressive when *imbalance is small but the 
> average load per task in "busiest" group is more than that in "this" 
> group.  Unfortunately, the amount moved from "busiest" to "this" is too 
> small to reduce the average load per task on "busiest" (at best there 
> will be no change and at worst it will get bigger).

Peter, we don't need to reduce the average load per task on "busiest"
always.  By moving "busiest_load_per_task" worth of load, we will increase the 
average load per task of the less busy cpu (thereby trying to achieve
equality with "busiest"...)

Can you give an example scenario where this patch helps? And don't
the normal imbalance calculations capture those issues?

thanks,
suresh

> 
> Solution:
> 
> Increase the amount of load moved from "busiest" to "this" in these 
> circumstances while making sure that the amount of load moved won't 
> increase the (absolute) difference in the two groups' total weighted 
> loads.  A task with a weighted load greater than the average needs to be 
> moved to cause the average to be reduced.
> 
> NB This makes no difference to load balancing for the case where all 
> tasks have nice==0.
> 
> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
> 
> -- 
> Peter Williams                                   pwil3058@bigpond.net.au
> 
> "Learning, n. The kind of ignorance distinguishing the studious."
>   -- Ambrose Bierce

> Index: MM-2.6.17-rc1-mm2/kernel/sched.c
> ===================================================================
> --- MM-2.6.17-rc1-mm2.orig/kernel/sched.c	2006-04-10 10:46:53.000000000 +1000
> +++ MM-2.6.17-rc1-mm2/kernel/sched.c	2006-04-10 14:16:32.000000000 +1000
> @@ -2258,16 +2258,20 @@ find_busiest_group(struct sched_domain *
>  	if (*imbalance < busiest_load_per_task) {
>  		unsigned long pwr_now = 0, pwr_move = 0;
>  		unsigned long tmp;
> -		unsigned int imbn = 2;
>  
> -		if (this_nr_running) {
> +		if (this_nr_running)
>  			this_load_per_task /= this_nr_running;
> -			if (busiest_load_per_task > this_load_per_task)
> -				imbn = 1;
> -		} else
> +		else
>  			this_load_per_task = SCHED_LOAD_SCALE;
>  
> -		if (max_load - this_load >= busiest_load_per_task * imbn) {
> +		if (busiest_load_per_task > this_load_per_task) {
> +			unsigned long dld = max_load - this_load;
> +
> +			if (dld > busiest_load_per_task) {
> +				*imbalance = (dld + busiest_load_per_task) / 2;
> +				return busiest;
> +			}
> +		} else if (max_load - this_load >= busiest_load_per_task * 2) {
>  			*imbalance = busiest_load_per_task;
>  			return busiest;
>  		}


* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-11  1:12 ` Siddha, Suresh B
@ 2006-04-11  1:57   ` Peter Williams
  2006-04-11  5:47     ` Siddha, Suresh B
  2006-04-11 23:46   ` Peter Williams
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Williams @ 2006-04-11  1:57 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Andrew Morton, Chen, Kenneth W, Con Kolivas, Ingo Molnar,
	Mike Galbraith, Nick Piggin, Linux Kernel Mailing List

Siddha, Suresh B wrote:
> On Mon, Apr 10, 2006 at 04:45:32PM +1000, Peter Williams wrote:
>> Problem:
>>
>> The current implementation of find_busiest_group() recognizes that 
>> approximately equal average loads per task for each group/queue are 
>> desirable (e.g. this condition will increase the probability that the 
>> top N highest priority tasks on an N CPU system will be on different 
>> CPUs) by being slightly more aggressive when *imbalance is small but the 
>> average load per task in "busiest" group is more than that in "this" 
>> group.  Unfortunately, the amount moved from "busiest" to "this" is too 
>> small to reduce the average load per task on "busiest" (at best there 
>> will be no change and at worst it will get bigger).
> 
> Peter, we don't need to reduce the average load per task on "busiest"
> always.  By moving "busiest_load_per_task" worth of load, we will increase the 
> average load per task of the less busy cpu (thereby trying to achieve
> equality with "busiest"...)
> 
> Can you give an example scenario where this patch helps? And don't
> the normal imbalance calculations capture those issues?

Yes, I think that the normal imbalance calculations (in 
find_busiest_queue()) will generally capture the aim of having 
approximately equal average loads per task on run queues.  But this bit 
of code is a special case in that the extra aggression being taken by 
the load balancer (in response to a scenario raised by you) is being 
justified by the imbalance in the average loads per task so it behooves 
us to do the best we can to ensure that that imbalance is addressed.

I don't think this is true for try_to_wake_up() and some changes may be 
desirable there.  However, any such changes would interact with the RT 
load balancing that Ingo is working on and would need to be considered 
in conjunction with that.

The reason I think "approximately equal average loads per task" is a 
worthwhile secondary aim for the load balancer is that it helps restore an 
implicit aim (approximately equal numbers of tasks per run queue) that 
was present in the original version.  This in turn means that the 
distribution of priorities within the queues will be similar and this 
increases the chances that (on an N CPU system) the N highest priority 
tasks will be on different CPUs.  This is a desirable state of affairs.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-11  1:57   ` Peter Williams
@ 2006-04-11  5:47     ` Siddha, Suresh B
  0 siblings, 0 replies; 11+ messages in thread
From: Siddha, Suresh B @ 2006-04-11  5:47 UTC (permalink / raw)
  To: Peter Williams
  Cc: Siddha, Suresh B, Andrew Morton, Chen, Kenneth W, Con Kolivas,
	Ingo Molnar, Mike Galbraith, Nick Piggin,
	Linux Kernel Mailing List

On Tue, Apr 11, 2006 at 11:57:12AM +1000, Peter Williams wrote:
> Siddha, Suresh B wrote:
> > Can you give an example scenario where this patch helps? And don't
> > the normal imbalance calculations capture those issues?
> 
> Yes, I think that the normal imbalance calculations (in 
> find_busiest_queue()) will generally capture the aim of having 
> approximately equal average loads per task on run queues.  But this bit 
> of code is a special case in that the extra aggression being taken by 
> the load balancer (in response to a scenario raised by you) is being 

Can you give a specific example which shows the problem
that you are trying to fix with this particular patch?

thanks,
suresh


* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-11  1:12 ` Siddha, Suresh B
  2006-04-11  1:57   ` Peter Williams
@ 2006-04-11 23:46   ` Peter Williams
  2006-04-12  1:57     ` Siddha, Suresh B
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Williams @ 2006-04-11 23:46 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Andrew Morton, Chen, Kenneth W, Con Kolivas, Ingo Molnar,
	Mike Galbraith, Nick Piggin, Linux Kernel Mailing List

Siddha, Suresh B wrote:
> On Mon, Apr 10, 2006 at 04:45:32PM +1000, Peter Williams wrote:
>> Problem:
>>
>> The current implementation of find_busiest_group() recognizes that 
>> approximately equal average loads per task for each group/queue are 
>> desirable (e.g. this condition will increase the probability that the 
>> top N highest priority tasks on an N CPU system will be on different 
>> CPUs) by being slightly more aggressive when *imbalance is small but the 
>> average load per task in "busiest" group is more than that in "this" 
>> group.  Unfortunately, the amount moved from "busiest" to "this" is too 
>> small to reduce the average load per task on "busiest" (at best there 
>> will be no change and at worst it will get bigger).
> 
> Peter, we don't need to reduce the average load per task on "busiest"
> always.  By moving "busiest_load_per_task" worth of load, we will increase the 
> average load per task of the less busy cpu (thereby trying to achieve
> equality with "busiest"...)

Well, first off, we don't always move busiest_load_per_task; we move UP 
TO busiest_load_per_task, so there is no way you can make definitive 
statements about what will happen to the value "this_load_per_task" as a 
result of setting *imbalance to busiest_load_per_task.  Load balancing 
is a probabilistic endeavour and we need to take steps that increase the 
probability that we get the desired result.

Without this patch there is no chance that busiest_load_per_task will 
get smaller and whether this_load_per_task will get bigger is 
indeterminate.  With this patch there IS a chance that 
busiest_load_per_task will decrease and an INCREASED chance that 
this_load_per_task will get bigger.  Ergo we have increased the 
probability that the (absolute) difference between this_load_per_task 
and busiest_load_per_task will decrease.  This is a desirable outcome.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-11 23:46   ` Peter Williams
@ 2006-04-12  1:57     ` Siddha, Suresh B
  2006-04-12  5:06       ` Peter Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Siddha, Suresh B @ 2006-04-12  1:57 UTC (permalink / raw)
  To: Peter Williams
  Cc: Siddha, Suresh B, Andrew Morton, Chen, Kenneth W, Con Kolivas,
	Ingo Molnar, Mike Galbraith, Nick Piggin,
	Linux Kernel Mailing List

On Wed, Apr 12, 2006 at 09:46:32AM +1000, Peter Williams wrote:
> Siddha, Suresh B wrote:
> > On Mon, Apr 10, 2006 at 04:45:32PM +1000, Peter Williams wrote:
> >> Problem:
> >>
> >> The current implementation of find_busiest_group() recognizes that 
> >> approximately equal average loads per task for each group/queue are 
> >> desirable (e.g. this condition will increase the probability that the 
> >> top N highest priority tasks on an N CPU system will be on different 
> >> CPUs) by being slightly more aggressive when *imbalance is small but the 
> >> average load per task in "busiest" group is more than that in "this" 
> >> group.  Unfortunately, the amount moved from "busiest" to "this" is too 
> >> small to reduce the average load per task on "busiest" (at best there 
> >> will be no change and at worst it will get bigger).
> > 
> > Peter, we don't need to reduce the average load per task on "busiest"
> > always.  By moving "busiest_load_per_task" worth of load, we will increase the 
> > average load per task of the less busy cpu (thereby trying to achieve
> > equality with "busiest"...)
> 
> Well, first off, we don't always move busiest_load_per_task; we move UP 
> TO busiest_load_per_task, so there is no way you can make definitive 
> statements about what will happen to the value "this_load_per_task" as a 
> result of setting *imbalance to busiest_load_per_task.  Load balancing 
> is a probabilistic endeavour and we need to take steps that increase the 
> probability that we get the desired result.

I agree with you.  But the previous code was more conservative and may slowly
(just from a theory point of view... I don't have an example to show this..)
balance towards the desired state.  With this code, I feel we are more
aggressive.  For example, on a DP system: if I run one high priority
and two low priority processes, they keep hopping from one processor
to another... you may argue it is because of "top" or some other
process... I agree that that is the case..  But the same thing doesn't happen
with the previous version..  I like the conservative approach...

> Without this patch there is no chance that busiest_load_per_task will 
> get smaller 

Is there an example for this?

> and whether this_load_per_task will get bigger is 
> indeterminate.  With this patch there IS a chance that 
> busiest_load_per_task will decrease and an INCREASED chance that 
> this_load_per_task will get bigger.  Ergo we have increased the 
> probability that the (absolute) difference between this_load_per_task 
> and busiest_load_per_task will decrease.  This is a desirable outcome.

All I am saying is we are more aggressive.. I don't have any issue with
the desired outcome..

thanks,
suresh


* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-12  1:57     ` Siddha, Suresh B
@ 2006-04-12  5:06       ` Peter Williams
  2006-04-12 16:55         ` Siddha, Suresh B
       [not found]         ` <443D95DF.2090807@bigpond.net.au>
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Williams @ 2006-04-12  5:06 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Andrew Morton, Chen, Kenneth W, Con Kolivas, Ingo Molnar,
	Mike Galbraith, Nick Piggin, Linux Kernel Mailing List

Siddha, Suresh B wrote:
> On Wed, Apr 12, 2006 at 09:46:32AM +1000, Peter Williams wrote:
>> Siddha, Suresh B wrote:
>>> On Mon, Apr 10, 2006 at 04:45:32PM +1000, Peter Williams wrote:
>>>> Problem:
>>>>
>>>> The current implementation of find_busiest_group() recognizes that 
>>>> approximately equal average loads per task for each group/queue are 
>>>> desirable (e.g. this condition will increase the probability that the 
>>>> top N highest priority tasks on an N CPU system will be on different 
>>>> CPUs) by being slightly more aggressive when *imbalance is small but the 
>>>> average load per task in "busiest" group is more than that in "this" 
>>>> group.  Unfortunately, the amount moved from "busiest" to "this" is too 
>>>> small to reduce the average load per task on "busiest" (at best there 
>>>> will be no change and at worst it will get bigger).
>>> Peter, we don't need to reduce the average load per task on "busiest"
>>> always.  By moving "busiest_load_per_task" worth of load, we will increase the 
>>> average load per task of the less busy cpu (thereby trying to achieve
>>> equality with "busiest"...)
>> Well, first off, we don't always move busiest_load_per_task; we move UP 
>> TO busiest_load_per_task, so there is no way you can make definitive 
>> statements about what will happen to the value "this_load_per_task" as a 
>> result of setting *imbalance to busiest_load_per_task.  Load balancing 
>> is a probabilistic endeavour and we need to take steps that increase the 
>> probability that we get the desired result.
> 
> I agree with you.  But the previous code was more conservative and may slowly
> (just from a theory point of view... I don't have an example to show this..)
> balance towards the desired state.  With this code, I feel we are more
> aggressive.  For example, on a DP system: if I run one high priority
> and two low priority processes, they keep hopping from one processor
> to another... you may argue it is because of "top" or some other
> process... I agree that that is the case..  But the same thing doesn't happen
> with the previous version..  I like the conservative approach...
> 
>> Without this patch there is no chance that busiest_load_per_task will 
>> get smaller 
> 
> Is there an example for this?

Yes, we just take a slight variation of your scenario that prompted the 
first patch (to which this patch is a minor modification) by adding one 
normal priority task to each of the CPUs.  This gives us a 2 CPU system 
with CPU-0 having 2 high priority tasks plus 1 normal priority task and 
CPU-1 having two normal priority tasks.  Clearly, the desirable load 
balancing outcome would be for the two high priority tasks to be on 
different CPUs; otherwise we have a high priority task stuck on a run 
queue while a normal priority task is running on another (less heavily 
loaded) CPU.

In order to analyze what happens during load balancing, let's use W as 
the load weight for a normal task and suppose that the load weights of 
the two high priority tasks are (W + k) and that "this" == CPU-1 in 
find_busiest_queue().  This will result in "busiest" == CPU-0 and:

this_load = 2W
this_load_per_task = W
max_load = 3W + 2k
busiest_load_per_task = W + 2k / 3
avg_load = 5W / 2 + k
max_pull = W / 2 + k
*imbalance = W / 2 + k

Whenever k < (3W / 2) this will result in *imbalance < 
busiest_load_per_task and we end up in the small imbalance code.

(max_load - this_load) = W + 2k which is greater than 
busiest_load_per_task so we decide that we want to move some load from 
"busiest" to "this".

Without this patch we would set *imbalance to busiest_load_per_task and 
the only task on "busiest" that has a weighted load less than or equal 
to this value is the normal task so this is the one that will be moved 
resulting in:

this_load = 3W
this_load_per_task = W
max_load = 2W + 2k
busiest_load_per_task = W + k

Even if you reverse the roles of "busiest" and "this", this will be 
considered balanced and the system will stabilize in this undesirable 
state.  NB, as predicted, the average load per task on "this" hasn't 
changed and the average load per task on "busiest" has increased.  We 
still have the situation where a high priority task is stuck on a run 
queue while a low priority task is running on another CPU -- we've 
failed :-(.

With this patch, *imbalance will be set to (W + 4k / 3) which is bigger 
than the weighted load of the high priority tasks so one of them will be 
moved resulting in:

this_load = 3W + k
this_load_per_task = W + k / 3
max_load = 2W + k
busiest_load_per_task = W + k / 2

> 
>> and whether this_load_per_task will get bigger is 
>> indeterminate.  With this patch there IS a chance that 
>> busiest_load_per_task will decrease and an INCREASED chance that 
>> this_load_per_task will get bigger.  Ergo we have increased the 
>> probability that the (absolute) difference between this_load_per_task 
>> and busiest_load_per_task will decrease.  This is a desirable outcome.
> 
> All I am saying is we are more aggressive.. I don't have any issue with
> the desired outcome..

We need to be more aggressive but not too aggressive and I think this 
patch achieves the required balance.

NB busiest_load_per_task < *imbalance < (max_load - this_load) is true 
for this path through the code.  To be precise, *imbalance will be halfway 
between busiest_load_per_task and (max_load - this_load).

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-12  5:06       ` Peter Williams
@ 2006-04-12 16:55         ` Siddha, Suresh B
  2006-04-12 23:13           ` Peter Williams
       [not found]         ` <443D95DF.2090807@bigpond.net.au>
  1 sibling, 1 reply; 11+ messages in thread
From: Siddha, Suresh B @ 2006-04-12 16:55 UTC (permalink / raw)
  To: Peter Williams
  Cc: Siddha, Suresh B, Andrew Morton, Chen, Kenneth W, Con Kolivas,
	Ingo Molnar, Mike Galbraith, Nick Piggin,
	Linux Kernel Mailing List

On Wed, Apr 12, 2006 at 03:06:52PM +1000, Peter Williams wrote:
> Siddha, Suresh B wrote:
> > Is there an example for this?
> 
> Yes, we just take a slight variation of your scenario that prompted the 
> first patch (to which this patch is a minor modification) by adding one 
> normal priority task to each of the CPUs.  This gives us a 2 CPU system 
> with CPU-0 having 2 high priority tasks plus 1 normal priority task and 
> CPU-1 having two normal priority tasks.  Clearly, the desirable load 
> balancing outcome would be for the two high priority tasks to be on 
> different CPUs; otherwise we have a high priority task stuck on a run 
> queue while a normal priority task is running on another (less heavily 
> loaded) CPU.
> 
> In order to analyze what happens during load balancing, let's use W as 
> the load weight for a normal task and suppose that the load weights of 
> the two high priority tasks are (W + k) and that "this" == CPU-1 in 
> find_busiest_queue().  This will result in "busiest" == CPU-0 and:
> 
> this_load = 2W
> this_load_per_task = W
> max_load = 3W + 2k
> busiest_load_per_task = W + 2k / 3
> avg_load = 5W / 2 + k
> max_pull = W / 2 + k
> *imbalance = W / 2 + k
> 
> Whenever k < (3W / 2) this will result in *imbalance < 
> busiest_load_per_task and we end up in the small imbalance code.
> 
> (max_load - this_load) = W + 2k which is greater than 
> busiest_load_per_task so we decide that we want to move some load from 
> "busiest" to "this".
> 
> Without this patch we would set *imbalance to busiest_load_per_task and 
> the only task on "busiest" that has a weighted load less than or equal 
> to this value is the normal task so this is the one that will be moved 
> resulting in:
> 
> this_load = 3W
> this_load_per_task = W
> max_load = 2W + 2k
> busiest_load_per_task = W + k
> 
> Even if you reverse the roles of "busiest" and "this", this will be 
> considered balanced and the system will stabilize in this undesirable 
> state.  NB, as predicted, the average load per task on "this" hasn't 
> changed and the average load per task on "busiest" has increased.  We 
> still have the situation where a high priority task is stuck on a run 
> queue while a low priority task is running on another CPU -- we've 
> failed :-(.

For such a 'k' value, we fail anyhow.  For example, how does the normal
load balance detect an imbalance in the situation below?

this_load = 3W
this_load_per_task = W
max_load = 2W + 2k
busiest_load_per_task = W + k

If we really want to distribute 'N' higher priority tasks (however small or
big the priority difference between low and high priority tasks) onto 
'N' different cpus, we will need a really different approach to load 
balancing..

thanks,
suresh

> 
> With this patch, *imbalance will be set to (W + 4k / 3) which is bigger 
> than the weighted load of the high priority tasks so one of them will be 
> moved resulting in:
> 
> this_load = 3W + k
> this_load_per_task = W + k / 3
> max_load = 2W + k
> busiest_load_per_task = W + k / 2
> 
> > 
> >> and whether this_load_per_task will get bigger is 
> >> indeterminate.  With this patch there IS a chance that 
> >> busiest_load_per_task will decrease and an INCREASED chance that 
> >> this_load_per_task will get bigger.  Ergo we have increased the 
> >> probability that the (absolute) difference between this_load_per_task 
> >> and busiest_load_per_task will decrease.  This is a desirable outcome.
> > 
> > All I am saying is we are more aggressive.. I don't have any issue with
> > the desired outcome..
> 
> We need to be more aggressive but not too aggressive and I think this 
> patch achieves the required balance.
> 
> NB busiest_load_per_task < *imbalance < (max_load - this_load) is true 
> for this path through the code.  To be precise, *imbalance will be halfway 
> between busiest_load_per_task and (max_load - this_load).
> 
> Peter
> -- 
> Peter Williams                                   pwil3058@bigpond.net.au
> 
> "Learning, n. The kind of ignorance distinguishing the studious."
>   -- Ambrose Bierce


* Re: [PATCH] sched: move enough load to balance average load per task
  2006-04-12 16:55         ` Siddha, Suresh B
@ 2006-04-12 23:13           ` Peter Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Williams @ 2006-04-12 23:13 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Andrew Morton, Chen, Kenneth W, Con Kolivas, Ingo Molnar,
	Mike Galbraith, Nick Piggin, Linux Kernel Mailing List

Siddha, Suresh B wrote:
> On Wed, Apr 12, 2006 at 03:06:52PM +1000, Peter Williams wrote:
>> Siddha, Suresh B wrote:
>>> Is there an example for this?
>> Yes, we just take a slight variation of your scenario that prompted the 
>> first patch (to which this patch is a minor modification) by adding one 
>> normal priority task to each of the CPUs.  This gives us a 2 CPU system 
>> with CPU-0 having 2 high priority tasks plus 1 normal priority task and 
>> CPU-1 having two normal priority tasks.  Clearly, the desirable load 
>> balancing outcome would be for the two high priority tasks to be on 
>> different CPUs; otherwise we have a high priority task stuck on a run 
>> queue while a normal priority task is running on another (less heavily 
>> loaded) CPU.
>>
>> In order to analyze what happens during load balancing, let's use W as 
>> the load weight for a normal task and suppose that the load weights of 
>> the two high priority tasks are (W + k) and that "this" == CPU-1 in 
>> find_busiest_queue().  This will result in "busiest" == CPU-0 and:
>>
>> this_load = 2W
>> this_load_per_task = W
>> max_load = 3W + 2k
>> busiest_load_per_task = W + 2k / 3
>> avg_load = 5W / 2 + k
>> max_pull = W / 2 + k
>> *imbalance = W / 2 + k
>>
>> Whenever k < (3W / 2) this will result in *imbalance < 
>> busiest_load_per_task and we end up in the small imbalance code.
>>
>> (max_load - this_load) = W + 2k which is greater than 
>> busiest_load_per_task so we decide that we want to move some load from 
>> "busiest" to "this".
>>
>> Without this patch we would set *imbalance to busiest_load_per_task and 
>> the only task on "busiest" that has a weighted load less than or equal 
>> to this value is the normal task so this is the one that will be moved 
>> resulting in:
>>
>> this_load = 3W
>> this_load_per_task = W
>> max_load = 2W + 2k
>> busiest_load_per_task = W + k
>>
>> Even if you reverse the roles of "busiest" and "this", this will be 
>> considered balanced and the system will stabilize in this undesirable 
>> state.  NB, as predicted, the average load per task on "this" hasn't 
>> changed and the average load per task on "busiest" has increased.  We 
>> still have the situation where a high priority task is stuck on a run 
>> queue while a low priority task is running on another CPU -- we've 
>> failed :-(.
> 
> For such a 'k' value, we fail anyhow.  For example, how does the normal
> load balance detect an imbalance in the situation below?
> 
> this_load = 3W
> this_load_per_task = W
> max_load = 2W + 2k
> busiest_load_per_task = W + k

Yes, it's hard to get out of such a situation once you get into one, so 
that's why changes to try_to_wake_up() may be needed.  We certainly have 
to stop the load balancing code from creating these situations as well.

> 
> If we really want to distribute 'N' higher priority tasks (however small or
> big the priority difference between low and high priority tasks) onto 
> 'N' different cpus, we will need a really different approach to load 
> balancing..

Yes, I've said something similar in another thread, but I agreed with Ingo when 
he said that this wasn't really a problem for the load balancer to solve. 
I expressed the same opinion as above, namely that this problem needs to 
be addressed in try_to_wake_up() (which isn't really load balancing) and 
suggested that (for high priority tasks) try_to_wake_up() should be 
modified to find either an idle CPU or (if it can't find an idle one) 
the CPU with the lowest priority current task.

However, it should be noted that Ingo is working on something for 
ensuring the distribution of RT tasks across CPUs and this is likely to 
overlap with this idea so consultation is necessary.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* smpnice: issues with finding busiest queue
       [not found]         ` <443D95DF.2090807@bigpond.net.au>
@ 2006-04-14  0:31           ` Siddha, Suresh B
  2006-04-14  1:17             ` Peter Williams
  0 siblings, 1 reply; 11+ messages in thread
From: Siddha, Suresh B @ 2006-04-14  0:31 UTC (permalink / raw)
  To: Peter Williams
  Cc: Siddha, Suresh B, Andrew Morton, Chen, Kenneth W, Con Kolivas,
	Ingo Molnar, Mike Galbraith, Nick Piggin,
	Linux Kernel Mailing List

On Thu, Apr 13, 2006 at 10:05:51AM +1000, Peter Williams wrote:
> 
> There be dragons here :-(.
> 

At more places in this part of the world (smpnice) :)

We need to take another look at find_busiest_queue().  With the current
weighted calculations, it doesn't always make sense to pick the highest
weighted runqueue in the busiest group.

For example, on a DP system with HT, how does load balancing behave
with Package-0 running one high priority and one low priority task, and
Package-1 running one low priority task (with its other thread idle)?

Package-1 thinks that it needs to take the low priority task from
Package-0, but find_busiest_queue() returns the CPU running the highest
priority task.  Ultimately (with the help of active load balancing) we
move the high priority task to Package-1.  The same then happens on
Package-0, moving the high priority task from Package-1 back to
Package-0, and so on.

Even without active load balancing, regular load balancing will fail to
balance the above scenario (leaving two low priority tasks on one
package and the high priority task on the other).

We probably need to use the imbalance (and other factors) to determine
the busiest queue in the group.
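One way to picture that suggestion (a hedged sketch with hypothetical 
arrays, not the eventual kernel fix): rather than blindly returning the 
highest weighted runqueue, skip queues whose average per-task weight 
exceeds the wanted imbalance, since pulling one of their tasks would 
overshoot and invite the ping-ponging described above:

```c
/* Return the index of the most loaded queue that can donate roughly
 * 'imbalance' worth of load without moving an over-weight task,
 * or -1 if no queue qualifies. */
int find_donor_queue(int nqueues, const unsigned long *load,
		     const unsigned int *nr_running, unsigned long imbalance)
{
	int i, best = -1;
	unsigned long best_load = 0;

	for (i = 0; i < nqueues; i++) {
		if (!nr_running[i])
			continue;
		/* skip queues whose tasks are too heavy to move */
		if (load[i] / nr_running[i] > imbalance)
			continue;
		if (load[i] > best_load) {
			best_load = load[i];
			best = i;
		}
	}
	return best;
}
```

In the DP/HT example, with per-queue loads {2048, 1024} (queue 0 
running the high priority task) and a wanted imbalance of 1024, this 
picks queue 1 rather than the more heavily loaded queue 0, so the high 
priority task stays put.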

thanks,
suresh


* Re: smpnice: issues with finding busiest queue
  2006-04-14  0:31           ` smpnice: issues with finding busiest queue Siddha, Suresh B
@ 2006-04-14  1:17             ` Peter Williams
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Williams @ 2006-04-14  1:17 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Andrew Morton, Chen, Kenneth W, Con Kolivas, Ingo Molnar,
	Mike Galbraith, Nick Piggin, Linux Kernel Mailing List

Siddha, Suresh B wrote:
> On Thu, Apr 13, 2006 at 10:05:51AM +1000, Peter Williams wrote:
>> There be dragons here :-(.
>>
> 
> At more places in this part of the world (smpnice) :)
> 
> We need to take another look at find_busiest_queue().  With the current
> weighted calculations, it doesn't always make sense to pick the highest
> weighted runqueue in the busiest group.
> 
> For example, on a DP system with HT, how does load balancing behave
> with Package-0 running one high priority and one low priority task, and
> Package-1 running one low priority task (with its other thread idle)?
> 
> Package-1 thinks that it needs to take the low priority task from
> Package-0, but find_busiest_queue() returns the CPU running the highest
> priority task.  Ultimately (with the help of active load balancing) we
> move the high priority task to Package-1.  The same then happens on
> Package-0, moving the high priority task from Package-1 back to
> Package-0, and so on.
> 
> Even without active load balancing, regular load balancing will fail to
> balance the above scenario (leaving two low priority tasks on one
> package and the high priority task on the other).
> 
> We probably need to use the imbalance (and other factors) to determine
> the busiest queue in the group.

A patch would be nice.

-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


end of thread, other threads:[~2006-04-14  1:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-10  6:45 [PATCH] sched: move enough load to balance average load per task Peter Williams
2006-04-11  1:12 ` Siddha, Suresh B
2006-04-11  1:57   ` Peter Williams
2006-04-11  5:47     ` Siddha, Suresh B
2006-04-11 23:46   ` Peter Williams
2006-04-12  1:57     ` Siddha, Suresh B
2006-04-12  5:06       ` Peter Williams
2006-04-12 16:55         ` Siddha, Suresh B
2006-04-12 23:13           ` Peter Williams
     [not found]         ` <443D95DF.2090807@bigpond.net.au>
2006-04-14  0:31           ` smpnice: issues with finding busiest queue Siddha, Suresh B
2006-04-14  1:17             ` Peter Williams
