public inbox for linux-kernel@vger.kernel.org
* Re: [PATCH] sched: prevent high load weight tasks suppressing balancing
       [not found] <4427873A.4010601@bigpond.net.au>
@ 2006-03-27 21:52 ` Siddha, Suresh B
  2006-03-27 23:21   ` Peter Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Siddha, Suresh B @ 2006-03-27 21:52 UTC (permalink / raw)
  To: Peter Williams
  Cc: Andrew Morton, Siddha, Suresh B, Mike Galbraith, Nick Piggin,
	Ingo Molnar, Chen, Kenneth W, Con Kolivas, linux-kernel

This breaks HT and MC optimizations.. Consider a DP system with each
physical processor having two HT logical threads.. if there are two 
runnable processes running on package-0, with this patch scheduler 
will never move one of those processes to package-1..
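The objection can be sketched in a few lines of C. This is an illustrative model, not kernel code: the arrays, the load values, and pick_busiest_movable() are hypothetical stand-ins for the patched find_busiest_queue() logic. With one task on each HT sibling of package-0, every runqueue has nr_running == 1, so a "more than one task" filter leaves no candidate and nothing ever migrates to the idle package:

```c
/* Illustrative sketch only: CPUs 0,1 = HT siblings of package-0,
 * CPUs 2,3 = HT siblings of package-1.  Load values are made up. */
#include <assert.h>

static const unsigned int nr_running[4]      = { 1, 1, 0, 0 };
static const unsigned long weighted_load[4]  = { 128, 128, 0, 0 };

/* Candidate filter as added by the patch: only runqueues with more
 * than one runnable task may be chosen as the busiest queue. */
static int pick_busiest_movable(void)
{
	unsigned long max_load = 0;
	int i, busiest = -1;

	for (i = 0; i < 4; i++)
		if (weighted_load[i] > max_load && nr_running[i] > 1) {
			max_load = weighted_load[i];
			busiest = i;
		}
	return busiest;	/* -1 means no candidate: balancing gives up */
}
```

Both loaded CPUs fail the nr_running > 1 test, so the function returns -1 and the two tasks stay packed on package-0.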

thanks,
suresh

On Mon, Mar 27, 2006 at 05:33:30PM +1100, Peter Williams wrote:
> Problem:
> 
> On systems with more than 2 CPUs it is possible for a single task with a 
> high smpnice load weight to suppress load balancing on other CPUs (to 
> the one that it's running on) if it is the only runnable task on its 
> CPU. E.g. consider a 4-way system (a simple SMP system with no HT or 
> multi-core) where a high priority task (nice -20) is running on P0 
> and two normal priority tasks are running on P1. Load balancing with the 
> smp nice code will never be able to detect an imbalance and hence will 
> never move one of the normal priority tasks on P1 to the idle CPUs P2 or 
> P3, since P0 will always be identified as the busiest CPU even though it 
> has no tasks that can be moved.
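The arithmetic of that scenario can be sketched as follows. This is an editorial illustration, not kernel code: the weight constants and pick_busiest_by_load_only() are hypothetical, chosen only so that one nice -20 task outweighs two nice 0 tasks, as the smpnice weighting intends:

```c
/* Hypothetical sketch of the problem described above.  The weight
 * values are illustrative, not the actual smpnice weight tables. */
#include <assert.h>

#define NICE_0_WEIGHT	128	/* assumed weight of a nice 0 task    */
#define NICE_M20_WEIGHT	1024	/* assumed: nice -20 weighs far more  */

/* Per-CPU weighted load: P0 runs one nice -20 task, P1 runs two
 * nice 0 tasks, P2 and P3 are idle. */
static const unsigned long cpu_load[4] = {
	NICE_M20_WEIGHT,	/* P0: one high-weight task */
	2 * NICE_0_WEIGHT,	/* P1: two normal tasks     */
	0, 0			/* P2, P3: idle             */
};

static const unsigned int cpu_nr_running[4] = { 1, 2, 0, 0 };

/* Mimics the unpatched selection: pick the CPU with the highest
 * weighted load, ignoring whether it has a task to spare. */
static int pick_busiest_by_load_only(void)
{
	unsigned long max_load = 0;
	int i, busiest = -1;

	for (i = 0; i < 4; i++)
		if (cpu_load[i] > max_load) {
			max_load = cpu_load[i];
			busiest = i;
		}
	return busiest;
}
```

P0 is always selected as busiest, yet cpu_nr_running[0] is 1, so there is never a task that can actually be moved and P1 stays overloaded.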
> 
> Solution:
> 
> Make sure that only CPUs with tasks that can be moved get selected as 
> the busiest queue.  This involves ensuring that find_busiest_group() 
> only considers groups that have at least one CPU with more than one task 
> running as candidates for the busiest group and that 
> find_busiest_queue() only considers CPUs that have more than one task 
> running as candidates for the busiest run queue.
> 
> One effect of this is that, when there are no tasks that can be moved, 
> load balancing will be abandoned earlier in the sequence than would be 
> the case without this patch (i.e. before the double run queue locks are 
> taken prior to calling move_tasks(), rather than in move_tasks() itself).
> 
> Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
> 
> Peter
> PS This doesn't take into account tasks that can't be moved because they 
> are pinned to a particular CPU.  At this stage, I don't think that it's 
> worth the effort to make the changes that would enable this.
> -- 
> Peter Williams                                   pwil3058@bigpond.net.au
> 
> "Learning, n. The kind of ignorance distinguishing the studious."
>   -- Ambrose Bierce

> Index: MM-2.6.X/kernel/sched.c
> ===================================================================
> --- MM-2.6.X.orig/kernel/sched.c	2006-03-27 16:00:12.000000000 +1100
> +++ MM-2.6.X/kernel/sched.c	2006-03-27 17:02:53.000000000 +1100
> @@ -2115,6 +2115,7 @@ find_busiest_group(struct sched_domain *
>  		int local_group;
>  		int i;
>  		unsigned long sum_nr_running, sum_weighted_load;
> +		unsigned int nr_loaded_cpus = 0; /* where nr_running > 1 */
>  
>  		local_group = cpu_isset(this_cpu, group->cpumask);
>  
> @@ -2135,6 +2136,8 @@ find_busiest_group(struct sched_domain *
>  
>  			avg_load += load;
>  			sum_nr_running += rq->nr_running;
> +			if (rq->nr_running > 1)
> +				++nr_loaded_cpus;
>  			sum_weighted_load += rq->raw_weighted_load;
>  		}
>  
> @@ -2149,7 +2152,7 @@ find_busiest_group(struct sched_domain *
>  			this = group;
>  			this_nr_running = sum_nr_running;
>  			this_load_per_task = sum_weighted_load;
> -		} else if (avg_load > max_load) {
> +		} else if (avg_load > max_load && nr_loaded_cpus) {
>  			max_load = avg_load;
>  			busiest = group;
>  			busiest_nr_running = sum_nr_running;
> @@ -2158,7 +2161,7 @@ find_busiest_group(struct sched_domain *
>  		group = group->next;
>  	} while (group != sd->groups);
>  
> -	if (!busiest || this_load >= max_load || busiest_nr_running <= 1)
> +	if (!busiest || this_load >= max_load)
>  		goto out_balanced;
>  
>  	avg_load = (SCHED_LOAD_SCALE * total_load) / total_pwr;
> @@ -2260,16 +2263,16 @@ out_balanced:
>  static runqueue_t *find_busiest_queue(struct sched_group *group,
>  	enum idle_type idle)
>  {
> -	unsigned long load, max_load = 0;
> -	runqueue_t *busiest = NULL;
> +	unsigned long max_load = 0;
> +	runqueue_t *busiest = NULL, *rqi;
>  	int i;
>  
>  	for_each_cpu_mask(i, group->cpumask) {
> -		load = weighted_cpuload(i);
> +		rqi = cpu_rq(i);
>  
> -		if (load > max_load) {
> -			max_load = load;
> -			busiest = cpu_rq(i);
> +		if (rqi->raw_weighted_load > max_load && rqi->nr_running > 1) {
> +			max_load = rqi->raw_weighted_load;
> +			busiest = rqi;
>  		}
>  	}
>  


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] sched: prevent high load weight tasks suppressing balancing
  2006-03-27 21:52 ` [PATCH] sched: prevent high load weight tasks suppressing balancing Siddha, Suresh B
@ 2006-03-27 23:21   ` Peter Williams
  2006-03-28  3:15     ` Siddha, Suresh B
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Williams @ 2006-03-27 23:21 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Andrew Morton, Mike Galbraith, Nick Piggin, Ingo Molnar,
	Chen, Kenneth W, Con Kolivas, linux-kernel

Siddha, Suresh B wrote:
> This breaks HT and MC optimizations.. Consider a DP system with each
> physical processor having two HT logical threads.. if there are two 
> runnable processes running on package-0, with this patch scheduler 
> will never move one of those processes to package-1..

Is this an active_load_balance() issue?

If it is then I suggest that the solution is to fix the 
active_load_balance() and associated code so that it works with this 
patch in place.

It would be possible to modify find_busiest_group() and 
find_busiest_queue() so that they just PREFER the busiest group to have 
at least one CPU with more than one running task and the busiest queue 
to have more than one task.  However, this would make the code 
considerably more complex and I'm reluctant to impose that on all 
architectures just to satisfy HT and MC requirements.  Are there 
configuration macros or other means that I can use to exclude this 
(proposed) code on systems where it isn't needed, i.e. non-HT and non-MC 
systems, or HT and MC systems with only one package?

Personally, I think that the optimal performance of the load balancing 
code has already been considerably compromised by its unconditionally 
accommodating the requirements of active_load_balance() (which you have 
said is now only required by HT and MC systems) and that it might be 
better if active load balancing was separated out into a separate 
mechanism that could be excluded from the build on architectures that 
don't need it.  I can't help thinking that this would result in a more 
efficient active load balancing mechanism as well because the current 
one is very inefficient.

Peter
PS I don't think that this issue is sufficiently important to prevent 
the adoption of the smpnice patches while it's being resolved.
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce


* Re: [PATCH] sched: prevent high load weight tasks suppressing balancing
  2006-03-27 23:21   ` Peter Williams
@ 2006-03-28  3:15     ` Siddha, Suresh B
  2006-03-28  4:17       ` Peter Williams
  0 siblings, 1 reply; 4+ messages in thread
From: Siddha, Suresh B @ 2006-03-28  3:15 UTC (permalink / raw)
  To: Peter Williams
  Cc: Siddha, Suresh B, Andrew Morton, Mike Galbraith, Nick Piggin,
	Ingo Molnar, Chen, Kenneth W, Con Kolivas, linux-kernel

On Tue, Mar 28, 2006 at 10:21:38AM +1100, Peter Williams wrote:
> Siddha, Suresh B wrote:
> > This breaks HT and MC optimizations.. Consider a DP system with each
> > physical processor having two HT logical threads.. if there are two 
> > runnable processes running on package-0, with this patch scheduler 
> > will never move one of those processes to package-1..
> 
> Is this an active_load_balance() issue?

No. find_busiest_group() doesn't find an imbalance in this case..

> If it is then I suggest that the solution is to fix the 
> active_load_balance() and associated code so that it works with this 
> patch in place.
> 
> It would be possible to modify find_busiest_group() and 
> find_busiest_queue() so that they just PREFER the busiest group to have 
> at least one CPU with more than one running task and the busiest queue 
> to have more than one task.  However, this would make the code 

Please don't do that... It's not for the complexity that I say NO, but 
because we would be kind of patching the code instead of addressing the 
root issue..

> considerably more complex and I'm reluctant to impose that on all 
> architectures just to satisfy HT and MC requirements.  Are there 
> configuration macros or other means that I can use to exclude this 
> (proposed) code on systems where it isn't needed, i.e. non-HT and non-MC 
> systems, or HT and MC systems with only one package?

There is no config option to disable that portion of the code. The 
interaction of this code with the mainstream code is very small. Look at 
active_load_balance() and how this comes into play with the help of the 
migration thread (which gets activated through load_balance).

> Personally, I think that the optimal performance of the load balancing 
> code has already been considerably compromised by its unconditionally 
> accommodating the requirements of active_load_balance() (which you have 
> said is now only required by HT and MC systems) and that it might be 
> better if active load balancing was separated out into a separate 
> mechanism that could be excluded from the build on architectures that 
> don't need it.  I can't help thinking that this would result in a more 
> efficient active load balancing mechanism as well because the current 
> one is very inefficient.

No. Up to now, this has been encapsulated very generically using cpu_power, 
and that's the reason why adding a sched domain for multi-core was simple.

> Peter
> PS I don't think that this issue is sufficiently important to prevent 
> the adoption of the smpnice patches while it's being resolved.

The scheduler is a very critical piece of the kernel. We need to understand 
and fix all the cases..

thanks,
suresh


* Re: [PATCH] sched: prevent high load weight tasks suppressing balancing
  2006-03-28  3:15     ` Siddha, Suresh B
@ 2006-03-28  4:17       ` Peter Williams
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Williams @ 2006-03-28  4:17 UTC (permalink / raw)
  To: Siddha, Suresh B
  Cc: Andrew Morton, Mike Galbraith, Nick Piggin, Ingo Molnar,
	Chen, Kenneth W, Con Kolivas, linux-kernel

Siddha, Suresh B wrote:
> On Tue, Mar 28, 2006 at 10:21:38AM +1100, Peter Williams wrote:
>> Siddha, Suresh B wrote:
>>> This breaks HT and MC optimizations.. Consider a DP system with each
>>> physical processor having two HT logical threads.. if there are two 
>>> runnable processes running on package-0, with this patch scheduler 
>>> will never move one of those processes to package-1..
>> Is this an active_load_balance() issue?
> 
> No. find_busiest_group() doesn't find an imbalance in this case..

But active_load_balance() is the only code that would want to move the 
only runnable task off a CPU, isn't it?  No other load balancing code 
that I can see will try to do this.

> 
>> If it is then I suggest that the solution is to fix the 
>> active_load_balance() and associated code so that it works with this 
>> patch in place.
>>
>> It would be possible to modify find_busiest_group() and 
>> find_busiest_queue() so that they just PREFER the busiest group to have 
>> at least one CPU with more than one running task and the busiest queue 
>> to have more than one task.  However, this would make the code 
> 
> Please don't do that... It's not for the complexity that I say NO, but 
> because we would be kind of patching the code instead of addressing the 
> root issue..
> 
>> considerably more complex and I'm reluctant to impose that on all 
>> architectures just to satisfy HT and MC requirements.  Are there 
>> configuration macros or other means that I can use to exclude this 
>> (proposed) code on systems where it isn't needed, i.e. non-HT and non-MC 
>> systems, or HT and MC systems with only one package?
> 
> There is no config option to disable that portion of the code. The 
> interaction of this code with the mainstream code is very small. Look at 
> active_load_balance() and how this comes into play with the help of the 
> migration thread (which gets activated through load_balance).

Yes, I've read it, which is why I say (see below) that it's backwards 
and haphazard.

I'll make a temporary patch that does the PREFER I mentioned above to 
tide us over until a proper rewrite of the active load balancing 
functionality can be done.  After giving it some more thought I think I 
can keep the extra complexity fairly small.

> 
>> Personally, I think that the optimal performance of the load balancing 
>> code has already been considerably compromised by its unconditionally 
>> accommodating the requirements of active_load_balance() (which you have 
>> said is now only required by HT and MC systems) and that it might be 
>> better if active load balancing was separated out into a separate 
>> mechanism that could be excluded from the build on architectures that 
>> don't need it.  I can't help thinking that this would result in a more 
>> efficient active load balancing mechanism as well because the current 
>> one is very inefficient.
> 
> No. Up to now, this has been encapsulated very generically using cpu_power, 
> and that's the reason why adding a sched domain for multi-core was simple.

It seems to me that it's being done backwards and haphazardly.  As far 
as I can see, the problem being solved is this: there is a 
package with two or more CPUs that each have exactly one runnable task, 
there are other packages with all of their CPUs idle, and we 
want to move one task to each idle package, right?

If any of the CPUs in the package have more than one runnable task then 
normal load balancing will kick in which is why I say this special code 
is only required for the case where there's exactly one task for two or 
more of the CPUs in the package.

So why not write code that (every so many ticks) checks to see if a 
package meets these criteria and if it does then looks for idle packages 
(that's packages not groups or queues) and if it finds them initiates 
active load balancing?  Or some variation of that theme.

At the end of scheduler_tick() you could do (every so many ticks):

if rebalance_tick() didn't pull any tasks and this run queue has exactly
one runnable task then
        if the package that this run queue is in meets the criteria then
                set the run queue's active_balance flag and let the
                migration thread know that it has work to do.

Properly packaged this code could be excluded from the build on 
architectures that don't need it.
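The proposal above might look roughly like the following. This is a hedged sketch under stated assumptions, not actual kernel code: the two-CPU-per-package model, the helper names, and the flag arrays are all hypothetical placeholders for what real sched-domain code would provide:

```c
/* Hedged sketch of the proposed check; all names are hypothetical.
 * Model: CPUs 0,1 share package-0, CPUs 2,3 share package-1. */
#include <assert.h>
#include <stdbool.h>

static unsigned int nr_running[4] = { 1, 1, 0, 0 };
static bool active_balance[4];
static bool migration_thread_woken[4];

/* Criterion from the text: this CPU's package has two or more CPUs
 * each running exactly one task. */
static bool package_meets_criteria(int cpu)
{
	int first = (cpu / 2) * 2;	/* first sibling in the package */

	return nr_running[first] == 1 && nr_running[first + 1] == 1;
}

static bool idle_package_exists(void)
{
	int p;

	for (p = 0; p < 4; p += 2)
		if (nr_running[p] == 0 && nr_running[p + 1] == 0)
			return true;
	return false;
}

/* Would run every so many ticks at the end of scheduler_tick(), after
 * rebalance_tick(); "pulled" is how many tasks rebalance_tick() moved. */
static void maybe_start_active_balance(int cpu, int pulled)
{
	if (pulled == 0 && nr_running[cpu] == 1 &&
	    package_meets_criteria(cpu) && idle_package_exists()) {
		active_balance[cpu] = true;
		migration_thread_woken[cpu] = true;
	}
}
```

With the model state above (both package-0 siblings at exactly one task, package-1 idle), calling maybe_start_active_balance(0, 0) sets the active_balance flag and wakes the migration thread for CPU 0 only.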

> 
>> Peter
>> PS I don't think that this issue is sufficiently important to prevent 
>> the adoption of the smpnice patches while it's being resolved.
> 
> The scheduler is a very critical piece of the kernel. We need to understand 
> and fix all the cases..

Yes, but this particular problem is very minor, especially when 
compared to the general breakage of "nice" on SMP systems without the 
smpnice patch.

Peter
-- 
Peter Williams                                   pwil3058@bigpond.net.au

"Learning, n. The kind of ignorance distinguishing the studious."
  -- Ambrose Bierce

