[PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential

* [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
@ 2010-09-28  0:29 Nikhil Rao
  2010-09-28  0:29 ` [PATCH 1/3] sched: set group_imb only a task can be pulled from the busiest cpu Nikhil Rao
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Nikhil Rao @ 2010-09-28  0:29 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Mike Galbraith
  Cc: Venkatesh Pallipadi, linux-kernel, Nikhil Rao

Hi all,

I have attached a series of patches that improve load balancing when there is a
large weight differential between tasks. These patches are based off the
feedback Peter Zijlstra gave in an earlier post (see http://thread.gmane.org/gmane.linux.kernel/1015966).
They can be applied to v2.6.36-rc5 or -tip without conflicts.

Tested with the following setup.
- Test machine is a 16 cpu box (quad-socket, quad-core).
- Baseline is v2.6.36-rc5 kernel

We spawn 16 SCHED_IDLE soaker threads and one SCHED_NORMAL task. On the
baseline kernel, the machine has ~18% idle time. With these patches applied on
top of baseline, idle time drops to 0%.

v2.6.36-rc5

04:58:46 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
04:58:47 PM  all   81.47    0.00    0.25    0.00    0.00    0.00    0.00   18.28  13796.00
04:58:48 PM  all   81.20    0.00    0.25    0.00    0.00    0.00    0.00   18.55  13816.00
04:58:49 PM  all   80.93    0.19    0.25    0.00    0.00    0.06    0.00   18.57  13965.00
04:58:50 PM  all   81.40    0.00    0.25    0.00    0.00    0.00    0.00   18.35  13837.37
04:58:51 PM  all   81.19    0.00    0.31    0.00    0.00    0.00    0.00   18.50  13592.08
04:58:52 PM  all   81.25    0.00    0.25    0.00    0.00    0.00    0.00   18.50  13721.00
04:58:53 PM  all   81.19    0.00    0.25    0.00    0.00    0.00    0.00   18.56  13764.00
04:58:54 PM  all   81.25    0.00    0.25    0.00    0.00    0.00    0.00   18.50  13841.41
04:58:55 PM  all   80.30    0.00    1.19    0.00    0.00    0.00    0.00   18.51  14989.11
04:58:56 PM  all   80.77    0.00    0.50    0.00    0.00    0.00    0.00   18.73  13964.65
Average:     all   81.09    0.02    0.37    0.00    0.00    0.01    0.00   18.51  13929.53

v2.6.36-rc5 + patches

05:00:06 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
05:00:07 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00  16364.00
05:00:08 PM  all   99.81    0.06    0.12    0.00    0.00    0.00    0.00    0.00  16348.00
05:00:09 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00  16330.00
05:00:10 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00  16317.00
05:00:11 PM  all   99.88    0.06    0.06    0.00    0.00    0.00    0.00    0.00  16327.00
05:00:12 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00  16323.00
05:00:13 PM  all   99.88    0.00    0.12    0.00    0.00    0.00    0.00    0.00  16323.00
05:00:14 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00  16321.00
05:00:15 PM  all   99.63    0.06    0.25    0.00    0.00    0.06    0.00    0.00  16354.00
05:00:16 PM  all   99.62    0.00    0.38    0.00    0.00    0.00    0.00    0.00  19059.60
Average:     all   99.85    0.02    0.13    0.00    0.00    0.01    0.00    0.00  16604.20

Comments, feedback welcome.

-Thanks,
Nikhil

Nikhil Rao (3):
  sched: set group_imb only a task can be pulled from the busiest cpu
  sched: drop group_capacity to 1 only if remote group has no running
    tasks
  sched: do not consider SCHED_IDLE tasks to be cache hot

 kernel/sched.c      |    3 +++
 kernel/sched_fair.c |   12 ++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] sched: set group_imb only a task can be pulled from the busiest cpu
  2010-09-28  0:29 [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Nikhil Rao
@ 2010-09-28  0:29 ` Nikhil Rao
  2010-09-28  0:29 ` [PATCH 2/3] sched: drop group_capacity to 1 only if remote group has no running tasks Nikhil Rao
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Nikhil Rao @ 2010-09-28  0:29 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Mike Galbraith
  Cc: Venkatesh Pallipadi, linux-kernel, Nikhil Rao

When cycling through sched groups to determine the busiest group, set
group_imb only if the busiest cpu has more than 1 runnable task. This patch
fixes the case where two cpus in a group have one runnable task each, but there
is a large weight differential between these two tasks. The load balancer is
unable to migrate any task from this group, and hence should not consider this
group to be imbalanced.

Signed-off-by: Nikhil Rao <ncrao@google.com>
---
 kernel/sched_fair.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a171138..de8a6a0 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2378,7 +2378,7 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 			int local_group, const struct cpumask *cpus,
 			int *balance, struct sg_lb_stats *sgs)
 {
-	unsigned long load, max_cpu_load, min_cpu_load;
+	unsigned long load, max_cpu_load, min_cpu_load, max_nr_running;
 	int i;
 	unsigned int balance_cpu = -1, first_idle_cpu = 0;
 	unsigned long avg_load_per_task = 0;
@@ -2389,6 +2389,7 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	/* Tally up the load of all CPUs in the group */
 	max_cpu_load = 0;
 	min_cpu_load = ~0UL;
+	max_nr_running = 0;
 
 	for_each_cpu_and(i, sched_group_cpus(group), cpus) {
 		struct rq *rq = cpu_rq(i);
@@ -2406,8 +2407,10 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 			load = target_load(i, load_idx);
 		} else {
 			load = source_load(i, load_idx);
-			if (load > max_cpu_load)
+			if (load > max_cpu_load) {
 				max_cpu_load = load;
+				max_nr_running = rq->nr_running;
+			}
 			if (min_cpu_load > load)
 				min_cpu_load = load;
 		}
@@ -2447,7 +2450,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 	if (sgs->sum_nr_running)
 		avg_load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running;
 
-	if ((max_cpu_load - min_cpu_load) > 2*avg_load_per_task)
+	if ((max_cpu_load - min_cpu_load) > 2*avg_load_per_task &&
+			max_nr_running > 1)
 		sgs->group_imb = 1;
 
 	sgs->group_capacity =
-- 
1.7.1

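A minimal user-space sketch of the group_imb test before and after this patch. The struct, the name toy_group_imb() and the task weights (1024 for a nice 0 task, 3 for a SCHED_IDLE task) are illustrative assumptions, not kernel code; the per-CPU load stands in for source_load(), and sum_weighted_load is taken to be the sum of those loads.

```c
#include <assert.h>

/* Hypothetical per-CPU snapshot; not the kernel's struct rq. */
struct toy_cpu {
	unsigned long load;		/* weighted load on this CPU */
	unsigned long nr_running;	/* runnable tasks on this CPU */
};

/*
 * Returns 1 if the group would be flagged as imbalanced. When
 * require_pullable is set, additionally require that the most loaded
 * CPU has more than one runnable task (the condition added by patch 1).
 */
static int toy_group_imb(const struct toy_cpu *cpus, int ncpus,
			 int require_pullable)
{
	unsigned long max_load = 0, min_load = ~0UL, max_nr_running = 0;
	unsigned long sum_load = 0, sum_nr_running = 0, avg_load_per_task = 0;
	int i;

	assert(ncpus > 0);
	for (i = 0; i < ncpus; i++) {
		/* Remember nr_running of the most loaded CPU, as the patch does. */
		if (cpus[i].load > max_load) {
			max_load = cpus[i].load;
			max_nr_running = cpus[i].nr_running;
		}
		if (cpus[i].load < min_load)
			min_load = cpus[i].load;
		sum_load += cpus[i].load;
		sum_nr_running += cpus[i].nr_running;
	}
	if (sum_nr_running)
		avg_load_per_task = sum_load / sum_nr_running;

	if (max_load - min_load <= 2 * avg_load_per_task)
		return 0;
	return !require_pullable || max_nr_running > 1;
}
```

With one nice 0 task and three SCHED_IDLE tasks spread one per CPU across a four-CPU group, the load spread (1021) exceeds twice the average load per task (2 * 258), so the unpatched test flags the group; the extra nr_running check clears the flag because the busiest CPU has nothing that could be pulled.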

* [PATCH 2/3] sched: drop group_capacity to 1 only if remote group has no running tasks
  2010-09-28  0:29 [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Nikhil Rao
  2010-09-28  0:29 ` [PATCH 1/3] sched: set group_imb only a task can be pulled from the busiest cpu Nikhil Rao
@ 2010-09-28  0:29 ` Nikhil Rao
  2010-09-28 23:04   ` Suresh Siddha
  2010-09-28  0:29 ` [PATCH 3/3] sched: do not consider SCHED_IDLE tasks to be cache hot Nikhil Rao
  2010-09-28 13:57 ` [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Mike Galbraith
  3 siblings, 1 reply; 15+ messages in thread
From: Nikhil Rao @ 2010-09-28  0:29 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Mike Galbraith
  Cc: Venkatesh Pallipadi, linux-kernel, Nikhil Rao

When SD_PREFER_SIBLING is set on a sched domain, drop group_capacity to 1
only if the remote sched group has no running tasks. This addresses the case
where you have two tasks on one socket and the other socket is idle, in which
case you drop the capacity to 1. If the remote group has >=1 running task, then
there is no difference from a cache-sharing perspective.

Signed-off-by: Nikhil Rao <ncrao@google.com>
---
 kernel/sched_fair.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index de8a6a0..33a7985 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2548,7 +2548,7 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
 		 * first, lower the sg capacity to one so that we'll try
 		 * and move all the excess tasks away.
 		 */
-		if (prefer_sibling)
+		if (prefer_sibling && !sgs.sum_nr_running)
 			sgs.group_capacity = min(sgs.group_capacity, 1UL);
 
 		if (local_group) {
-- 
1.7.1


* Re: [PATCH 2/3] sched: drop group_capacity to 1 only if remote group has no running tasks
  2010-09-28  0:29 ` [PATCH 2/3] sched: drop group_capacity to 1 only if remote group has no running tasks Nikhil Rao
@ 2010-09-28 23:04   ` Suresh Siddha
  2010-10-11 21:20     ` Nikhil Rao
  0 siblings, 1 reply; 15+ messages in thread
From: Suresh Siddha @ 2010-09-28 23:04 UTC (permalink / raw)
  To: Nikhil Rao
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Venkatesh Pallipadi,
	linux-kernel@vger.kernel.org

On Mon, 2010-09-27 at 17:29 -0700, Nikhil Rao wrote:
> When SD_PREFER_SIBLING is set on a sched domain, drop group_capacity to 1
> only if the remote sched group has no running tasks. This addresses the case
> where you have two tasks on one socket and the other socket is idle, in which
> case you drop the capacity to 1. If the remote group has >=1 running task, then
> there is no difference from a cache-sharing perspective.
> 
> Signed-off-by: Nikhil Rao <ncrao@google.com>
> ---
>  kernel/sched_fair.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index de8a6a0..33a7985 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2548,7 +2548,7 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
>  		 * first, lower the sg capacity to one so that we'll try
>  		 * and move all the excess tasks away.
>  		 */
> -		if (prefer_sibling)
> +		if (prefer_sibling && !sgs.sum_nr_running)
>  			sgs.group_capacity = min(sgs.group_capacity, 1UL);
>  
>  		if (local_group) {

Nikhil, doesn't this break the following case:

two sockets with dual-core and HT. Four tasks are currently scheduled as:
three on socket-0 (two threads on core-0 running two tasks and 1 thread
on core-1 running one task), and one on socket-1 (one thread on core-0
running a task, with the other core, core-1, idle).

We would like to move a task from core-0 of socket-0 to core-1 of socket-1
while we are load balancing at the socket level (which might be the SMP or
NUMA level, depending on the system).

thanks,
suresh


* Re: [PATCH 2/3] sched: drop group_capacity to 1 only if remote group has no running tasks
  2010-09-28 23:04   ` Suresh Siddha
@ 2010-10-11 21:20     ` Nikhil Rao
  0 siblings, 0 replies; 15+ messages in thread
From: Nikhil Rao @ 2010-10-11 21:20 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Ingo Molnar, Peter Zijlstra, Mike Galbraith, Venkatesh Pallipadi,
	linux-kernel@vger.kernel.org

Hi Suresh,

Sorry for the delayed reply.

On Tue, Sep 28, 2010 at 4:04 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
> On Mon, 2010-09-27 at 17:29 -0700, Nikhil Rao wrote:
>> When SD_PREFER_SIBLING is set on a sched domain, drop group_capacity to 1
>> only if the remote sched group has no running tasks. This addresses the case
>> where you have two tasks on one socket and the other socket is idle, in which
>> case you drop the capacity to 1. If the remote group has >=1 running task, then
>> there is no difference from a cache-sharing perspective.
>>
>> Signed-off-by: Nikhil Rao <ncrao@google.com>
>> ---
>>  kernel/sched_fair.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>> index de8a6a0..33a7985 100644
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -2548,7 +2548,7 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
>>                * first, lower the sg capacity to one so that we'll try
>>                * and move all the excess tasks away.
>>                */
>> -             if (prefer_sibling)
>> +             if (prefer_sibling && !sgs.sum_nr_running)
>>                       sgs.group_capacity = min(sgs.group_capacity, 1UL);
>>
>>               if (local_group) {
>
> Nikhil, Doesn't this break the case of:
>
> two sockets with dual-core and HT. Four tasks currently scheduled as:
> three on socket-0 (two threads on core-0 running two tasks and 1 thread
> on core-1 running one task). One on socket-1 (one thread on core-0
> running a task, with other core-1 idle)
>
> We would like to move the task from core-0 socket-0 to core-1 socket-1,
> while we are load balancing at the socket level (it might be smp or numa
> level depending on system).
>
> thanks,
> suresh
>

Thanks for raising this issue. Yes, when you have a quad-core,
dual-socket machine, the additional check will prevent group_capacity
from dropping down to 1. In this situation, we want to decrease
group_capacity if the local group has extra capacity (i.e.
this_nr_running < this_group_weight) [credit goes to Venki for this
insight]. This also works when you have a niced task, which is what
this patch was trying to fix. I have attached a modified version of
the patch below. Does this look OK?

-Thanks,
Nikhil

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index de8a6a0..e0f697a 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2030,6 +2030,7 @@ struct sd_lb_stats {
        unsigned long this_load;
        unsigned long this_load_per_task;
        unsigned long this_nr_running;
+       unsigned long this_group_capacity;

        /* Statistics of the busiest group */
        unsigned long max_load;
@@ -2548,7 +2549,8 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
                 * first, lower the sg capacity to one so that we'll try
                 * and move all the excess tasks away.
                 */
-               if (prefer_sibling)
+               if (prefer_sibling && !local_group &&
+                   sds->this_nr_running < sds->this_group_capacity)
                        sgs.group_capacity = min(sgs.group_capacity, 1UL);

                if (local_group) {
@@ -2556,6 +2558,7 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
                        sds->this = sg;
                        sds->this_nr_running = sgs.sum_nr_running;
                        sds->this_load_per_task = sgs.sum_weighted_load;
+                       sds->this_group_capacity = sgs.group_capacity;
                } else if (update_sd_pick_busiest(sd, sds, sg, &sgs, this_cpu)) {
                        sds->max_load = sgs.avg_load;
                        sds->busiest = sg;

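To compare the three versions of the SD_PREFER_SIBLING condition discussed above, here is a hedged sketch that evaluates each one on Suresh's layout, seen from socket-1 while balancing at the socket level. The capacity of 2 per socket and all toy_* names are assumptions made for illustration. The sketch passes the local group's numbers in explicitly, whereas in the diff above they come from sds, which is filled in when the loop reaches the local group.

```c
#include <assert.h>

/* Hypothetical per-group snapshot, loosely modelled on struct sg_lb_stats. */
struct toy_group {
	unsigned long sum_nr_running;
	unsigned long group_capacity;
};

enum toy_variant { TOY_ORIGINAL, TOY_PATCH_V1, TOY_PATCH_V2 };

/*
 * Capacity a group ends up with after the SD_PREFER_SIBLING adjustment.
 * sg is the group being evaluated; local holds the local group's stats.
 */
static unsigned long toy_adjusted_capacity(enum toy_variant v,
					   int prefer_sibling, int local_group,
					   const struct toy_group *sg,
					   const struct toy_group *local)
{
	int drop = 0;

	switch (v) {
	case TOY_ORIGINAL:	/* if (prefer_sibling) */
		drop = prefer_sibling;
		break;
	case TOY_PATCH_V1:	/* ... && !sgs.sum_nr_running */
		drop = prefer_sibling && !sg->sum_nr_running;
		break;
	case TOY_PATCH_V2:	/* ... && !local_group && this_nr_running < this_group_capacity */
		drop = prefer_sibling && !local_group &&
		       local->sum_nr_running < local->group_capacity;
		break;
	default:
		assert(!"unknown variant");
	}

	/* Equivalent to min(group_capacity, 1UL) when drop is set. */
	if (drop && sg->group_capacity > 1)
		return 1;
	return sg->group_capacity;
}
```

For the remote socket-0 (3 running, capacity 2) with a local socket-1 running one task, the original code and the revised patch both cut the remote capacity to 1, while the first version of the patch leaves it at 2. Once the local group is full (2 running, capacity 2), the revised condition leaves the capacity alone.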

* [PATCH 3/3] sched: do not consider SCHED_IDLE tasks to be cache hot
  2010-09-28  0:29 [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Nikhil Rao
  2010-09-28  0:29 ` [PATCH 1/3] sched: set group_imb only a task can be pulled from the busiest cpu Nikhil Rao
  2010-09-28  0:29 ` [PATCH 2/3] sched: drop group_capacity to 1 only if remote group has no running tasks Nikhil Rao
@ 2010-09-28  0:29 ` Nikhil Rao
  2010-09-28 13:57 ` [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Mike Galbraith
  3 siblings, 0 replies; 15+ messages in thread
From: Nikhil Rao @ 2010-09-28  0:29 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Mike Galbraith
  Cc: Venkatesh Pallipadi, linux-kernel, Nikhil Rao

This patch adds a check in task_hot() so that it returns 0 (not cache hot) if
the task has the SCHED_IDLE policy. SCHED_IDLE tasks have very low weight, and
when run alongside regular-weight tasks they are typically scheduled many
milliseconds apart, so there is no benefit in considering them cache hot for
load balancing.

Signed-off-by: Nikhil Rao <ncrao@google.com>
---
 kernel/sched.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index ed09d4f..874efde 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2003,6 +2003,9 @@ task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
 	if (p->sched_class != &fair_sched_class)
 		return 0;
 
+	if (p->policy == SCHED_IDLE)
+		return 0;
+
 	/*
 	 * Buddy candidates are cache hot:
 	 */
-- 
1.7.1

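A simplified model of where the new check sits in task_hot(). Only the time-based test is kept (the buddy and sysctl special cases are omitted), and the 0.5 ms threshold and the toy_* names are illustrative assumptions rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's policy constants. */
enum toy_policy { TOY_SCHED_NORMAL, TOY_SCHED_BATCH, TOY_SCHED_IDLE };

struct toy_task {
	enum toy_policy policy;
	uint64_t exec_start_ns;	/* when the task last started running */
};

/* Assumed migration-cost threshold in nanoseconds (illustrative only). */
static const int64_t toy_migration_cost_ns = 500000;

static int toy_task_hot(const struct toy_task *p, uint64_t now_ns)
{
	int64_t delta;

	assert(p != 0);

	/* Patch 3: never treat SCHED_IDLE tasks as cache hot. */
	if (p->policy == TOY_SCHED_IDLE)
		return 0;

	/* Otherwise "hot" means it ran more recently than the threshold. */
	delta = (int64_t)(now_ns - p->exec_start_ns);
	return delta < toy_migration_cost_ns;
}
```

A SCHED_NORMAL task that started running 0.1 ms ago is reported hot, while a SCHED_IDLE task with the same timestamp is not, so the load balancer would be free to migrate it.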

* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-09-28  0:29 [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Nikhil Rao
                   ` (2 preceding siblings ...)
  2010-09-28  0:29 ` [PATCH 3/3] sched: do not consider SCHED_IDLE tasks to be cache hot Nikhil Rao
@ 2010-09-28 13:57 ` Mike Galbraith
  2010-09-28 21:15   ` Nikhil Rao
  3 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2010-09-28 13:57 UTC (permalink / raw)
  To: Nikhil Rao; +Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Mon, 2010-09-27 at 17:29 -0700, Nikhil Rao wrote:
> Hi all,
> 
> I have attached a series of patches that improve load balancing when there is a
> large weight differential between tasks. These patches are based off the
> feedback Peter Zijlstra gave in an earlier post (see http://thread.gmane.org/gmane.linux.kernel/1015966).
> They can be applied to v2.6.36-rc5 or -tip without conflicts.
> 
> Tested with the following setup.
> - Test machine is a 16 cpu box (quad-socket, quad-core).
> - Baseline is v2.6.36-rc5 kernel
> 
> We spawn 16 SCHED_IDLE soaker threads and one SCHED_NORMAL task. On the
> baseline kernel, the machine has ~18% idle time. With these patches applied on
> top of baseline, idle time drops to 0%.

Hm. I can get it stuck with one core idle on my little quad.

top - 15:53:22 up 11 min, 17 users,  load average: 5.05, 4.40, 2.51
Tasks: 270 total,   7 running, 263 sleeping,   0 stopped,   0 zombie
Cpu(s): 75.3%us,  0.0%sy,  0.0%ni, 24.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7455 root       5 -15  7996  340  256 R  100  0.0   0:59.93 1 pert
 7421 root      20   0  7996  340  256 R   50  0.0   4:20.01 3 pert
 7422 root      20   0  7996  340  256 R   50  0.0   3:45.81 2 pert
 7423 root      20   0  7996  340  256 R   50  0.0   4:09.45 2 pert
 7424 root      20   0  7996  344  256 R   50  0.0   4:12.75 3 pert

* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-09-28 13:57 ` [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Mike Galbraith
@ 2010-09-28 21:15   ` Nikhil Rao
  2010-09-29  1:45     ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Nikhil Rao @ 2010-09-28 21:15 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Tue, Sep 28, 2010 at 6:57 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Mon, 2010-09-27 at 17:29 -0700, Nikhil Rao wrote:
>> Hi all,
>>
>> I have attached a series of patches that improve load balancing when there is a
>> large weight differential between tasks. These patches are based off the
>> feedback Peter Zijlstra gave in an earlier post (see http://thread.gmane.org/gmane.linux.kernel/1015966).
>> They can be applied to v2.6.36-rc5 or -tip without conflicts.
>>
>> Tested with the following setup.
>> - Test machine is a 16 cpu box (quad-socket, quad-core).
>> - Baseline is v2.6.36-rc5 kernel
>>
>> We spawn 16 SCHED_IDLE soaker threads and one SCHED_NORMAL task. On the
>> baseline kernel, the machine has ~18% idle time. With these patches applied on
>> top of baseline, idle time drops to 0%.
>
> Hm. I can get it stuck with one core idle on my little quad.
>
> top - 15:53:22 up 11 min, 17 users,  load average: 5.05, 4.40, 2.51
> Tasks: 270 total,   7 running, 263 sleeping,   0 stopped,   0 zombie
> Cpu(s): 75.3%us,  0.0%sy,  0.0%ni, 24.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
>  7455 root       5 -15  7996  340  256 R  100  0.0   0:59.93 1 pert
>  7421 root      20   0  7996  340  256 R   50  0.0   4:20.01 3 pert
>  7422 root      20   0  7996  340  256 R   50  0.0   3:45.81 2 pert
>  7423 root      20   0  7996  340  256 R   50  0.0   4:09.45 2 pert
>  7424 root      20   0  7996  344  256 R   50  0.0   4:12.75 3 pert
>
>

Mike,

Thanks for running this. I've not been able to reproduce what you are
seeing on the few test machines that I have (different combinations of
MC, CPU and NODE domains). Can you please give me more info about
your setup?

-Thanks,
Nikhil

* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-09-28 21:15   ` Nikhil Rao
@ 2010-09-29  1:45     ` Mike Galbraith
  2010-09-29 19:32       ` Nikhil Rao
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2010-09-29  1:45 UTC (permalink / raw)
  To: Nikhil Rao; +Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Tue, 2010-09-28 at 14:15 -0700, Nikhil Rao wrote:

> Thanks for running this. I've not been able to reproduce what you are
> seeing on the few test machines that I have (different combinations of
> MC, CPU and NODE domains). Can you please give me more info about
> your setup?

It's a plain-jane Q6600 box, so has only MC and CPU domains.

It doesn't necessarily _instantly_ "stick", can take a couple tries, or
a little time.

	-Mike


* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-09-29  1:45     ` Mike Galbraith
@ 2010-09-29 19:32       ` Nikhil Rao
  2010-10-04  3:08         ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Nikhil Rao @ 2010-09-29 19:32 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Tue, Sep 28, 2010 at 6:45 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2010-09-28 at 14:15 -0700, Nikhil Rao wrote:
>
>> Thanks for running this. I've not been able to reproduce what you are
>> seeing on the few test machines that I have (different combinations of
>> MC, CPU and NODE domains). Can you please give me more info about
>> your setup?
>
> It's a plain-jane Q6600 box, so has only MC and CPU domains.
>
> It doesn't necessarily _instantly_ "stick", can take a couple tries, or
> a little time.

The closest I have is a quad-core dual-socket machine (MC, CPU
domains). And I'm having trouble reproducing it on that machine as
well :-( I ran 5 soaker threads (one of them niced to -15) for a few
hours and didn't see the problem. Can you please give me some trace
data & schedstats to work with?

Looking at the patch/code, I suspect active migration on the CPU
scheduling domain pushes the nice 0 task (running on the same socket
as the nice -15 task) to the other socket. This leaves you with an
idle core on the nice -15 socket, and with soaker threads there is no
way to come back to a 100% utilized state. One possible explanation is
the group capacity for a sched group in the CPU sched domain is
rounded to 1 (instead of 2). I have a patch below that throws a hammer
at the problem and uses group weight instead of group capacity (this
is experimental, will refine it if it works). Can you please see if
that solves the problem?

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 6d934e8..3fdd669 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2057,6 +2057,7 @@ struct sg_lb_stats {
        unsigned long sum_nr_running; /* Nr tasks running in the group */
        unsigned long sum_weighted_load; /* Weighted load of group's tasks */
        unsigned long group_capacity;
+       unsigned long group_weight;
        int group_imb; /* Is there an imbalance in the group ? */
 };

@@ -2458,6 +2459,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
                DIV_ROUND_CLOSEST(group->cpu_power, SCHED_LOAD_SCALE);
        if (!sgs->group_capacity)
                sgs->group_capacity = fix_small_capacity(sd, group);
+
+       sgs->group_weight = cpumask_weight(sched_group_cpus(group));
 }

 /**
@@ -2480,6 +2483,9 @@ static bool update_sd_pick_busiest(struct sched_domain *sd,
        if (sgs->avg_load <= sds->max_load)
                return false;

+       if (sgs->sum_nr_running <= sgs->group_weight)
+               return false;
+
        if (sgs->sum_nr_running > sgs->group_capacity)
                return true;

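A sketch of the rounding behind the hypothesis above and of what the experimental group_weight test changes. The helper mirrors the rounding of DIV_ROUND_CLOSEST for unsigned values; the cpu_power of 1400 for a two-CPU group is a made-up value chosen to round down to a capacity of 1, fix_small_capacity() is not modelled, and the other early checks in update_sd_pick_busiest() are left out. All toy_* names are hypothetical.

```c
#include <assert.h>

#define TOY_SCHED_LOAD_SCALE 1024UL

/* Round-to-nearest division, as DIV_ROUND_CLOSEST does for unsigned values. */
static unsigned long toy_div_round_closest(unsigned long x, unsigned long d)
{
	assert(d > 0);
	return (x + d / 2) / d;
}

struct toy_sg {
	unsigned long cpu_power;	/* summed cpu_power of the group's CPUs */
	unsigned long nr_cpus;		/* cpumask_weight() of the group */
	unsigned long sum_nr_running;
};

/*
 * Simplified tail of update_sd_pick_busiest(): only the capacity test,
 * optionally preceded by the experimental group_weight test.
 */
static int toy_over_capacity(const struct toy_sg *sg, int use_weight_check)
{
	unsigned long capacity =
		toy_div_round_closest(sg->cpu_power, TOY_SCHED_LOAD_SCALE);

	/* No more tasks than CPUs: do not treat the group as busiest. */
	if (use_weight_check && sg->sum_nr_running <= sg->nr_cpus)
		return 0;
	return sg->sum_nr_running > capacity;
}
```

With two tasks on a two-CPU group whose capacity rounds to 1, the capacity test alone reports the group as over capacity, while the group_weight test returns first because there is no more than one task per CPU.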

* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-09-29 19:32       ` Nikhil Rao
@ 2010-10-04  3:08         ` Mike Galbraith
  2010-10-06  8:23           ` Nikhil Rao
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2010-10-04  3:08 UTC (permalink / raw)
  To: Nikhil Rao; +Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

Sorry for the late reply.  (fired up your patchlet bright and early so
it didn't rot in my inbox any longer;)

On Wed, 2010-09-29 at 12:32 -0700, Nikhil Rao wrote:
> On Tue, Sep 28, 2010 at 6:45 PM, Mike Galbraith <efault@gmx.de> wrote:
> > On Tue, 2010-09-28 at 14:15 -0700, Nikhil Rao wrote:
> >
> >> Thanks for running this. I've not been able to reproduce what you are
> >> seeing on the few test machines that I have (different combinations of
> >> MC, CPU and NODE domains). Can you please give me more info about
> >> your setup?
> >
> > It's a plain-jane Q6600 box, so has only MC and CPU domains.
> >
> > It doesn't necessarily _instantly_ "stick", can take a couple tries, or
> > a little time.
> 
> The closest I have is a quad-core dual-socket machine (MC, CPU
> domains). And I'm having trouble reproducing it on that machine as
> well :-( I ran 5 soaker threads (one of them niced to -15) for a few
> hours and didn't see the problem. Can you please give me some trace
> data & schedstats to work with?

Booting with isolcpus or offlining the excess should help.

> Looking at the patch/code, I suspect active migration on the CPU
> scheduling domain pushes the nice 0 task (running on the same socket
> as the nice -15 task) to the other socket. This leaves you with an
> idle core on the nice -15 socket, and with soaker threads there is no
> way to come back to a 100% utilized state. One possible explanation is
> the group capacity for a sched group in the CPU sched domain is
> rounded to 1 (instead of 2). I have a patch below that throws a hammer
> at the problem and uses group weight instead of group capacity (this
> is experimental, will refine it if it works). Can you please see if
> that solves the problem?

Nope, didn't help.  I'll poke at it, but am squabbling elsewhere atm.

	-Mike


* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-10-04  3:08         ` Mike Galbraith
@ 2010-10-06  8:23           ` Nikhil Rao
  2010-10-08  7:22             ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Nikhil Rao @ 2010-10-06  8:23 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Sun, Oct 3, 2010 at 8:08 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2010-09-29 at 12:32 -0700, Nikhil Rao wrote:
>> The closest I have is a quad-core dual-socket machine (MC, CPU
>> domains). And I'm having trouble reproducing it on that machine as
>> well :-( I ran 5 soaker threads (one of them niced to -15) for a few
>> hours and didn't see the problem. Can you please give me some trace
>> data & schedstats to work with?
>
> Booting with isolcpus or offlining the excess should help.
>

Sorry for the late reply. Booting with isolcpus did the trick, thanks.

... and now to dig into why this is happening.

-Thanks,
Nikhil

>> Looking at the patch/code, I suspect active migration on the CPU
>> scheduling domain pushes the nice 0 task (running on the same socket
>> as the nice -15 task) to the other socket. This leaves you with an
>> idle core on the nice -15 socket, and with soaker threads there is no
>> way to come back to a 100% utilized state. One possible explanation is
>> the group capacity for a sched group in the CPU sched domain is
>> rounded to 1 (instead of 2). I have a patch below that throws a hammer
>> at the problem and uses group weight instead of group capacity (this
>> is experimental, will refine it if it works). Can you please see if
>> that solves the problem?
>
> Nope, didn't help.  I'll poke at it, but am squabbling elsewhere atm.
>
>        -Mike
>
>

* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-10-06  8:23           ` Nikhil Rao
@ 2010-10-08  7:22             ` Mike Galbraith
  2010-10-08 20:34               ` Nikhil Rao
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Galbraith @ 2010-10-08  7:22 UTC (permalink / raw)
  To: Nikhil Rao; +Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Wed, 2010-10-06 at 01:23 -0700, Nikhil Rao wrote:
> On Sun, Oct 3, 2010 at 8:08 PM, Mike Galbraith <efault@gmx.de> wrote:
> > On Wed, 2010-09-29 at 12:32 -0700, Nikhil Rao wrote:
> >> The closest I have is a quad-core dual-socket machine (MC, CPU
> >> domains). And I'm having trouble reproducing it on that machine as
> >> well :-( I ran 5 soaker threads (one of them niced to -15) for a few
> >> hours and didn't see the problem. Can you please give me some trace
> >> data & schedstats to work with?
> >
> > Booting with isolcpus or offlining the excess should help.
> >
> 
> Sorry for the late reply. Booting with isolcpus did the trick, thanks.
> 
> ... and now to dig into why this is happening.

I was poking it (again) yesterday, and it's kind of annoying.  I can't
call this behavior black/white broken.  It's freeing up a cache for a
very high priority task, which is kinda nice, but SMP nice is costing
25% of my box's processor power in this case too.  Hrmph.

	-Mike


* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-10-08  7:22             ` Mike Galbraith
@ 2010-10-08 20:34               ` Nikhil Rao
  2010-10-10 10:15                 ` Mike Galbraith
  0 siblings, 1 reply; 15+ messages in thread
From: Nikhil Rao @ 2010-10-08 20:34 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Fri, Oct 8, 2010 at 12:22 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Wed, 2010-10-06 at 01:23 -0700, Nikhil Rao wrote:
>> On Sun, Oct 3, 2010 at 8:08 PM, Mike Galbraith <efault@gmx.de> wrote:
>> > On Wed, 2010-09-29 at 12:32 -0700, Nikhil Rao wrote:
>> >> The closest I have is a quad-core dual-socket machine (MC, CPU
>> >> domains). And I'm having trouble reproducing it on that machine as
>> >> well :-( I ran 5 soaker threads (one of them niced to -15) for a few
>> >> hours and didn't see the problem. Can you please give me some trace
>> >> data & schedstats to work with?
>> >
>> > Booting with isolcpus or offlining the excess should help.
>> >
>>
>> Sorry for the late reply. Booting with isolcpus did the trick, thanks.
>>
>> ... and now to dig into why this is happening.
>
> I was poking it (again) yesterday, and it's kind of annoying.  I can't
> call this behavior black/white broken.  It's freeing up a cache for a
> very high priority task, which is kinda nice, but SMP nice is costing
> 25% of my box's processor power in this case too.  Hrmph.
>

I agree that freeing up the cache for the high-priority task is a nice
side effect of weight-based balancing. However, with a sufficient number
of low-weight tasks on the system, or with a small nudge to affinity
masks, the niced task will end up sharing cache with low-weight tasks
anyway. In that sense, I think this is a tad more black than white :-)
It would be nice to make the load balancer more cache-aware, but that's
for a different RFC. :-)

Further, once a sched group reaches a certain "bad state", where the
niced task is the only task in a sched group with more than one cpu, it
does not recover from that state easily. This leads to the suboptimal
utilization we have been chasing down. In this situation, even though
the sched group has spare capacity, it does not pull tasks because
sds.this_load >> sds.max_load, so f_b_g() returns NULL.
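
To make the bad state concrete, here is a minimal C sketch (not kernel code) of the check that fails, assuming a hypothetical layout of two sched groups with two full-power cpus each: one group runs only the nice -15 task, the other runs four nice-0 tasks. avg_load() follows the normalization used for this_load and max_load (group load * SCHED_LOAD_SCALE / cpu_power), and the weights are the prio_to_weight[] entries for nice -15 (29154) and nice 0 (1024). struct group_stats, local_group_bails_out() and the two example groups are illustrative names only.

```c
#include <stdbool.h>

#define SCHED_LOAD_SCALE 1024UL
#define NICE_0_LOAD      1024UL  /* prio_to_weight[] entry for nice 0 */
#define NICE_M15_LOAD   29154UL  /* prio_to_weight[] entry for nice -15 */

/* Illustrative per-group statistics, loosely modeled on struct sg_lb_stats. */
struct group_stats {
	unsigned long group_load; /* sum of the task weights in the group */
	unsigned long cpu_power;  /* SCHED_LOAD_SCALE per full-power cpu */
	unsigned long nr_running; /* runnable tasks in the group */
};

/* Load normalized by cpu_power, as used for this_load and max_load. */
static unsigned long avg_load(const struct group_stats *g)
{
	return g->group_load * SCHED_LOAD_SCALE / g->cpu_power;
}

/* The this_load >= max_load early exit that stops the local group pulling. */
static bool local_group_bails_out(const struct group_stats *local,
				  const struct group_stats *busiest)
{
	return avg_load(local) >= avg_load(busiest);
}

/* Hypothetical layout: two groups of two cpus each. */
static const struct group_stats niced_group = {
	NICE_M15_LOAD, 2 * SCHED_LOAD_SCALE, 1	/* nice -15 task alone, 1 cpu idle */
};
static const struct group_stats normal_group = {
	4 * NICE_0_LOAD, 2 * SCHED_LOAD_SCALE, 4	/* four nice-0 tasks on 2 cpus */
};
```

Here avg_load(&niced_group) is 14577 and avg_load(&normal_group) is 2048, so in this simplified model local_group_bails_out(&niced_group, &normal_group) returns true: the group holding the niced task sees this_load >= max_load and stays put, even though one of its two cpus is idle.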

A sched group reaches this state because either (i) a niced task is
pulled into an empty sched group, or (ii) all other tasks in the
sched group are pulled away from the group. The patches in this
patchset try to prevent the latter, i.e. prevent low-weight tasks from
being pulled away from the sched group. However, there are still many
ways to end up in the bad state. Empirically, it seems to happen with
higher probability on machines with fewer cpus. I have verified that,
with the appropriate test setup, this also happens on the quad-socket,
quad-core machines (i.e. set the affinity of the normal tasks to
socket-0 and of the niced task to socket-1, and then reset the
affinities).

I have attached a patch that tackles the problem in a different way.
Instead of preventing the sched group from entering the bad state, it
short-circuits the checks in f_b_g() when the local group has extra
capacity (group_capacity > nr_running) and the busiest group does not.
The patch exposes this behavior as a sched feature called
PREFER_UTILIZATION (disabled by default). This actually works quite
well. I tested this on a quad-core dual-socket machine (booted with
isolcpus) and waited for the machine to enter the bad state. After
enabling the sched feature, utilization immediately jumps to 100% (of
the non-isolated cores). I have some data below.

This is very experimental and has not been tested beyond this case and
some basic load balance tests. If you see a better way to do this,
please let me know.

w/ PREFER_UTILIZATION disabled

Cpu(s): 34.3% us,  0.2% sy,  0.0% ni, 65.1% id,  0.4% wa,  0.0% hi,  0.0% si
Mem:  16463308k total,   996368k used, 15466940k free,    12304k buffers
Swap:        0k total,        0k used,        0k free,   756244k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7651 root       5 -15  5876   84    0 R   98  0.0  37:35.97 lat
 7652 root      20   0  5876   84    0 R   49  0.0  19:49.02 lat
 7654 root      20   0  5876   84    0 R   49  0.0  20:48.93 lat
 7655 root      20   0  5876   84    0 R   49  0.0  19:25.74 lat
 7653 root      20   0  5876   84    0 R   47  0.0  20:02.16 lat

w/ PREFER_UTILIZATION enabled

Cpu(s): 52.3% us,  0.0% sy,  0.0% ni, 47.6% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:  16463308k total,  1002852k used, 15460456k free,    12304k buffers
Swap:        0k total,        0k used,        0k free,   756312k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7651 root       5 -15  5876   84    0 R  100  0.0  38:12.37 lat
 7655 root      20   0  5876   84    0 R   99  0.0  19:49.99 lat
 7652 root      20   0  5876   84    0 R   80  0.0  20:09.80 lat
 7653 root      20   0  5876   84    0 R   60  0.0  20:22.13 lat
 7654 root      20   0  5876   84    0 R   58  0.0  21:07.88 lat
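
The decision the patch changes can be modeled in a few lines. This is a heavily simplified C sketch, not the kernel's f_b_g(): group_has_capacity() follows the patch's definition (group_capacity > nr_running, with group_capacity rounded from cpu_power the way DIV_ROUND_CLOSEST() does), while fix_small_capacity() and most other f_b_g() checks are omitted. All names are hypothetical, and the imbalance_pct of 125 used in the usage note is only an illustrative value.

```c
#include <stdbool.h>

#define SCHED_LOAD_SCALE 1024UL

/* Illustrative per-group statistics, loosely modeled on struct sg_lb_stats. */
struct group_stats {
	unsigned long group_load; /* sum of the task weights in the group */
	unsigned long cpu_power;  /* SCHED_LOAD_SCALE per full-power cpu */
	unsigned long nr_running; /* runnable tasks in the group */
};

static unsigned long avg_load(const struct group_stats *g)
{
	return g->group_load * SCHED_LOAD_SCALE / g->cpu_power;
}

/* cpu_power rounded to whole cpus; fix_small_capacity() is not modeled. */
static unsigned long group_capacity(const struct group_stats *g)
{
	return (g->cpu_power + SCHED_LOAD_SCALE / 2) / SCHED_LOAD_SCALE;
}

/* The patch's notion of extra capacity: more cpus than runnable tasks. */
static bool group_has_capacity(const struct group_stats *g)
{
	return group_capacity(g) > g->nr_running;
}

/*
 * Very reduced f_b_g(): true where the real code would fall through to
 * calculate_imbalance(), false where it would go to out_balanced.
 */
static bool fbg_would_balance(const struct group_stats *local,
			      const struct group_stats *busiest,
			      unsigned int imbalance_pct, bool prefer_utilization)
{
	unsigned long this_load = avg_load(local);
	unsigned long max_load = avg_load(busiest);

	if (busiest->nr_running == 0)
		return false;

	/* PREFER_UTILIZATION: local group has room, busiest group has none. */
	if (prefer_utilization && group_has_capacity(local) &&
	    !group_has_capacity(busiest))
		return true;

	if (this_load >= max_load)
		return false;
	if (100 * max_load <= imbalance_pct * this_load)
		return false;
	return true;
}
```

For a local group holding only a nice -15 task (load 29154, two cpus) and a busiest group holding four nice-0 tasks (load 4096, two cpus), fbg_would_balance() returns false with prefer_utilization off and true with it on, which is the flip in behavior the top output above shows.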

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 6d934e8..04e5553 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2030,12 +2030,14 @@ struct sd_lb_stats {
 	unsigned long this_load;
 	unsigned long this_load_per_task;
 	unsigned long this_nr_running;
+	unsigned long this_has_capacity;

 	/* Statistics of the busiest group */
 	unsigned long max_load;
 	unsigned long busiest_load_per_task;
 	unsigned long busiest_nr_running;
 	unsigned long busiest_group_capacity;
+	unsigned long busiest_has_capacity;

 	int group_imb; /* Is there imbalance in this sd */
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -2058,6 +2060,7 @@ struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long group_capacity;
 	int group_imb; /* Is there an imbalance in the group ? */
+	int group_has_capacity; /* Is there extra capacity in the group? */
 };

 /**
@@ -2458,6 +2461,9 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
 		DIV_ROUND_CLOSEST(group->cpu_power, SCHED_LOAD_SCALE);
 	if (!sgs->group_capacity)
 		sgs->group_capacity = fix_small_capacity(sd, group);
+
+	if (sgs->group_capacity > sgs->sum_nr_running)
+		sgs->group_has_capacity = 1;
 }

 /**
@@ -2556,12 +2562,14 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
 			sds->this = sg;
 			sds->this_nr_running = sgs.sum_nr_running;
 			sds->this_load_per_task = sgs.sum_weighted_load;
+			sds->this_has_capacity = sgs.group_has_capacity;
 		} else if (update_sd_pick_busiest(sd, sds, sg, &sgs, this_cpu)) {
 			sds->max_load = sgs.avg_load;
 			sds->busiest = sg;
 			sds->busiest_nr_running = sgs.sum_nr_running;
 			sds->busiest_group_capacity = sgs.group_capacity;
 			sds->busiest_load_per_task = sgs.sum_weighted_load;
+			sds->busiest_has_capacity = sgs.group_has_capacity;
 			sds->group_imb = sgs.group_imb;
 		}

@@ -2820,6 +2828,10 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 	if (!sds.busiest || sds.busiest_nr_running == 0)
 		goto out_balanced;

+	if (sched_feat(PREFER_UTILIZATION) &&
+			sds.this_has_capacity && !sds.busiest_has_capacity)
+		goto force_balance;
+
 	if (sds.this_load >= sds.max_load)
 		goto out_balanced;

@@ -2831,6 +2843,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
 	if (100 * sds.max_load <= sd->imbalance_pct * sds.this_load)
 		goto out_balanced;

+force_balance:
 	/* Looks like there is an imbalance. Compute it */
 	calculate_imbalance(&sds, this_cpu, imbalance);
 	return sds.busiest;
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 83c66e8..9b93862 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -61,3 +61,9 @@ SCHED_FEAT(ASYM_EFF_LOAD, 1)
  * release the lock. Decreases scheduling overhead.
  */
 SCHED_FEAT(OWNER_SPIN, 1)
+
+/*
+ * Prefer utilization over fairness when balancing tasks with large weight
+ * differential.
+ */
+SCHED_FEAT(PREFER_UTILIZATION, 0)
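
For readers unfamiliar with sched features: kernel/sched_features.h is, roughly, an X-macro list that sched.c expands once into a bit index per feature and once into a default mask, and sched_feat(x) tests one bit of sysctl_sched_features. With CONFIG_SCHED_DEBUG the mask can be flipped at runtime by writing NAME or NO_NAME to /sys/kernel/debug/sched_features. The following is a self-contained userspace imitation of that pattern, not the kernel's code; SCHED_FEATURE_LIST, the FEAT_ prefix and set_sched_feat() are illustrative names.

```c
#include <stdbool.h>

/* Stand-in for the contents of kernel/sched_features.h. */
#define SCHED_FEATURE_LIST(F)		\
	F(OWNER_SPIN, 1)		\
	F(PREFER_UTILIZATION, 0)

/* First expansion: one bit index per feature. */
#define F_INDEX(name, enabled) FEAT_##name,
enum { SCHED_FEATURE_LIST(F_INDEX) FEAT_NR };
#undef F_INDEX

/* Second expansion: default mask with the bits of enabled features set. */
#define F_DEFAULT(name, enabled) (1UL << FEAT_##name) * (enabled) |
static unsigned long sysctl_sched_features = SCHED_FEATURE_LIST(F_DEFAULT) 0;
#undef F_DEFAULT

/* Test one feature bit; evaluates to 1 when the feature is enabled. */
#define sched_feat(x) (!!(sysctl_sched_features & (1UL << FEAT_##x)))

/* Roughly the effect of writing NAME (on) or NO_NAME (off) to the
 * debugfs sched_features file. */
static void set_sched_feat(unsigned int bit, bool on)
{
	if (on)
		sysctl_sched_features |= 1UL << bit;
	else
		sysctl_sched_features &= ~(1UL << bit);
}
```

Because the default for PREFER_UTILIZATION is 0, sched_feat(PREFER_UTILIZATION) is false until something like set_sched_feat(FEAT_PREFER_UTILIZATION, true) runs, mirroring the "disabled by default, flip at runtime" behavior described in the mail. A domain flag, as suggested in the reply below, would instead be a per-sched_domain property rather than a single global bit.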

* Re: [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential
  2010-10-08 20:34               ` Nikhil Rao
@ 2010-10-10 10:15                 ` Mike Galbraith
  0 siblings, 0 replies; 15+ messages in thread
From: Mike Galbraith @ 2010-10-10 10:15 UTC
  To: Nikhil Rao; +Cc: Ingo Molnar, Peter Zijlstra, Venkatesh Pallipadi, linux-kernel

On Fri, 2010-10-08 at 13:34 -0700, Nikhil Rao wrote:

> I have attached a patch that tackles the problem in different way.
> Instead of preventing the sched group from entering the bad state, it
> shortcuts the checks in fbg if the group has extra capacity, where
> extra capacity is defined as group_capacity > nr_running. The patch
> exposes a sched feature called PREFER_UTILIZATION (disabled by
> default). When this is enabled, f_b_g shortcuts the checks if the
> local group has capacity. This actually works quite well.

Yeah, it does seem to work well.

I don't like the sched feature much though; a domain flag seems more
appropriate.  I bent your patch up a bit to correct utilization woes
during NEWIDLE balancing instead.  It still seems to work fine.

---
 kernel/sched_fair.c |   30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

Index: linux-2.6.36.git/kernel/sched_fair.c
===================================================================
--- linux-2.6.36.git.orig/kernel/sched_fair.c
+++ linux-2.6.36.git/kernel/sched_fair.c
@@ -1764,6 +1764,10 @@ static void pull_task(struct rq *src_rq,
 	set_task_cpu(p, this_cpu);
 	activate_task(this_rq, p, 0);
 	check_preempt_curr(this_rq, p, 0);
+
+	/* re-arm NEWIDLE balancing when moving tasks */
+	src_rq->avg_idle = this_rq->avg_idle = 2*sysctl_sched_migration_cost;
+	this_rq->idle_stamp = 0;
 }
 
 /*
@@ -2030,12 +2034,14 @@ struct sd_lb_stats {
 	unsigned long this_load;
 	unsigned long this_load_per_task;
 	unsigned long this_nr_running;
+	unsigned long this_has_capacity;
 
 	/* Statistics of the busiest group */
 	unsigned long max_load;
 	unsigned long busiest_load_per_task;
 	unsigned long busiest_nr_running;
 	unsigned long busiest_group_capacity;
+	unsigned long busiest_has_capacity;
 
 	int group_imb; /* Is there imbalance in this sd */
 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
@@ -2058,6 +2064,7 @@ struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long group_capacity;
 	int group_imb; /* Is there an imbalance in the group ? */
+	int group_has_capacity; /* Is there extra capacity in the group? */
 };
 
 /**
@@ -2454,6 +2461,9 @@ static inline void update_sg_lb_stats(st
 		DIV_ROUND_CLOSEST(group->cpu_power, SCHED_LOAD_SCALE);
 	if (!sgs->group_capacity)
 		sgs->group_capacity = fix_small_capacity(sd, group);
+
+	if (sgs->group_capacity > sgs->sum_nr_running)
+		sgs->group_has_capacity = 1;
 }
 
 /**
@@ -2552,12 +2562,14 @@ static inline void update_sd_lb_stats(st
 			sds->this = sg;
 			sds->this_nr_running = sgs.sum_nr_running;
 			sds->this_load_per_task = sgs.sum_weighted_load;
+			sds->this_has_capacity = sgs.group_has_capacity;
 		} else if (update_sd_pick_busiest(sd, sds, sg, &sgs, this_cpu)) {
 			sds->max_load = sgs.avg_load;
 			sds->busiest = sg;
 			sds->busiest_nr_running = sgs.sum_nr_running;
 			sds->busiest_group_capacity = sgs.group_capacity;
 			sds->busiest_load_per_task = sgs.sum_weighted_load;
+			sds->busiest_has_capacity = sgs.group_has_capacity;
 			sds->group_imb = sgs.group_imb;
 		}
 
@@ -2754,6 +2766,15 @@ static inline void calculate_imbalance(s
 		return fix_small_imbalance(sds, this_cpu, imbalance);
 
 }
+
+bool check_utilization(struct sd_lb_stats *sds)
+{
+	if (!sds->this_has_capacity || sds->busiest_has_capacity)
+		return false;
+
+	return true;
+}
+
 /******* find_busiest_group() helpers end here *********************/
 
 /**
@@ -2816,6 +2837,10 @@ find_busiest_group(struct sched_domain *
 	if (!sds.busiest || sds.busiest_nr_running == 0)
 		goto out_balanced;
 
+	/*  SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
+	if (idle == CPU_NEWLY_IDLE && check_utilization(&sds))
+		goto force_balance;
+
 	if (sds.this_load >= sds.max_load)
 		goto out_balanced;
 
@@ -2827,6 +2852,7 @@ find_busiest_group(struct sched_domain *
 	if (100 * sds.max_load <= sd->imbalance_pct * sds.this_load)
 		goto out_balanced;
 
+force_balance:
 	/* Looks like there is an imbalance. Compute it */
 	calculate_imbalance(&sds, this_cpu, imbalance);
 	return sds.busiest;
@@ -3153,10 +3179,8 @@ static void idle_balance(int this_cpu, s
 		interval = msecs_to_jiffies(sd->balance_interval);
 		if (time_after(next_balance, sd->last_balance + interval))
 			next_balance = sd->last_balance + interval;
-		if (pulled_task) {
-			this_rq->idle_stamp = 0;
+		if (pulled_task)
 			break;
-		}
 	}
 
 	raw_spin_lock(&this_rq->lock);
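
The pull_task() hunk above relies on NEWIDLE balancing being throttled by rq->avg_idle. The sketch below models that under a stated assumption: that idle_balance() in this tree returns early when this_rq->avg_idle < sysctl_sched_migration_cost, and that 2*sysctl_sched_migration_cost is the cap applied when avg_idle is updated. Resetting avg_idle to that cap on both runqueues re-opens the gate, and clearing idle_stamp in pull_task() is why the patch drops the same assignment from idle_balance(). struct rq_sketch, newidle_allowed(), rearm_newidle() and the 500000 ns cost are illustrative stand-ins, not kernel definitions.

```c
#include <stdbool.h>

/* Illustrative stand-in for sysctl_sched_migration_cost, in nanoseconds. */
static unsigned long long migration_cost_ns = 500000ULL;

/* Only the two runqueue fields the re-arm touches. */
struct rq_sketch {
	unsigned long long avg_idle;   /* average idle period, ns */
	unsigned long long idle_stamp; /* when the cpu went idle, 0 if not */
};

/* Assumed throttle at the top of idle_balance(). */
static bool newidle_allowed(const struct rq_sketch *rq)
{
	return rq->avg_idle >= migration_cost_ns;
}

/* Models what the patch adds to pull_task(). */
static void rearm_newidle(struct rq_sketch *src_rq, struct rq_sketch *this_rq)
{
	src_rq->avg_idle = this_rq->avg_idle = 2 * migration_cost_ns;
	this_rq->idle_stamp = 0;
}
```

With short average idle periods on both runqueues, newidle_allowed() is false for both; after rearm_newidle() it is true for both, so the next time either cpu goes idle it will attempt a NEWIDLE balance, which is where the check_utilization() shortcut applies.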

Thread overview: 15+ messages
2010-09-28  0:29 [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Nikhil Rao
2010-09-28  0:29 ` [PATCH 1/3] sched: set group_imb only a task can be pulled from the busiest cpu Nikhil Rao
2010-09-28  0:29 ` [PATCH 2/3] sched: drop group_capacity to 1 only if remote group has no running tasks Nikhil Rao
2010-09-28 23:04   ` Suresh Siddha
2010-10-11 21:20     ` Nikhil Rao
2010-09-28  0:29 ` [PATCH 3/3] sched: do not consider SCHED_IDLE tasks to be cache hot Nikhil Rao
2010-09-28 13:57 ` [PATCH 0/3][RFC] Improve load balancing when tasks have large weight differential Mike Galbraith
2010-09-28 21:15   ` Nikhil Rao
2010-09-29  1:45     ` Mike Galbraith
2010-09-29 19:32       ` Nikhil Rao
2010-10-04  3:08         ` Mike Galbraith
2010-10-06  8:23           ` Nikhil Rao
2010-10-08  7:22             ` Mike Galbraith
2010-10-08 20:34               ` Nikhil Rao
2010-10-10 10:15                 ` Mike Galbraith
