public inbox for linux-kernel@vger.kernel.org
* [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case
@ 2013-09-15 13:49 Vladimir Davydov
  2013-09-15 13:49 ` [PATCH 2/2] sched: fix_small_imbalance: Fix local->avg_load > busiest->avg_load case Vladimir Davydov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Vladimir Davydov @ 2013-09-15 13:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: Paul Turner, linux-kernel, devel

In the busiest->group_imb case we can come to calculate_imbalance() with
local->avg_load >= busiest->avg_load >= sds->avg_load. This can make the
computed imbalance underflow to a huge value, because it is calculated as
follows

env->imbalance = min(
	max_pull * busiest->group_power,
	(sds->avg_load - local->avg_load) * local->group_power
) / SCHED_POWER_SCALE;

As a result we can end up constantly bouncing tasks from one cpu to
another if there are pinned tasks.

Fix this by skipping the assignment and assuming imbalance=0 in case
local->avg_load > sds->avg_load.
--
The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
belonging to different cores on an HT-enabled machine with N logical
cpus: just look at se.nr_migrations growth.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 kernel/sched/fair.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b3fe1c..507a8a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4896,7 +4896,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * max load less than avg load(as we skip the groups at or below
 	 * its cpu_power, while calculating max_load..)
 	 */
-	if (busiest->avg_load < sds->avg_load) {
+	if (busiest->avg_load <= sds->avg_load ||
+	    local->avg_load >= sds->avg_load) {
 		env->imbalance = 0;
 		return fix_small_imbalance(env, sds);
 	}
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] sched: fix_small_imbalance: Fix local->avg_load > busiest->avg_load case
  2013-09-15 13:49 [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case Vladimir Davydov
@ 2013-09-15 13:49 ` Vladimir Davydov
  2013-09-20 13:46   ` [tip:sched/core] sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in fix_small_imbalance() tip-bot for Vladimir Davydov
  2013-09-16  5:52 ` [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case Peter Zijlstra
  2013-09-20 13:46 ` [tip:sched/core] sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance() tip-bot for Vladimir Davydov
  2 siblings, 1 reply; 7+ messages in thread
From: Vladimir Davydov @ 2013-09-15 13:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: Paul Turner, linux-kernel, devel

In the busiest->group_imb case we can come to fix_small_imbalance() with
local->avg_load > busiest->avg_load. This can result in a wrong imbalance
fix-up because of the following check, in which all the members are
unsigned:

if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >=
    (scaled_busy_load_per_task * imbn)) {
	env->imbalance = busiest->load_per_task;
	return;
}

As a result we can end up constantly bouncing tasks from one cpu to
another if there are pinned tasks.

Fix it by replacing the subtraction in the check with an equivalent
addition.
--
The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
belonging to different cores on an HT-enabled machine with N logical
cpus: just look at se.nr_migrations growth.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
---
 kernel/sched/fair.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 507a8a9..bdaf1fc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4823,8 +4823,8 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 		(busiest->load_per_task * SCHED_POWER_SCALE) /
 		busiest->group_power;
 
-	if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >=
-	    (scaled_busy_load_per_task * imbn)) {
+	if (busiest->avg_load + scaled_busy_load_per_task >=
+	    local->avg_load + (scaled_busy_load_per_task * imbn)) {
 		env->imbalance = busiest->load_per_task;
 		return;
 	}
-- 
1.7.10.4



* Re: [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case
  2013-09-15 13:49 [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case Vladimir Davydov
  2013-09-15 13:49 ` [PATCH 2/2] sched: fix_small_imbalance: Fix local->avg_load > busiest->avg_load case Vladimir Davydov
@ 2013-09-16  5:52 ` Peter Zijlstra
  2013-09-16  8:06   ` Vladimir Davydov
  2013-09-20 13:46 ` [tip:sched/core] sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance() tip-bot for Vladimir Davydov
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2013-09-16  5:52 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Ingo Molnar, Paul Turner, linux-kernel, devel

On Sun, Sep 15, 2013 at 05:49:13PM +0400, Vladimir Davydov wrote:
> In busiest->group_imb case we can come to calculate_imbalance() with
> local->avg_load >= busiest->avg_load >= sds->avg_load. This can result
> in imbalance overflow, because it is calculated as follows
> 
> env->imbalance = min(
> 	max_pull * busiest->group_power,
> 	(sds->avg_load - local->avg_load) * local->group_power
> ) / SCHED_POWER_SCALE;
> 
> As a result we can end up constantly bouncing tasks from one cpu to
> another if there are pinned tasks.
> 
> Fix this by skipping the assignment and assuming imbalance=0 in case
> local->avg_load > sds->avg_load.
> --
> The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
> belonging to different cores on an HT-enabled machine with N logical
> cpus: just look at se.nr_migrations growth.
> 
> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
> ---
>  kernel/sched/fair.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9b3fe1c..507a8a9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4896,7 +4896,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  	 * max load less than avg load(as we skip the groups at or below
>  	 * its cpu_power, while calculating max_load..)
>  	 */
> -	if (busiest->avg_load < sds->avg_load) {
> +	if (busiest->avg_load <= sds->avg_load ||
> +	    local->avg_load >= sds->avg_load) {
>  		env->imbalance = 0;
>  		return fix_small_imbalance(env, sds);
>  	}

Why the = part? Surely 'busiest->avg_load < sds->avg_load ||
local->avg_load > sds->avg_load' avoids both underflows?


* Re: [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case
  2013-09-16  5:52 ` [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case Peter Zijlstra
@ 2013-09-16  8:06   ` Vladimir Davydov
  2013-09-16  8:11     ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Vladimir Davydov @ 2013-09-16  8:06 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Paul Turner, linux-kernel, devel

On 09/16/2013 09:52 AM, Peter Zijlstra wrote:
> On Sun, Sep 15, 2013 at 05:49:13PM +0400, Vladimir Davydov wrote:
>> In busiest->group_imb case we can come to calculate_imbalance() with
>> local->avg_load >= busiest->avg_load >= sds->avg_load. This can result
>> in imbalance overflow, because it is calculated as follows
>>
>> env->imbalance = min(
>> 	max_pull * busiest->group_power,
>> 	(sds->avg_load - local->avg_load) * local->group_power
>> ) / SCHED_POWER_SCALE;
>>
>> As a result we can end up constantly bouncing tasks from one cpu to
>> another if there are pinned tasks.
>>
>> Fix this by skipping the assignment and assuming imbalance=0 in case
>> local->avg_load > sds->avg_load.
>> --
>> The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
>> belonging to different cores on an HT-enabled machine with N logical
>> cpus: just look at se.nr_migrations growth.
>>
>> Signed-off-by: Vladimir Davydov<vdavydov@parallels.com>
>> ---
>>   kernel/sched/fair.c |    3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9b3fe1c..507a8a9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4896,7 +4896,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>>   	 * max load less than avg load(as we skip the groups at or below
>>   	 * its cpu_power, while calculating max_load..)
>>   	 */
>> -	if (busiest->avg_load < sds->avg_load) {
>> +	if (busiest->avg_load <= sds->avg_load ||
>> +	    local->avg_load >= sds->avg_load) {
>>   		env->imbalance = 0;
>>   		return fix_small_imbalance(env, sds);
>>   	}
> Why the = part? Surely 'busiest->avg_load < sds->avg_load ||
> local->avg_load > sds->avg_load' avoids both underflows?

Of course it does, but env->imbalance will be assigned 0 anyway in the
'=' case, so why not take the shortcut?


* Re: [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case
  2013-09-16  8:06   ` Vladimir Davydov
@ 2013-09-16  8:11     ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2013-09-16  8:11 UTC (permalink / raw)
  To: Vladimir Davydov; +Cc: Ingo Molnar, Paul Turner, linux-kernel, devel

On Mon, Sep 16, 2013 at 12:06:08PM +0400, Vladimir Davydov wrote:
> On 09/16/2013 09:52 AM, Peter Zijlstra wrote:
> >On Sun, Sep 15, 2013 at 05:49:13PM +0400, Vladimir Davydov wrote:

> >>-	if (busiest->avg_load < sds->avg_load) {
> >>+	if (busiest->avg_load <= sds->avg_load ||
> >>+	    local->avg_load >= sds->avg_load) {
> >>  		env->imbalance = 0;
> >>  		return fix_small_imbalance(env, sds);
> >>  	}
> >Why the = part? Surely 'busiest->avg_load < sds->avg_load ||
> >local->avg_load > sds->avg_load' avoids both underflows?
> 
> Of course it does, but env->imbalance will be assigned to 0 anyway in =
> case, so why not go shortcut?

D'uh right.. hadn't had my morning juice yet.


* [tip:sched/core] sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()
  2013-09-15 13:49 [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case Vladimir Davydov
  2013-09-15 13:49 ` [PATCH 2/2] sched: fix_small_imbalance: Fix local->avg_load > busiest->avg_load case Vladimir Davydov
  2013-09-16  5:52 ` [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case Peter Zijlstra
@ 2013-09-20 13:46 ` tip-bot for Vladimir Davydov
  2 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Vladimir Davydov @ 2013-09-20 13:46 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx, vdavydov

Commit-ID:  b18855500fc40da050512d9df82d2f1471e59642
Gitweb:     http://git.kernel.org/tip/b18855500fc40da050512d9df82d2f1471e59642
Author:     Vladimir Davydov <vdavydov@parallels.com>
AuthorDate: Sun, 15 Sep 2013 17:49:13 +0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 20 Sep 2013 11:59:36 +0200

sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()

In the busiest->group_imb case we can come to calculate_imbalance() with
local->avg_load >= busiest->avg_load >= sds->avg_load. This can make the
computed imbalance underflow to a huge value, because it is calculated as
follows

env->imbalance = min(
	max_pull * busiest->group_power,
	(sds->avg_load - local->avg_load) * local->group_power
) / SCHED_POWER_SCALE;

As a result we can end up constantly bouncing tasks from one cpu to
another if there are pinned tasks.

Fix this by skipping the assignment and assuming imbalance=0 in case
local->avg_load > sds->avg_load.

[ The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
  belonging to different cores on an HT-enabled machine with N logical
  cpus: just look at se.nr_migrations growth. ]

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/8f596cc6bc0e5e655119dc892c9bfcad26e971f4.1379252740.git.vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 11cd136..0b99aae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4896,7 +4896,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	 * max load less than avg load(as we skip the groups at or below
 	 * its cpu_power, while calculating max_load..)
 	 */
-	if (busiest->avg_load < sds->avg_load) {
+	if (busiest->avg_load <= sds->avg_load ||
+	    local->avg_load >= sds->avg_load) {
 		env->imbalance = 0;
 		return fix_small_imbalance(env, sds);
 	}


* [tip:sched/core] sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in fix_small_imbalance()
  2013-09-15 13:49 ` [PATCH 2/2] sched: fix_small_imbalance: Fix local->avg_load > busiest->avg_load case Vladimir Davydov
@ 2013-09-20 13:46   ` tip-bot for Vladimir Davydov
  0 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Vladimir Davydov @ 2013-09-20 13:46 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, peterz, tglx, vdavydov

Commit-ID:  3029ede39373c368f402a76896600d85a4f7121b
Gitweb:     http://git.kernel.org/tip/3029ede39373c368f402a76896600d85a4f7121b
Author:     Vladimir Davydov <vdavydov@parallels.com>
AuthorDate: Sun, 15 Sep 2013 17:49:14 +0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 20 Sep 2013 11:59:38 +0200

sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in fix_small_imbalance()

In the busiest->group_imb case we can come to fix_small_imbalance() with
local->avg_load > busiest->avg_load. This can result in a wrong imbalance
fix-up because of the following check, in which all the members are
unsigned:

if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >=
    (scaled_busy_load_per_task * imbn)) {
	env->imbalance = busiest->load_per_task;
	return;
}

As a result we can end up constantly bouncing tasks from one cpu to
another if there are pinned tasks.

Fix it by replacing the subtraction in the check with an equivalent
addition.

[ The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
  belonging to different cores on an HT-enabled machine with N logical
  cpus: just look at se.nr_migrations growth. ]

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/ef167822e5c5b2d96cf5b0e3e4f4bdff3f0414a2.1379252740.git.vdavydov@parallels.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0b99aae..2aedacc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4823,8 +4823,8 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 		(busiest->load_per_task * SCHED_POWER_SCALE) /
 		busiest->group_power;
 
-	if (busiest->avg_load - local->avg_load + scaled_busy_load_per_task >=
-	    (scaled_busy_load_per_task * imbn)) {
+	if (busiest->avg_load + scaled_busy_load_per_task >=
+	    local->avg_load + (scaled_busy_load_per_task * imbn)) {
 		env->imbalance = busiest->load_per_task;
 		return;
 	}

