[PATCH] sched: fair group: fix divide by zero

* [PATCH] sched: fair group: fix divide by zero
@ 2008-06-11  7:12 Lai Jiangshan
  2008-06-11  7:15 ` Peter Zijlstra
From: Lai Jiangshan @ 2008-06-11  7:12 UTC
  To: mingo; +Cc: peterz, Linux Kernel Mailing List

I found a bug that can be reproduced as follows (linux-2.6.26-rc5, x86-64),
using any of 2^32, 2^33, ..., 2^63 as the shares value:

# mkdir /dev/cpuctl
# mount -t cgroup -o cpu cpuctl /dev/cpuctl
# cd /dev/cpuctl
# mkdir sub
# echo 0x8000000000000000 > sub/cpu.shares
# echo $$ > sub/tasks
oops here! divide by zero.

This is because do_div() expects its second parameter to be 32 bits wide,
but unsigned long is 64 bits on x86_64. A weight whose low 32 bits are
all zero is therefore truncated to 0.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
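Note: the truncation described in the changelog can be illustrated with a
rough user-space sketch. This is not the kernel macro itself. The names
model_do_div(), model_div64_u64() and model_do_div_demo() are made up for
this example, and the only assumption is the one stated above: a
do_div()-style helper keeps just the low 32 bits of its divisor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of a do_div()-style helper: the divisor
 * is narrowed to 32 bits before the division. Returns false instead of
 * dividing when the narrowed divisor is zero (the kernel would oops).
 */
bool model_do_div(uint64_t *n, uint64_t divisor)
{
	uint32_t base = (uint32_t)divisor;	/* high 32 bits are dropped */

	if (base == 0)
		return false;
	*n /= base;
	return true;
}

/* Hypothetical model of a full 64-by-64 division such as div64_u64(). */
uint64_t model_div64_u64(uint64_t n, uint64_t divisor)
{
	return n / divisor;
}

/* Walks through the shares value used in the reproducer. */
void model_do_div_demo(void)
{
	uint64_t slice = 4000000;	/* arbitrary example dividend */
	uint64_t shares = 0x8000000000000000ULL;

	/* The narrowed divisor is 0, so the 32-bit path cannot divide. */
	assert(!model_do_div(&slice, shares));
	assert(slice == 4000000);

	/* The 64-bit path divides normally; 4000000 / 2^63 is 0. */
	assert(model_div64_u64(slice, shares) == 0);
}
```

Calling model_do_div_demo() from a small main() exercises both paths. The
same failure should appear for every shares value listed above, since each
has all-zero low 32 bits.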
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 08ae848..d3005b4 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -368,7 +368,7 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		cfs_rq = cfs_rq_of(se);
 
 		slice *= se->load.weight;
-		do_div(slice, cfs_rq->load.weight);
+		slice = div64_u64(slice, cfs_rq->load.weight);
 	}
 
 
@@ -399,7 +399,7 @@ static u64 sched_vslice_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
 			weight += se->load.weight;
 
 		vslice *= NICE_0_LOAD;
-		do_div(vslice, weight);
+		vslice = div64_u64(vslice, weight);
 	}
 
 	return vslice;




* Re: [PATCH] sched: fair group: fix divide by zero
  2008-06-11  7:12 [PATCH] sched: fair group: fix divide by zero Lai Jiangshan
@ 2008-06-11  7:15 ` Peter Zijlstra
  2008-06-12  8:42   ` [PATCH 1/2] sched: fair group: fix overflow(was: fix divide by zero) Lai Jiangshan
From: Peter Zijlstra @ 2008-06-11  7:15 UTC
  To: Lai Jiangshan; +Cc: mingo, Linux Kernel Mailing List

On Wed, 2008-06-11 at 15:12 +0800, Lai Jiangshan wrote:
> I found a bug that can be reproduced as follows (linux-2.6.26-rc5, x86-64),
> using any of 2^32, 2^33, ..., 2^63 as the shares value:

I think the sane thing to do is to limit the shares value to something
smaller instead of using an even more expensive divide.




* [PATCH 1/2] sched: fair group: fix overflow(was: fix divide by zero)
  2008-06-11  7:15 ` Peter Zijlstra
@ 2008-06-12  8:42   ` Lai Jiangshan
  2008-06-12  8:49     ` Peter Zijlstra
From: Lai Jiangshan @ 2008-06-12  8:42 UTC
  To: Peter Zijlstra, mingo; +Cc: Linux Kernel Mailing List

Peter Zijlstra wrote:
> 
> I think the sane thing to do is to limit the shares value to something
> smaller instead of using an even more expensive divide.
> 

Yes, you are right!

I found another bug caused by a shares value that is too large:

pid1 and pid2 both have their affinity set to cpu#0.
pid1 is attached to cg1 and pid2 is attached to cg2.

If cg1/cpu.shares = 1024 and cg2/cpu.shares = 2000000000,
then pid2 gets 100% of the CPU and pid1 gets 0%.

If cg1/cpu.shares = 1024 and cg2/cpu.shares = 20000000000,
then pid2 gets 0% of the CPU and pid1 gets 100%.


The weight of a cfs_rq is the sum of the weights of the entities
queued on it, so the shares value should be limited to a smaller
value.

I think that (1UL << 18) is a good limit:
1) it is not too large: we can create a lot of groups before overflow
2) it is not too small: it is several times the weight for nice=-19

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
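Note: the arithmetic behind the two points above can be sketched roughly as
follows. All names are made up for this example. The value 71755 is assumed
to be the nice=-19 entry of the scheduler's weight table and should be
checked against the tree being patched.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_SHARES_SHIFT	18	/* MAX_SHARES = 1UL << 18 */
#define MODEL_NICE_M19_WEIGHT	71755	/* ASSUMED weight for nice=-19 */

/*
 * Number of entities, each at the maximum shares value, whose summed
 * weight first wraps an unsigned counter of the given width in bits.
 * Valid for widths from 19 to 64.
 */
uint64_t model_entities_until_wrap(unsigned int bits)
{
	return UINT64_C(1) << (bits - MODEL_MAX_SHARES_SHIFT);
}

/* How many whole times the nice=-19 weight fits into the maximum shares. */
uint64_t model_ratio_to_nice_m19(void)
{
	return (UINT64_C(1) << MODEL_MAX_SHARES_SHIFT) / MODEL_NICE_M19_WEIGHT;
}

void model_max_shares_demo(void)
{
	/* 2^32 / 2^18 = 2^14 entities before a 32-bit sum wraps. */
	assert(model_entities_until_wrap(32) == 16384);

	/* 262144 / 71755 rounds down to 3. */
	assert(model_ratio_to_nice_m19() == 3);
}
```

Under these assumptions a 32-bit unsigned long sum wraps only once 16384
entities at the maximum are queued on one cfs_rq, while the limit is still
roughly 3.6 times the nice=-19 weight.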
diff --git a/kernel/sched.c b/kernel/sched.c
index bfb8ad8..fe1b6c7 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -312,12 +312,15 @@ static DEFINE_SPINLOCK(task_group_lock);
 #endif
 
 /*
- * A weight of 0, 1 or ULONG_MAX can cause arithmetics problems.
+ * A weight of 0 or 1 can cause arithmetic problems.
+ * The weight of a cfs_rq is the sum of the weights of the entities
+ * queued on it, so the weight of an entity must not be too large,
+ * and neither must the shares value of a task group.
  * (The default weight is 1024 - so there's no practical
  *  limitation from this.)
  */
 #define MIN_SHARES	2
-#define MAX_SHARES	(ULONG_MAX - 1)
+#define MAX_SHARES	(1UL << 18)
 
 static int init_task_group_load = INIT_TASK_GROUP_LOAD;
 #endif






* Re: [PATCH 1/2] sched: fair group: fix overflow(was: fix divide by zero)
  2008-06-12  8:42   ` [PATCH 1/2] sched: fair group: fix overflow(was: fix divide by zero) Lai Jiangshan
@ 2008-06-12  8:49     ` Peter Zijlstra
  2008-06-12 12:22       ` Ingo Molnar
From: Peter Zijlstra @ 2008-06-12  8:49 UTC
  To: Lai Jiangshan; +Cc: mingo, Linux Kernel Mailing List

On Thu, 2008-06-12 at 16:42 +0800, Lai Jiangshan wrote:
> Peter Zijlstra wrote:
> > 
> > I think the sane thing to do is to limit the shares value to something
> > smaller instead of using an even more expensive divide.
> > 
> 
> Yes, you are right!
> 
> I found another bug caused by a shares value that is too large:
> 
> pid1 and pid2 both have their affinity set to cpu#0.
> pid1 is attached to cg1 and pid2 is attached to cg2.
> 
> If cg1/cpu.shares = 1024 and cg2/cpu.shares = 2000000000,
> then pid2 gets 100% of the CPU and pid1 gets 0%.
> 
> If cg1/cpu.shares = 1024 and cg2/cpu.shares = 20000000000,
> then pid2 gets 0% of the CPU and pid1 gets 100%.
> 
> 
> The weight of a cfs_rq is the sum of the weights of the entities
> queued on it, so the shares value should be limited to a smaller
> value.

Yeah, a lot of stuff will fall apart when weights grow too large. I
think somewhere the load balance code also assumes weight * NICE_0_LOAD
always fits in a long, so that immediately limits us to 32-10=22 bits
(on 32-bit systems).

There might be other funnies..
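
As a rough sketch of the 32-10=22 bound above: the names are made up for
this example, NICE_0_LOAD is taken to be 1024 (10 bits), and a 32-bit
unsigned long is modelled with uint32_t. A signed long would lose one more
bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NICE_0_LOAD	UINT64_C(1024)	/* assumed: 1 << 10 */

/* True if weight * NICE_0_LOAD still fits in 32 unsigned bits. */
bool model_scaled_weight_fits_32(uint64_t weight)
{
	return weight <= UINT32_MAX / MODEL_NICE_0_LOAD;
}

void model_scaled_weight_demo(void)
{
	/* The largest 22-bit weight fits, the first 23-bit weight does not. */
	assert(model_scaled_weight_fits_32((UINT64_C(1) << 22) - 1));
	assert(!model_scaled_weight_fits_32(UINT64_C(1) << 22));

	/* A maximum of 1UL << 18 leaves 4 bits of headroom. */
	assert(model_scaled_weight_fits_32(UINT64_C(1) << 18));
}
```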

> I think that (1UL << 18) is a good limit:
> 1) it is not too large: we can create a lot of groups before overflow
> 2) it is not too small: it is several times the weight for nice=-19

Thanks!

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>




* Re: [PATCH 1/2] sched: fair group: fix overflow(was: fix divide by zero)
  2008-06-12  8:49     ` Peter Zijlstra
@ 2008-06-12 12:22       ` Ingo Molnar
From: Ingo Molnar @ 2008-06-12 12:22 UTC
  To: Peter Zijlstra; +Cc: Lai Jiangshan, Linux Kernel Mailing List


* Peter Zijlstra <peterz@infradead.org> wrote:

> > I think that (1UL << 18) is a good limit:
> > 1) it is not too large: we can create a lot of groups before overflow
> > 2) it is not too small: it is several times the weight for nice=-19
> 
> Thanks!
> 
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

applied to tip/sched, thanks everyone.

	Ingo
