* [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Terry Loftin @ 2011-07-19 20:58 UTC
  To: linux-kernel, Ingo Molnar, Peter Zijlstra; +Cc: Bob Montgomery

Correct the protection expression in update_cpu_power() to avoid setting
rq->cpu_power to zero.

Signed-off-by: Terry Loftin <terry.loftin@hp.com>
Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0c26e2d..9c50020 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2549,7 +2549,7 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
 	power *= scale_rt_power(cpu);
 	power >>= SCHED_LOAD_SHIFT;
 
-	if (!power)
+	if ((u32)power == 0)
 		power = 1;
 
 	cpu_rq(cpu)->cpu_power = power;
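Editorial note: the two guards in the patch above differ only when the full-width value is nonzero but its low 32 bits are zero. The sketch below illustrates that case in user-space C. It assumes, as the (u32) cast suggests, that the value is later stored in a 32-bit field; the function names are hypothetical and none of this is kernel code.

```c
#include <stdint.h>

/* Original guard: only catches a value that is zero in all 64 bits. */
static inline uint64_t guard_full_width(uint64_t power)
{
	if (!power)
		power = 1;
	return power;
}

/* Patched guard: also catches values whose low 32 bits are all zero. */
static inline uint64_t guard_low32(uint64_t power)
{
	if ((uint32_t)power == 0)
		power = 1;
	return power;
}

/* Hypothetical stand-in for an assignment to a 32-bit struct field. */
static inline uint32_t store_as_u32(uint64_t power)
{
	return (uint32_t)power;
}
```

With an input of exactly 2^32, the original guard lets the value through and the 32-bit store truncates it to zero, while the patched guard replaces it with 1 first. For any input below 2^32 the two guards behave identically.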
* Re: [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Peter Zijlstra @ 2011-07-19 21:17 UTC
  To: Terry Loftin; +Cc: linux-kernel, Ingo Molnar, Bob Montgomery

On Tue, 2011-07-19 at 14:58 -0600, Terry Loftin wrote:
> Correct the protection expression in update_cpu_power() to avoid setting
> rq->cpu_power to zero.

Firstly you fail to mention what kernel this is again, secondly this
should never happen in the first place, so this fix is wrong. At best it
papers over another bug.

> Signed-off-by: Terry Loftin <terry.loftin@hp.com>
> Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
> ---
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 0c26e2d..9c50020 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2549,7 +2549,7 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
>  	power *= scale_rt_power(cpu);
>  	power >>= SCHED_LOAD_SHIFT;
>
> -	if (!power)
> +	if ((u32)power == 0)
>  		power = 1;
>
>  	cpu_rq(cpu)->cpu_power = power;
* Re: [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Terry Loftin @ 2011-07-19 22:20 UTC
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Bob Montgomery

On 07/19/2011 03:17 PM, Peter Zijlstra wrote:
> On Tue, 2011-07-19 at 14:58 -0600, Terry Loftin wrote:
>> Correct the protection expression in update_cpu_power() to avoid setting
>> rq->cpu_power to zero.
>
> Firstly you fail to mention what kernel this is again, secondly this
> should never happen in the first place, so this fix is wrong. At best it
> papers over another bug.

My apologies, this was found on kernel 2.6.32.32, but all
the related code is the same in v3.0-rc7. The patch is against
v3.0-rc7. I've done some limited testing of this on 2.6.32.32
by modifying __cycles_2_ns() to add an offset to the TSC when
it is read to simulate 208 days of uptime, but that kernel has
only been running for a couple days.

I also agree this should never happen. As the statement currently
stands, it won't work - so it should either be corrected or removed.
Here is the alternative patch:

---
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 0c26e2d..f9c9a89 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2549,9 +2549,6 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
 	power *= scale_rt_power(cpu);
 	power >>= SCHED_LOAD_SHIFT;
 
-	if (!power)
-		power = 1;
-
 	cpu_rq(cpu)->cpu_power = power;
 	sdg->cpu_power = power;
 }
* Re: [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Peter Zijlstra @ 2011-07-19 22:30 UTC
  To: Terry Loftin; +Cc: linux-kernel, Ingo Molnar, Bob Montgomery

On Tue, 2011-07-19 at 16:20 -0600, Terry Loftin wrote:
> On 07/19/2011 03:17 PM, Peter Zijlstra wrote:
> > On Tue, 2011-07-19 at 14:58 -0600, Terry Loftin wrote:
> >> Correct the protection expression in update_cpu_power() to avoid setting
> >> rq->cpu_power to zero.
> >
> > Firstly you fail to mention what kernel this is again, secondly this
> > should never happen in the first place, so this fix is wrong. At best it
> > papers over another bug.
>
> My apologies, this was found on kernel 2.6.32.32, but all
> the related code is the same in v3.0-rc7. The patch is against
> v3.0-rc7. I've done some limited testing of this on 2.6.32.32
> by modifying __cycles_2_ns() to add an offset to the TSC when
> it is read to simulate 208 days of uptime, but that kernel has
> only been running for a couple days.
>
> I also agree this should never happen. As the statement currently
> stands, it won't work - so it should either be corrected or removed.
> Here is the alternative patch:
>
> -	if (!power)
> -		power = 1;

IIRC it can actually end up being 0 if the scale factors are small
enough, but what I couldn't see happening is how it can be > 2^32, which
is required for your initial patch to make a difference.

In that case the scale factors were _way_ out of bound, they're supposed
to be [0,SCHED_POWER_SCALE] and since we divide by SCHED_POWER_SCALE
after every factor the result should remain in that range.

Now clearly you've found that going haywire, so we need to find where
and why that happens and cure that.
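Editorial note: the range argument above can be checked with a small fixed-point sketch. It assumes a scale of 1024 (a shift of 10, matching the `shl $0xa` visible later in the thread) and uses the hypothetical names SCALE_SHIFT, SCALE and apply_factor(); it models the multiply-then-shift pattern only and is not the kernel's implementation.

```c
#include <stdint.h>

#define SCALE_SHIFT 10				/* assumed: scale of 1024 */
#define SCALE (UINT64_C(1) << SCALE_SHIFT)

/*
 * Apply one scale factor in fixed point: multiply, then divide by SCALE.
 * If power and factor are both within [0, SCALE], so is the result.
 */
static inline uint64_t apply_factor(uint64_t power, uint64_t factor)
{
	return (power * factor) >> SCALE_SHIFT;
}
```

Starting from SCALE, a factor of SCALE leaves the value unchanged, and two successive factors of 16 drive it to zero (1024 to 16, then 16 to 0), which is the "small enough" case. A factor of 2^32, far outside the intended range, yields exactly 2^32, whose low 32 bits are zero; that is the only kind of input for which the (u32) cast in the first patch changes the outcome.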
* Re: [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Mike Galbraith @ 2011-07-20 2:26 UTC
  To: Peter Zijlstra; +Cc: Terry Loftin, linux-kernel, Ingo Molnar, Bob Montgomery

On Tue, 2011-07-19 at 23:17 +0200, Peter Zijlstra wrote:
> On Tue, 2011-07-19 at 14:58 -0600, Terry Loftin wrote:
> > Correct the protection expression in update_cpu_power() to avoid setting
> > rq->cpu_power to zero.
>
> Firstly you fail to mention what kernel this is again, secondly this
> should never happen in the first place, so this fix is wrong. At best it
> papers over another bug.
>
> > Signed-off-by: Terry Loftin <terry.loftin@hp.com>
> > Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
> > ---
> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 0c26e2d..9c50020 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -2549,7 +2549,7 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
> >  	power *= scale_rt_power(cpu);
> >  	power >>= SCHED_LOAD_SHIFT;
> >
> > -	if (!power)
> > +	if ((u32)power == 0)
> >  		power = 1;
> >
> >  	cpu_rq(cpu)->cpu_power = power;

I put that (and a bunch more protection+warnings) in an enterprise
kernel so it would not explode, but would gather some data. The entire
world has been utterly silent, except for a gaggle of POWER7 boxen,
which manage to convince scale_rt_power() to return negative values.

Turning on PRINTK_TIME made these boxen go silent. A printk with
timestamps, which doesn't happen, hides the problem. Tilt.

	-Mike
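Editorial note: the message above does not say how the "negative" values arise. One plausible mechanism, offered here purely as an assumption, is wraparound in an unsigned subtraction when the time charged to real-time work exceeds the measured period. The names available_naive() and available_clamped() are hypothetical and this is not a copy of scale_rt_power().

```c
#include <stdint.h>

/*
 * Hypothetical: time left in a period after real-time work, computed
 * without a bounds check. Wraps modulo 2^64 when rt_time > total, so
 * the result looks negative when reinterpreted as a signed value.
 */
static inline uint64_t available_naive(uint64_t total, uint64_t rt_time)
{
	return total - rt_time;
}

/* Same computation, clamped to zero instead of wrapping. */
static inline uint64_t available_clamped(uint64_t total, uint64_t rt_time)
{
	return (rt_time > total) ? 0 : total - rt_time;
}
```

A wrapped value fed into a multiply-and-shift step would no longer be bounded by the scale, which would be consistent with the out-of-range results discussed earlier in the thread, though the thread itself does not confirm this cause.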
* Re: [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Peter Zijlstra @ 2011-07-20 2:29 UTC
  To: Mike Galbraith; +Cc: Terry Loftin, linux-kernel, Ingo Molnar, Bob Montgomery

On Wed, 2011-07-20 at 04:26 +0200, Mike Galbraith wrote:
> On Tue, 2011-07-19 at 23:17 +0200, Peter Zijlstra wrote:
> > On Tue, 2011-07-19 at 14:58 -0600, Terry Loftin wrote:
> > > Correct the protection expression in update_cpu_power() to avoid setting
> > > rq->cpu_power to zero.
> >
> > Firstly you fail to mention what kernel this is again, secondly this
> > should never happen in the first place, so this fix is wrong. At best it
> > papers over another bug.
> >
> > > Signed-off-by: Terry Loftin <terry.loftin@hp.com>
> > > Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
> > > ---
> > > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > > index 0c26e2d..9c50020 100644
> > > --- a/kernel/sched_fair.c
> > > +++ b/kernel/sched_fair.c
> > > @@ -2549,7 +2549,7 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
> > >  	power *= scale_rt_power(cpu);
> > >  	power >>= SCHED_LOAD_SHIFT;
> > >
> > > -	if (!power)
> > > +	if ((u32)power == 0)
> > >  		power = 1;
> > >
> > >  	cpu_rq(cpu)->cpu_power = power;
>
> I put that (and a bunch more protection+warnings) in an enterprise
> kernel so it would not explode, but would gather some data. The entire
> world has been utterly silent, except for a gaggle of POWER7 boxen,
> which manage to convince scale_rt_power() to return negative values.
>
> Turning on PRINTK_TIME made these boxen go silent. A printk with
> timestamps, which doesn't happen, hides the problem. Tilt.

Did those kernels contain the scale_rt_power() hunk from commit
aa483808516ca5cacfa0e5849691f64fec25828e? Venki thought that might cure
our woes, but since we never could reproduce...
* Re: [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Mike Galbraith @ 2011-07-20 3:32 UTC
  To: Peter Zijlstra; +Cc: Terry Loftin, linux-kernel, Ingo Molnar, Bob Montgomery

On Wed, 2011-07-20 at 04:29 +0200, Peter Zijlstra wrote:
> On Wed, 2011-07-20 at 04:26 +0200, Mike Galbraith wrote:
> > On Tue, 2011-07-19 at 23:17 +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-07-19 at 14:58 -0600, Terry Loftin wrote:
> > > > Correct the protection expression in update_cpu_power() to avoid setting
> > > > rq->cpu_power to zero.
> > >
> > > Firstly you fail to mention what kernel this is again, secondly this
> > > should never happen in the first place, so this fix is wrong. At best it
> > > papers over another bug.
> > >
> > > > Signed-off-by: Terry Loftin <terry.loftin@hp.com>
> > > > Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
> > > > ---
> > > > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > > > index 0c26e2d..9c50020 100644
> > > > --- a/kernel/sched_fair.c
> > > > +++ b/kernel/sched_fair.c
> > > > @@ -2549,7 +2549,7 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
> > > >  	power *= scale_rt_power(cpu);
> > > >  	power >>= SCHED_LOAD_SHIFT;
> > > >
> > > > -	if (!power)
> > > > +	if ((u32)power == 0)
> > > >  		power = 1;
> > > >
> > > >  	cpu_rq(cpu)->cpu_power = power;
> >
> > I put that (and a bunch more protection+warnings) in an enterprise
> > kernel so it would not explode, but would gather some data. The entire
> > world has been utterly silent, except for a gaggle of POWER7 boxen,
> > which manage to convince scale_rt_power() to return negative values.
> >
> > Turning on PRINTK_TIME made these boxen go silent. A printk with
> > timestamps, which doesn't happen, hides the problem. Tilt.
>
> Did those kernels contain the scale_rt_power() hunk from commit
> aa483808516ca5cacfa0e5849691f64fec25828e? Venki thought that might cure
> our woes, but since we never could reproduce...

Yeah, that commit is present.

	-Mike
* Re: [PATCH 1/2] sched: Fix "divide error: 0000" in find_busiest_group
From: Simon Kirby @ 2011-09-01 17:16 UTC
  To: Mike Galbraith, Peter Zijlstra
  Cc: Terry Loftin, linux-kernel, Ingo Molnar, Bob Montgomery

On Wed, Jul 20, 2011 at 05:32:08AM +0200, Mike Galbraith wrote:
> On Wed, 2011-07-20 at 04:29 +0200, Peter Zijlstra wrote:
> > On Wed, 2011-07-20 at 04:26 +0200, Mike Galbraith wrote:
> > > On Tue, 2011-07-19 at 23:17 +0200, Peter Zijlstra wrote:
> > > > On Tue, 2011-07-19 at 14:58 -0600, Terry Loftin wrote:
> > > > > Correct the protection expression in update_cpu_power() to avoid setting
> > > > > rq->cpu_power to zero.
> > > >
> > > > Firstly you fail to mention what kernel this is again, secondly this
> > > > should never happen in the first place, so this fix is wrong. At best it
> > > > papers over another bug.
> > > >
> > > > > Signed-off-by: Terry Loftin <terry.loftin@hp.com>
> > > > > Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
> > > > > ---
> > > > > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > > > > index 0c26e2d..9c50020 100644
> > > > > --- a/kernel/sched_fair.c
> > > > > +++ b/kernel/sched_fair.c
> > > > > @@ -2549,7 +2549,7 @@ static void update_cpu_power(struct sched_domain *sd, int cpu)
> > > > >  	power *= scale_rt_power(cpu);
> > > > >  	power >>= SCHED_LOAD_SHIFT;
> > > > >
> > > > > -	if (!power)
> > > > > +	if ((u32)power == 0)
> > > > >  		power = 1;
> > > > >
> > > > >  	cpu_rq(cpu)->cpu_power = power;
> > >
> > > I put that (and a bunch more protection+warnings) in an enterprise
> > > kernel so it would not explode, but would gather some data. The entire
> > > world has been utterly silent, except for a gaggle of POWER7 boxen,
> > > which manage to convince scale_rt_power() to return negative values.
> > >
> > > Turning on PRINTK_TIME made these boxen go silent. A printk with
> > > timestamps, which doesn't happen, hides the problem. Tilt.
> >
> > Did those kernels contain the scale_rt_power() hunk from commit
> > aa483808516ca5cacfa0e5849691f64fec25828e? Venki thought that might cure
> > our woes, but since we never could reproduce...
>
> Yeah, that commit is present.

We just hit what seems to be this bug on a box running 2.6.36 since
around the time it was built (Nov 8, 2010). It's a 16 core box (dual
quad with HT) and runs all sorts of stuff all day long, and suddenly hit
this divide error. Commit aa483808516ca5cacfa0e5849691f64fec25828e is
present.

find_busiest_group() seems to have been inlined into load_balance():

   0xffffffff8104d55f <+1151>:  jne    0xffffffff8104d568 <load_balance+1160>
   0xffffffff8104d561 <+1153>:  mov    $0x1,%cl
   0xffffffff8104d563 <+1155>:  mov    $0x1,%esi
   0xffffffff8104d568 <+1160>:  movslq -0x16c(%rbp),%rdx
   0xffffffff8104d56f <+1167>:  mov    $0x14bc0,%rax
   0xffffffff8104d576 <+1174>:  mov    -0x7e601720(,%rdx,8),%rdx
   0xffffffff8104d57e <+1182>:  mov    %rcx,0x7e0(%rax,%rdx,1)
   0xffffffff8104d586 <+1190>:  mov    %esi,0x8(%r8)
   0xffffffff8104d58a <+1194>:  nopw   0x0(%rax,%rax,1)
   0xffffffff8104d590 <+1200>:  mov    -0x138(%rbp),%rcx
   0xffffffff8104d597 <+1207>:  mov    -0x68(%rbp),%rsi
   0xffffffff8104d59b <+1211>:  xor    %edx,%edx
   0xffffffff8104d59d <+1213>:  mov    0x8(%rcx),%edi
   0xffffffff8104d5a0 <+1216>:  mov    %rsi,%rax
   0xffffffff8104d5a3 <+1219>:  mov    -0x60(%rbp),%rcx
   0xffffffff8104d5a7 <+1223>:  shl    $0xa,%rax
   0xffffffff8104d5ab <+1227>:  div    %rdi <--------------
   0xffffffff8104d5ae <+1230>:  mov    %rax,-0x70(%rbp)
   0xffffffff8104d5b2 <+1234>:  xor    %eax,%eax
   0xffffffff8104d5b4 <+1236>:  test   %rcx,%rcx
   0xffffffff8104d5b7 <+1239>:  je     0xffffffff8104d5c5 <load_balance+1253>

rax, rdx, and rdi were 0 here. Fuzzy picture available on request.

Other than some indirection, I don't see any changes in this area that
would fix this bug since 2.6.36, either. Perhaps the !power test in
update_cpu_power() should be copied to update_group_power()? This still
seems like papering over another issue, though...

Perhaps this might discover something:

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index bc8ee99..b31cd3d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2682,6 +2682,7 @@ static void update_group_power(struct sched_domain *sd, int cpu)
 	} while (group != child->groups);
 
 	sdg->sgp->power = power;
+	BUG_ON(!power);
 }
 
 /*

Simon-
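Editorial note: in the disassembly above, the divisor is loaded as a 32-bit value (`mov 0x8(%rcx),%edi`), the dividend is shifted left by 10 (`shl $0xa,%rax`), and `div %rdi` faults because rdi is zero. The sketch below mirrors that sequence in user-space C with an explicit zero check. The name scaled_avg_load() and its return convention are hypothetical; the posted patch takes a different approach and traps with BUG_ON() where the zero is stored, rather than checking at the point of division.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a load by 1024 and divide by a 32-bit
 * power value. Returns false, leaving *avg untouched, when power is
 * zero, since the division would otherwise raise a divide error.
 */
static inline bool scaled_avg_load(uint64_t load, uint32_t power, uint64_t *avg)
{
	if (power == 0)
		return false;
	*avg = (load << 10) / power;
	return true;
}
```

A caller would have to decide what a false return means for load balancing; as the thread points out, any such guard only hides whatever produced the zero in the first place.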