* [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7
From: Andrew Dickinson @ 2010-10-20 7:20 UTC
To: linux-kernel
This is a patch to fix the corner case where we're crashing with
divide_error in find_busiest_group (see
https://bugzilla.kernel.org/show_bug.cgi?id=16991).
I don't fully understand what the case is that causes sds.total_pwr to
be zero in find_busiest_group, but this patch guards against the
divide-by-zero bug.
I also added safe-guarding around other routines in the scheduler code
where we're dividing by power; that's more of a just-in-case and I'm
definitely open for debate on that.
diff -ruwp a/kernel/sched_fair.c b/kernel/sched_fair.c
--- a/kernel/sched_fair.c 2010-10-19 23:47:51.000000000 -0700
+++ b/kernel/sched_fair.c 2010-10-20 00:08:17.000000000 -0700
@@ -1344,7 +1344,9 @@ find_idlest_group(struct sched_domain *s
}
/* Adjust by relative CPU power of the group */
- avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+ avg_load = (avg_load * SCHED_LOAD_SCALE);
+ if (group->cpu_power)
+ avg_load /= group->cpu_power;
if (local_group) {
this_load = avg_load;
@@ -2409,7 +2411,9 @@ static inline void update_sg_lb_stats(st
update_group_power(sd, this_cpu);
/* Adjust by relative CPU power of the group */
- sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
+ sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE);
+ if (group->cpu_power)
+ sgs->avg_load /= group->cpu_power;
/*
* Consider the group unbalanced when the imbalance is larger
@@ -2692,7 +2696,7 @@ find_busiest_group(struct sched_domain *
if (!(*balance))
goto ret;
- if (!sds.busiest || sds.busiest_nr_running == 0)
+ if (!sds.busiest || sds.busiest_nr_running == 0 || sds.total_pwr == 0)
goto out_balanced;
if (sds.this_load >= sds.max_load)
@@ -2757,7 +2761,9 @@ find_busiest_queue(struct sched_group *g
* the load can be moved away from the cpu that is potentially
* running at a lower capacity.
*/
- wl = (wl * SCHED_LOAD_SCALE) / power;
+ wl = (wl * SCHED_LOAD_SCALE);
+ if (power)
+ wl /= power;
if (wl > max_load) {
max_load = wl;
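For readers who want to see what the guarded form in the hunks above evaluates to, here is a minimal userspace sketch. It is not kernel code: the helper name scale_load_guarded() is hypothetical, and the value 1024 for SCHED_LOAD_SCALE is an assumption that should be checked against the tree in question.

```c
#include <stdio.h>

/* Assumption: mirrors the kernel's SCHED_LOAD_SCALE; verify against your tree. */
#define SCHED_LOAD_SCALE 1024UL

/*
 * Hypothetical helper, not present in sched_fair.c: scales a load by
 * relative CPU power the way the patched hunks do. When cpu_power is
 * zero the division is skipped, so the result stays at
 * load * SCHED_LOAD_SCALE instead of raising divide_error.
 */
static unsigned long scale_load_guarded(unsigned long load,
					unsigned long cpu_power)
{
	unsigned long scaled = load * SCHED_LOAD_SCALE;

	if (cpu_power)
		scaled /= cpu_power;

	return scaled;
}

/* Small demo routine for manual inspection of the three cases. */
static void scale_load_demo(void)
{
	printf("power=1024: %lu\n", scale_load_guarded(2048, 1024));
	printf("power=2048: %lu\n", scale_load_guarded(2048, 2048));
	printf("power=0:    %lu\n", scale_load_guarded(2048, 0));
}
```

Note that in the zero-power case the group's load is reported as load * SCHED_LOAD_SCALE, so under these assumptions it looks far busier than any group with a sane power value. The guard avoids the trap but does not say why the power was zero.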
* Re: [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7
From: Peter Zijlstra @ 2010-10-20 11:17 UTC
To: Andrew Dickinson; +Cc: linux-kernel, Ingo Molnar
On Wed, 2010-10-20 at 00:20 -0700, Andrew Dickinson wrote:
> This is a patch to fix the corner case where we're crashing with
> divide_error in find_busiest_group (see
> https://bugzilla.kernel.org/show_bug.cgi?id=16991).
> I don't fully understand what the case is that causes sds.total_pwr to
> be zero in find_busiest_group, but this patch guards against the
> divide-by-zero bug.
>
> I also added safe-guarding around other routines in the scheduler code
> where we're dividing by power; that's more of a just-in-case and I'm
> definitely open for debate on that.
No.. papering over crap like this is not done. In that BZ there's a
number of suggestions of how/where to track down the actual root cause,
but apparently nobody is interested in doing that.
(I can't reproduce so I can't actually do anything about it).
* Re: [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7
From: Andrew Dickinson @ 2010-11-01 17:31 UTC
To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar
Peter,
I agree that getting to root-cause is important, but this is still an
unchecked exception. Is your concern about "papering over" due to the
fact that this patch doesn't emit an error message/increment a
counter/etc? I think that there's some middle ground here. One
wouldn't blindly assume that malloc() returned a non-NULL pointer,
right? Similarly, when dividing, one should check that the denominator
is not zero. :D
Regarding reproducing this bug: all of the evidence that I've seen
(both in the BZ reports and in my own experience) suggests that this
happens only after 6+ months of uptime on heavily loaded systems. In
my case, it happened across a fleet of 60+ hosts within a 1-2 week
time-frame; each host was passing an average of 500kpps continuously
during that time-frame. All of them previously had an uptime of
approximately 7 months.
Is there a middle ground here where we can handle the exception safely
and emit a message to help get more debugging information to try to
track this down? The BZ report does have a recommended patch to emit
some WARN_ON messages; I'd be happy to include that in this patch as
well. Would that help?
-A
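The middle ground asked about above (handle the zero divisor safely, but leave a trace for debugging) could look roughly like the following userspace sketch. It only imitates the warn-once idea with fprintf(); it is not the WARN_ON patch from the BZ report, and the names div_or_warn() and zero_divisor_hits are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter: how many times a zero divisor was seen. */
static unsigned long zero_divisor_hits;

/*
 * Hypothetical helper: divides num by den. On a zero divisor it counts
 * the event, prints a message the first time only (a userspace stand-in
 * for a warn-once facility), and returns num undivided, matching the
 * fallback used in the patch.
 */
static unsigned long div_or_warn(unsigned long num, unsigned long den,
				 const char *site)
{
	static bool warned;

	if (den == 0) {
		zero_divisor_hits++;
		if (!warned) {
			warned = true;
			fprintf(stderr, "zero divisor at %s (num=%lu)\n",
				site, num);
		}
		return num;
	}

	return num / den;
}
```

A call site would pass its own name as the site argument, for example div_or_warn(wl, power, "find_busiest_queue"), so that the first report identifies which path saw the zero value.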