[PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7

public inbox for linux-kernel@vger.kernel.org

* [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7
From: Andrew Dickinson @ 2010-10-20  7:20 UTC (permalink / raw)
  To: linux-kernel

This is a patch to fix the corner case where we're crashing with
divide_error in find_busiest_group (see
https://bugzilla.kernel.org/show_bug.cgi?id=16991).
I don't fully understand what the case is that causes sds.total_pwr to
be zero in find_busiest_group, but this patch guards against the
divide-by-zero bug.

I also added safe-guarding around other routines in the scheduler code
where we're dividing by power; that's more of a just-in-case and I'm
definitely open for debate on that.

diff -ruwp a/kernel/sched_fair.c b/kernel/sched_fair.c
--- a/kernel/sched_fair.c	2010-10-19 23:47:51.000000000 -0700
+++ b/kernel/sched_fair.c	2010-10-20 00:08:17.000000000 -0700
@@ -1344,7 +1344,9 @@ find_idlest_group(struct sched_domain *s
 		}

 		/* Adjust by relative CPU power of the group */
-		avg_load = (avg_load * SCHED_LOAD_SCALE) / group->cpu_power;
+		avg_load = (avg_load * SCHED_LOAD_SCALE);
+		if (group->cpu_power)
+			avg_load /= group->cpu_power;

 		if (local_group) {
 			this_load = avg_load;
@@ -2409,7 +2411,9 @@ static inline void update_sg_lb_stats(st
 	update_group_power(sd, this_cpu);

 	/* Adjust by relative CPU power of the group */
-	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
+	sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE);
+	if (group->cpu_power)
+		sgs->avg_load /= group->cpu_power;

 	/*
 	 * Consider the group unbalanced when the imbalance is larger
@@ -2692,7 +2696,7 @@ find_busiest_group(struct sched_domain *
 	if (!(*balance))
 		goto ret;

-	if (!sds.busiest || sds.busiest_nr_running == 0)
+	if (!sds.busiest || sds.busiest_nr_running == 0 || sds.total_pwr == 0)
 		goto out_balanced;

 	if (sds.this_load >= sds.max_load)
@@ -2757,7 +2761,9 @@ find_busiest_queue(struct sched_group *g
 		 * the load can be moved away from the cpu that is potentially
 		 * running at a lower capacity.
 		 */
-		wl = (wl * SCHED_LOAD_SCALE) / power;
+		wl = (wl * SCHED_LOAD_SCALE);
+		if (power)
+			wl /= power;

 		if (wl > max_load) {
 			max_load = wl;


* Re: [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7
From: Peter Zijlstra @ 2010-10-20 11:17 UTC (permalink / raw)
  To: Andrew Dickinson; +Cc: linux-kernel, Ingo Molnar

On Wed, 2010-10-20 at 00:20 -0700, Andrew Dickinson wrote:
> This is a patch to fix the corner case where we're crashing with
> divide_error in find_busiest_group (see
> https://bugzilla.kernel.org/show_bug.cgi?id=16991).
> I don't fully understand what the case is that causes sds.total_pwr to
> be zero in find_busiest_group, but this patch guards against the
> divide-by-zero bug.
> 
> I also added safe-guarding around other routines in the scheduler code
> where we're dividing by power; that's more of a just-in-case and I'm
> definitely open for debate on that.

No.. papering over crap like this is not done. In that BZ there's a
number of suggestions of how/where to track down the actual root cause,
but apparently nobody is interested in doing that.

(I can't reproduce so I can't actually do anything about it).


* Re: [PATCH] sched_fair.c:find_busiest_group(), kernel 2.6.35.7
From: Andrew Dickinson @ 2010-11-01 17:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar

Peter,

I agree that getting to root-cause is important, but this is still an
unchecked exception.  Is your concern about "papering over" due to the
fact that this patch doesn't emit an error message/increment a
counter/etc?  I think that there's some middle ground here.  One
wouldn't blindly assume that malloc() returned non-zero, right?
Similarly, if dividing, one should check that the denominator is not
zero. :D

Regarding reproducing this bug: all of the evidence that I've seen
(both in the BZ reports and in my own experience) suggests that this
happens only after 6+ months of uptime on heavily loaded systems.  In
my case, it happened across a fleet of 60+ hosts within a 1-2 week
time frame; each host was passing an average of 500 kpps continuously
during that period.  All of them had previously been up for
approximately 7 months.

Is there a middle ground here where we can handle the exception safely
and emit a message to help gather more debugging information for
tracking this down?  The BZ report does have a recommended patch that
emits some WARN_ON messages; I'd be happy to include that in this
patch as well.  Would that help?

-A


