[PATCH 0/2] sched: Fix "divide error: 0000" in find_busiest_group


From: Terry Loftin <terry.loftin@hp.com>
To: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Bob Montgomery <bob.montgomery@hp.com>
Subject: [PATCH 0/2] sched: Fix "divide error: 0000" in find_busiest_group
Date: Tue, 19 Jul 2011 14:58:42 -0600
Message-ID: <4E25F002.2080503@hp.com>

Howdy,

The divide error occurs in the inlined function update_sg_lb_stats() in
kernel/sched.c, where the group load is adjusted by the relative CPU
power of the group, dividing group_load by group->cpu_power:

    /* Adjust by relative CPU power of the group */
    sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

In this case, group->cpu_power is zero.  The value was set in
update_cpu_power(), which depends on scale_rt_power() among other things.
scale_rt_power() is based in part on the rq->clock and rq->age_stamp
values for the runqueue:

    total = sched_avg_period() + (rq->clock - rq->age_stamp);

The clock and age_stamp values are in nanoseconds and come from
__cycles_2_ns(), which converts the CPU TSC value to nanoseconds.
On 64-bit systems, the value returned by __cycles_2_ns() wraps once
the nanosecond count reaches 54 bits or more (about 208.5 days).
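
As a quick check of the 208.5-day figure, the arithmetic can be sketched
in C.  The helper names below are made up for illustration and are not
kernel functions; the only input taken from the analysis is the 54-bit
limit.

```c
#include <stdint.h>

/* Hypothetical helper: the nanosecond count at which a value limited
 * to `bits` bits wraps.  Only valid for bits < 64. */
static uint64_t wrap_point_ns(unsigned int bits)
{
    return (uint64_t)1 << bits;
}

/* Hypothetical helper: convert a nanosecond count to days. */
static double ns_to_days(uint64_t ns)
{
    const double ns_per_day = 86400.0 * 1e9;   /* seconds/day * ns/second */

    return (double)ns / ns_per_day;
}
```

ns_to_days(wrap_point_ns(54)) evaluates to just under 208.5.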

The rq->age_stamp is designed to follow the clock value but does not
account for the fact that the clock value may wrap, and it is never reset.
After rq->clock wraps, the expression (rq->clock - rq->age_stamp) leads
to large negative values which in turn lead to very large values for
scale_rt_power().
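
The effect of the wrap on (rq->clock - rq->age_stamp) can be shown with
a small stand-alone sketch.  It assumes both fields are unsigned 64-bit
values; elapsed_ns() is a hypothetical name, not a kernel function, and
the sample numbers in the note that follows are invented.

```c
#include <stdint.h>

/* Hypothetical stand-in for the subtraction in scale_rt_power().
 * Unsigned subtraction is performed modulo 2^64, so a clock that has
 * wrapped to a value below age_stamp yields a huge result. */
static uint64_t elapsed_ns(uint64_t clock, uint64_t age_stamp)
{
    return clock - age_stamp;
}
```

For example, with age_stamp = 2^54 - 1000 (taken just before the wrap)
and clock = 500 (read just after it), elapsed_ns() returns
2^64 - 2^54 + 1500.  Read as a signed 64-bit number that is
-(2^54 - 1500), the "large negative value" described above.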

In update_cpu_power(), an unsigned long local variable, 'power', is
used to hold the intermediate result, including the return value from
scale_rt_power(), before it is placed in an unsigned int rq->cpu_power.
If the power calculated in update_cpu_power() needs more than 32 bits
but its low-order 32 bits are all zero, then the value will be truncated
and rq->cpu_power will be set to zero, leading to the divide by zero error.
There is a protective check immediately before the assignment, but it
compares the full 64-bit value instead of the 32-bit portion that will
be stored in rq->cpu_power.
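
The check-then-truncate pattern can be reproduced outside the kernel.
The sketch below is not the kernel code and not the proposed patch; the
function names are hypothetical, and uint64_t/uint32_t stand in for
unsigned long and unsigned int on a 64-bit system.

```c
#include <stdint.h>

/* Mirrors the pattern described above: the zero check looks at all
 * 64 bits, but only the low 32 bits survive the store. */
static uint32_t store_power_full_check(uint64_t power)
{
    if (!power)
        power = 1;
    return (uint32_t)power;      /* truncation happens here */
}

/* One possible alternative: check the value that is actually stored. */
static uint32_t store_power_truncated_check(uint64_t power)
{
    uint32_t stored = (uint32_t)power;

    if (!stored)
        stored = 1;
    return stored;
}
```

With power = 1 << 32 (as a 64-bit value), store_power_full_check()
returns 0, because the 64-bit value is non-zero but its low 32 bits are
not; store_power_truncated_check() returns 1 for the same input.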

I have analyzed two crash dumps from systems that were up 220 and 230
days to confirm this.

-T

Signed-off-by: Terry Loftin <terry.loftin@hp.com>
Signed-off-by: Bob Montgomery <bob.montgomery@hp.com>
