From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Ingo Molnar <mingo@elte.hu>
Cc: Jes Sorensen <jes@sgi.com>, Jens Axboe <jens.axboe@oracle.com>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>, Yinghai Lu <yinghai@kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@redhat.com>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Latest Linus tree oopses on Nehalem box
Date: Fri, 21 Aug 2009 13:58:54 +0200
Message-ID: <1250855934.7538.30.camel@twins>
In-Reply-To: <20090821114645.GD24647@elte.hu>
On Fri, 2009-08-21 at 13:46 +0200, Ingo Molnar wrote:
> * Jes Sorensen <jes@sgi.com> wrote:
>
> > Hi,
> >
> > I am seeing this one with the latest Linus' git tree as of this
> > morning on a Nehalem box. Using the defconfig + megaraid driver.
> >
> > Not sure if this is already fixed, or if someone already knows
> > what's wrong? Smells like yet another BIOS bug - yes the BIOS on
> > this thing is rubbish.
>
> my Nehalem (16 logical cpus) boots fine:
>
> aldebaran:~> uname -a
> Linux aldebaran 2.6.31-rc6-tip-01272-g9919e28-dirty #1518 SMP Fri
> Aug 21 11:13:12 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> > [ 6.664800] RIP: 0010:[<ffffffff810391e7>] [<ffffffff810391e7>]
> > find_busiest_group+0x620/0x6fd
>
> Nothing similar is open at the moment.
>
> There's only one open .31 scheduler regression bug at the moment: a
> rare division by zero bug that sometimes crashes boxes - the bigger
> the box the likelier the crash.
That's actually a -tip only regression caused by
a5004278f0525dcb9aa43703ef77bf371ea837cd.

I thought I had found the race that caused the /0 (the patch below),
but testing has proven me wrong. Still looking at that.
---
Subject: sched: Avoid division by zero
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Aug 07 21:53:17 CEST 2009
Patch a5004278f0525dcb9aa43703ef77bf371ea837cd (sched: Fix cgroup smp
fairness) introduced the possibility of a divide-by-zero because
load-balancing is not synchronized between sched_domains.

This can cause the state of cpus to change between the first and
second loop over the sched domain in tg_shares_up().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1522,7 +1522,8 @@ static void __set_se_shares(struct sched
  */
 static void
 update_group_shares_cpu(struct task_group *tg, int cpu,
-			unsigned long sd_shares, unsigned long sd_rq_weight)
+			unsigned long sd_shares, unsigned long sd_rq_weight,
+			unsigned long sd_eff_weight)
 {
 	unsigned long rq_weight;
 	unsigned long shares;
@@ -1535,13 +1536,15 @@ update_group_shares_cpu(struct task_grou
 	if (!rq_weight) {
 		boost = 1;
 		rq_weight = NICE_0_LOAD;
+		if (sd_rq_weight == sd_eff_weight)
+			sd_eff_weight += NICE_0_LOAD;
+		sd_rq_weight = sd_eff_weight;
 	}
 
 	/*
-	 *           \Sum shares * rq_weight
-	 * shares =  -----------------------
-	 *               \Sum rq_weight
-	 *
+	 *             \Sum_j shares_j * rq_weight_i
+	 * shares_i =  -----------------------------
+	 *                  \Sum_j rq_weight_j
 	 */
 	shares = (sd_shares * rq_weight) / sd_rq_weight;
 	shares = clamp_t(unsigned long, shares, MIN_SHARES, MAX_SHARES);
@@ -1593,14 +1596,8 @@ static int tg_shares_up(struct task_grou
 	if (!sd->parent || !(sd->parent->flags & SD_LOAD_BALANCE))
 		shares = tg->shares;
 
-	for_each_cpu(i, sched_domain_span(sd)) {
-		unsigned long sd_rq_weight = rq_weight;
-
-		if (!tg->cfs_rq[i]->rq_weight)
-			sd_rq_weight = eff_weight;
-
-		update_group_shares_cpu(tg, i, shares, sd_rq_weight);
-	}
+	for_each_cpu(i, sched_domain_span(sd))
+		update_group_shares_cpu(tg, i, shares, rq_weight, eff_weight);
 
 	return 0;
 }
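
For readers following the arithmetic, here is a minimal user-space sketch
of the share computation in update_group_shares_cpu() as the patch above
leaves it. The names group_shares_sketch() and clamp_ul() are hypothetical,
and the constants are assumptions taken to match 2.6.31 (NICE_0_LOAD = 1024,
MIN_SHARES = 2, MAX_SHARES = 1UL << 18). It models only the division and the
idle-cpu fallback, not the boost flag, the locking, or the race itself.

```c
/* Assumed values; the real definitions live in the kernel sources. */
#define NICE_0_LOAD 1024UL
#define MIN_SHARES  2UL
#define MAX_SHARES  (1UL << 18)

/* Hypothetical stand-in for the kernel's clamp_t(unsigned long, ...). */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * shares_i = sd_shares * rq_weight_i / \Sum_j rq_weight_j
 *
 * An idle cpu (rq_weight == 0) is treated as carrying NICE_0_LOAD and
 * is divided by the effective domain weight instead of the raw sum.
 * Like the patch, this does not guard the busy-cpu path: the caller
 * must pass a nonzero sd_rq_weight whenever rq_weight is nonzero.
 */
unsigned long group_shares_sketch(unsigned long sd_shares,
				  unsigned long rq_weight,
				  unsigned long sd_rq_weight,
				  unsigned long sd_eff_weight)
{
	if (!rq_weight) {
		rq_weight = NICE_0_LOAD;
		/* cpu went idle after the domain sums were taken */
		if (sd_rq_weight == sd_eff_weight)
			sd_eff_weight += NICE_0_LOAD;
		sd_rq_weight = sd_eff_weight;
	}

	return clamp_ul(sd_shares * rq_weight / sd_rq_weight,
			MIN_SHARES, MAX_SHARES);
}
```

With sd_shares = 1024 and a domain of two busy cpus of weight 1024 plus one
idle cpu (sd_rq_weight = 2048, sd_eff_weight = 3072), a busy cpu gets
1024 * 1024 / 2048 = 512 and the idle cpu gets 1024 * 1024 / 3072 = 341.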