From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Ingo Molnar <mingo@elte.hu>
Cc: Jes Sorensen <jes@sgi.com>, Jens Axboe <jens.axboe@oracle.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>, Yinghai Lu <yinghai@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: Latest Linus tree oopses on Nehalem box
Date: Fri, 21 Aug 2009 13:58:54 +0200
Message-ID: <1250855934.7538.30.camel@twins>
In-Reply-To: <20090821114645.GD24647@elte.hu>

On Fri, 2009-08-21 at 13:46 +0200, Ingo Molnar wrote:
> * Jes Sorensen <jes@sgi.com> wrote:
> 
> > Hi,
> >
> > I am seeing this one with the latest Linus' git tree as of this 
> > morning on a Nehalem box. Using the defconfig + megaraid driver.
> >
> > Not sure if this is already fixed, or if someone already knows 
> > whats wrong? Smells like a yet another BIOS bug - yes the BIOS on 
> > this thing is rubbish.
> 
> my Nehalem (16 logical cpus) boots fine:
> 
>  aldebaran:~> uname -a
>  Linux aldebaran 2.6.31-rc6-tip-01272-g9919e28-dirty #1518 SMP Fri 
>  Aug 21 11:13:12 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
> 
> > [    6.664800] RIP: 0010:[<ffffffff810391e7>]  [<ffffffff810391e7>]  
> > find_busiest_group+0x620/0x6fd 
> 
> Nothing similar is open at the moment.
> 
> There's only one open .31 scheduler regression bug at the moment: a 
> rare division by zero bug that sometimes crashes boxes - the bigger 
> the box the likelier the crash.

That's actually a -tip only regression caused by
a5004278f0525dcb9aa43703ef77bf371ea837cd.

I thought I had found the race that caused the /0 (the patch below),
but testing has proven me wrong. Still looking at that.

---
Subject: sched: Avoid division by zero
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Aug 07 21:53:17 CEST 2009

Patch a5004278f0525dcb9aa43703ef77bf371ea837cd (sched: Fix cgroup smp
fairness) introduced the possibility of a divide-by-zero because
load-balancing is not synchronized between sched_domains.

This can cause the state of cpus to change between the first and
second loop over the sched domain in tg_shares_up().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 kernel/sched.c |   23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1522,7 +1522,8 @@ static void __set_se_shares(struct sched
  */
 static void
 update_group_shares_cpu(struct task_group *tg, int cpu,
-			unsigned long sd_shares, unsigned long sd_rq_weight)
+			unsigned long sd_shares, unsigned long sd_rq_weight,
+			unsigned long sd_eff_weight)
 {
 	unsigned long rq_weight;
 	unsigned long shares;
@@ -1535,13 +1536,15 @@ update_group_shares_cpu(struct task_grou
 	if (!rq_weight) {
 		boost = 1;
 		rq_weight = NICE_0_LOAD;
+		if (sd_rq_weight == sd_eff_weight)
+			sd_eff_weight += NICE_0_LOAD;
+		sd_rq_weight = sd_eff_weight;
 	}
 
 	/*
-	 *           \Sum shares * rq_weight
-	 * shares =  -----------------------
-	 *               \Sum rq_weight
-	 *
+	 *             \Sum_j shares_j * rq_weight_i
+	 * shares_i =  -----------------------------
+	 *                  \Sum_j rq_weight_j
 	 */
 	shares = (sd_shares * rq_weight) / sd_rq_weight;
 	shares = clamp_t(unsigned long, shares, MIN_SHARES, MAX_SHARES);
@@ -1593,14 +1596,8 @@ static int tg_shares_up(struct task_grou
 	if (!sd->parent || !(sd->parent->flags & SD_LOAD_BALANCE))
 		shares = tg->shares;
 
-	for_each_cpu(i, sched_domain_span(sd)) {
-		unsigned long sd_rq_weight = rq_weight;
-
-		if (!tg->cfs_rq[i]->rq_weight)
-			sd_rq_weight = eff_weight;
-
-		update_group_shares_cpu(tg, i, shares, sd_rq_weight);
-	}
+	for_each_cpu(i, sched_domain_span(sd))
+		update_group_shares_cpu(tg, i, shares, rq_weight, eff_weight);
 
 	return 0;
 }
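
For readers following the arithmetic, here is a minimal user-space sketch
of the per-cpu share computation as the patch above leaves it. It is a
model, not the kernel code: model_group_shares() and clamp_ul() are
hypothetical names, the boost flag and all locking are omitted, and the
values of NICE_0_LOAD, MIN_SHARES and MAX_SHARES are assumed to match
kernel/sched.c of that era. The mail above says testing showed the patch
does not close the race, so the sketch only illustrates the idle-cpu
path. In this model, a caller that passes a non-zero rq_weight together
with a zero sd_rq_weight would still divide by zero, which the assert
makes explicit.

```c
#include <assert.h>

/* Assumed values, chosen to mirror kernel/sched.c of that era. */
#define NICE_0_LOAD 1024UL
#define MIN_SHARES  2UL
#define MAX_SHARES  (1UL << 18)

/* Hypothetical stand-in for the kernel's clamp_t(unsigned long, ...). */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Model of the patched share computation for one cpu:
 *
 *             \Sum_j shares_j * rq_weight_i
 * shares_i =  -----------------------------
 *                  \Sum_j rq_weight_j
 *
 * rq_weight is this cpu's weight as seen by the second loop;
 * sd_rq_weight and sd_eff_weight are the sums from the first loop.
 */
static unsigned long model_group_shares(unsigned long rq_weight,
					unsigned long sd_shares,
					unsigned long sd_rq_weight,
					unsigned long sd_eff_weight)
{
	if (!rq_weight) {
		/* An idle cpu is treated as if it carried one nice-0 task. */
		rq_weight = NICE_0_LOAD;
		/*
		 * Equal sums mean the first loop saw no idle cpu, so
		 * account for this one now.
		 */
		if (sd_rq_weight == sd_eff_weight)
			sd_eff_weight += NICE_0_LOAD;
		sd_rq_weight = sd_eff_weight;
	}

	/* Not guaranteed on the non-idle path; see the note above. */
	assert(sd_rq_weight != 0);

	return clamp_ul(sd_shares * rq_weight / sd_rq_weight,
			MIN_SHARES, MAX_SHARES);
}
```

With sd_shares = 1024 and two busy cpus of weight 1024 each
(sd_rq_weight = sd_eff_weight = 2048), a cpu that reads as idle in the
second loop gets 1024 * 1024 / 3072 = 341 shares, and a busy cpu gets
1024 * 1024 / 2048 = 512.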




Thread overview: 14+ messages
2009-08-21 10:53 Latest Linus tree oopses on Nehalem box Jes Sorensen
2009-08-21 11:46 ` Ingo Molnar
2009-08-21 11:58   ` Peter Zijlstra [this message]
2009-08-21 14:42     ` [tip:sched/core] sched: Avoid division by zero tip-bot for Peter Zijlstra
2009-08-25 19:11       ` Peter Zijlstra
2009-08-26  9:16         ` Yinghai Lu
2009-08-26  9:25           ` Peter Zijlstra
2009-08-27 11:08           ` [PATCH] sched: Avoid division by zero - really Peter Zijlstra
2009-08-27 12:19             ` Eric Dumazet
2009-08-27 12:32               ` Peter Zijlstra
2009-08-28  6:30             ` [tip:sched/core] sched: Fix " tip-bot for Peter Zijlstra
2009-08-21 13:04   ` Latest Linus tree oopses on Nehalem box Jes Sorensen
2009-08-21 13:26     ` Ingo Molnar
2009-08-21 13:35       ` Jes Sorensen
