From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755577AbZHUL7N (ORCPT ); Fri, 21 Aug 2009 07:59:13 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1755507AbZHUL7M (ORCPT ); Fri, 21 Aug 2009 07:59:12 -0400
Received: from viefep16-int.chello.at ([62.179.121.36]:3210
	"EHLO viefep16-int.chello.at" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1755301AbZHUL7L (ORCPT );
	Fri, 21 Aug 2009 07:59:11 -0400
X-SourceIP: 213.93.53.227
Subject: Re: Latest Linus tree oopses on Nehalem box
From: Peter Zijlstra
To: Ingo Molnar
Cc: Jes Sorensen, Jens Axboe, Thomas Gleixner, "H. Peter Anvin",
	Yinghai Lu, linux-kernel, Ingo Molnar, Linus Torvalds
In-Reply-To: <20090821114645.GD24647@elte.hu>
References: <4A8E7CBE.3020209@sgi.com> <20090821114645.GD24647@elte.hu>
Content-Type: text/plain
Date: Fri, 21 Aug 2009 13:58:54 +0200
Message-Id: <1250855934.7538.30.camel@twins>
Mime-Version: 1.0
X-Mailer: Evolution 2.26.1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2009-08-21 at 13:46 +0200, Ingo Molnar wrote:
> * Jes Sorensen wrote:
>
> > Hi,
> >
> > I am seeing this one with the latest Linus' git tree as of this
> > morning on a Nehalem box. Using the defconfig + megaraid driver.
> >
> > Not sure if this is already fixed, or if someone already knows
> > whats wrong? Smells like a yet another BIOS bug - yes the BIOS on
> > this thing is rubbish.
>
> my Nehalem (16 logical cpus) boots fine:
>
>  aldebaran:~> uname -a
>  Linux aldebaran 2.6.31-rc6-tip-01272-g9919e28-dirty #1518 SMP Fri
>  Aug 21 11:13:12 CEST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> > [ 6.664800] RIP: 0010:[] []
> > find_busiest_group+0x620/0x6fd
>
> Nothing similar is open at the moment.
>
> There's only one open .31 scheduler regression bug at the moment: a
> rare division by zero bug that sometimes crashes boxes - the bigger
> the box the likelier the crash.

That's actually a -tip only regression caused by
a5004278f0525dcb9aa43703ef77bf371ea837cd.

I thought I had found the race that caused the /0 (the below patch), but
testing has proven me wrong.

Still looking at that.

---
Subject: sched: Avoid division by zero
From: Peter Zijlstra
Date: Fri Aug 07 21:53:17 CEST 2009

Patch a5004278f0525dcb9aa43703ef77bf371ea837cd (sched: Fix cgroup smp
fairness) introduced the possibility of a divide-by-zero because
load-balancing is not synchronized between sched_domains.

This can cause the state of cpus to change between the first and second
loop over the sched domain in tg_shares_up().

Signed-off-by: Peter Zijlstra
---
 kernel/sched.c |   23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -1522,7 +1522,8 @@ static void __set_se_shares(struct sched
  */
 static void
 update_group_shares_cpu(struct task_group *tg, int cpu,
-			unsigned long sd_shares, unsigned long sd_rq_weight)
+			unsigned long sd_shares, unsigned long sd_rq_weight,
+			unsigned long sd_eff_weight)
 {
 	unsigned long rq_weight;
 	unsigned long shares;
@@ -1535,13 +1536,15 @@ update_group_shares_cpu(struct task_grou
 	if (!rq_weight) {
 		boost = 1;
 		rq_weight = NICE_0_LOAD;
+		if (sd_rq_weight == sd_eff_weight)
+			sd_eff_weight += NICE_0_LOAD;
+		sd_rq_weight = sd_eff_weight;
 	}
 
 	/*
-	 *           \Sum shares * rq_weight
-	 * shares =  -----------------------
-	 *               \Sum rq_weight
-	 *
+	 *             \Sum_j shares_j * rq_weight_i
+	 * shares_i =  -----------------------------
+	 *                  \Sum_j rq_weight_j
 	 */
 	shares = (sd_shares * rq_weight) / sd_rq_weight;
 	shares = clamp_t(unsigned long, shares, MIN_SHARES, MAX_SHARES);
@@ -1593,14 +1596,8 @@ static int tg_shares_up(struct task_grou
 	if (!sd->parent || !(sd->parent->flags & SD_LOAD_BALANCE))
 		shares = tg->shares;
 
-	for_each_cpu(i, sched_domain_span(sd)) {
-		unsigned long sd_rq_weight = rq_weight;
-
-		if (!tg->cfs_rq[i]->rq_weight)
-			sd_rq_weight = eff_weight;
-
-		update_group_shares_cpu(tg, i, shares, sd_rq_weight);
-	}
+	for_each_cpu(i, sched_domain_span(sd))
+		update_group_shares_cpu(tg, i, shares, rq_weight, eff_weight);
 
 	return 0;
 }
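
For anyone who wants to poke at the arithmetic outside the kernel, here is a
rough userspace model of the patched share computation. It is only a sketch:
model_group_shares_cpu() and clamp_ul() are made-up names, the function takes
this cpu's rq_weight directly instead of reading it from tg->cfs_rq[cpu], the
boost handling is left out, and the values used for NICE_0_LOAD, MIN_SHARES
and MAX_SHARES are assumptions that should be checked against the tree.

```c
#include <assert.h>

/* Assumed values, not taken from this mail; check kernel/sched.c. */
#define NICE_0_LOAD	1024UL
#define MIN_SHARES	2UL
#define MAX_SHARES	(1UL << 18)

/* Hypothetical stand-in for clamp_t(unsigned long, ...). */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Model of the patched computation for a single cpu.
 *
 * rq_weight:     this cpu's weight as recorded by the first loop
 * sd_shares:     shares to distribute over the sched domain
 * sd_rq_weight:  sum of the recorded weights over the domain
 * sd_eff_weight: same sum, with idle cpus counted as NICE_0_LOAD
 */
unsigned long
model_group_shares_cpu(unsigned long rq_weight, unsigned long sd_shares,
		       unsigned long sd_rq_weight, unsigned long sd_eff_weight)
{
	if (!rq_weight) {
		/* Idle cpu: pretend it carries one average task. */
		rq_weight = NICE_0_LOAD;
		if (sd_rq_weight == sd_eff_weight)
			sd_eff_weight += NICE_0_LOAD;
		sd_rq_weight = sd_eff_weight;
	}

	/*
	 * The model refuses to divide by zero. The kernel code has no
	 * such check; this assert exists only in the sketch.
	 */
	assert(sd_rq_weight != 0);

	return clamp_ul((sd_shares * rq_weight) / sd_rq_weight,
			MIN_SHARES, MAX_SHARES);
}
```

With the assumed constants, an idle cpu in a domain whose two sums are both
zero gets model_group_shares_cpu(0, 1024, 0, 0) == 1024, and an idle cpu next
to two busy ones gets model_group_shares_cpu(0, 1024, 2048, 3072) == 341.
Note that the non-idle path still divides by whatever sd_rq_weight was passed
in, which is why the model asserts instead of assuming it is non-zero.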