From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1030208AbWGaQPM (ORCPT ); Mon, 31 Jul 2006 12:15:12 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1030211AbWGaQPM (ORCPT ); Mon, 31 Jul 2006 12:15:12 -0400 Received: from mga06.intel.com ([134.134.136.21]:40964 "EHLO orsmga101.jf.intel.com") by vger.kernel.org with ESMTP id S1030208AbWGaQPJ (ORCPT ); Mon, 31 Jul 2006 12:15:09 -0400 X-IronPort-AV: i="4.07,199,1151910000"; d="scan'208"; a="99238384:sNHT171688083" Date: Mon, 31 Jul 2006 09:04:40 -0700 From: "Siddha, Suresh B" To: Ingo Molnar Cc: Paul Jackson , Nick Piggin , Srivatsa Vaddagiri , Suresh Siddha , Simon.Derr@bull.net, Jack Steiner , linux-kernel@vger.kernel.org, Andrew Morton Subject: Re: [BUG] sched: big numa dynamic sched domain memory corruption Message-ID: <20060731090440.A2311@unix-os.sc.intel.com> References: <20060731070734.19126.40501.sendpatchset@v0> <20060731071242.GA31377@elte.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i In-Reply-To: <20060731071242.GA31377@elte.hu>; from mingo@elte.hu on Mon, Jul 31, 2006 at 09:12:42AM +0200 Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jul 31, 2006 at 09:12:42AM +0200, Ingo Molnar wrote: > > * Paul Jackson wrote: > > > @@ -5675,12 +5675,13 @@ void build_sched_domains(const cpumask_t > > int group; > > struct sched_domain *sd = NULL, *p; > > cpumask_t nodemask = node_to_cpumask(cpu_to_node(i)); > > + int cpus_per_node = cpus_weight(nodemask); > > > > cpus_and(nodemask, nodemask, *cpu_map); > > > > #ifdef CONFIG_NUMA > > - if (cpus_weight(*cpu_map) > > - > SD_NODES_PER_DOMAIN*cpus_weight(nodemask)) { > > + if (cpus_weight(cpu_online_map) > > + > SD_NODES_PER_DOMAIN*cpus_per_node) { > > if (!sched_group_allnodes) { > > sched_group_allnodes > > = kmalloc(sizeof(struct sched_group) > > even if the bug is not fully understood in time, i think we should queue > the patch above for v2.6.18. (with the small nit that you should put the I believe that this problem doesn't happen with the current mainline code. Paul can you please test the mainline code and confirm? After going through SLES10 code and current mainline code, my understanding is that SLES10 has this bug but not mainline. thanks, suresh