From: Myron Stowe <myron.stowe@hp.com>
To: Nikanth Karthikesan <knikanth@suse.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
Bjorn Helgaas <bjorn.helgaas@hp.com>, Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <peterz@infradead.org>,
Venkatesh Pallipadi <venki@google.com>,
Nikhil Rao <ncrao@google.com>,
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
linux-kernel@vger.kernel.org, rjenties@google.com
Subject: Re: divide error in select_task_rq_fair()
Date: Fri, 12 Nov 2010 07:06:35 -0700
Message-ID: <1289570795.2814.23.camel@zim>
In-Reply-To: <201011121152.30204.knikanth@suse.de>
On Fri, 2010-11-12 at 11:52 +0530, Nikanth Karthikesan wrote:
> On Thursday 11 November 2010 23:58:04 Myron Stowe wrote:
> > On Fri, 2010-11-05 at 07:17 +0100, Eric Dumazet wrote:
> > > Le jeudi 04 novembre 2010 à 20:00 -0600, Bjorn Helgaas a écrit :
> > > > Is that going to help you debug the problem? The solution is not going
> > > > to be something like "set NR_CPUS=x". If NR_CPUS is too small, the
> > > > machine should still *boot*, even if we can't use all the CPUs in the
> > > > box.
> > >
> > > Yes, it will help to understand the layout of cpu / domains and make
> > > appropriate changes.
> > >
> > > Alternative is you send me such a machine :=)
> >
> > I opened a BZ on this issue as it seems to be a regression -
> > https://bugzilla.kernel.org/show_bug.cgi?id=22662
> >
> > I also, as indicated in the BZ, bisected the kernel which gave the
> > following results and reverting 50f2d7f682f9c0ed58191d0982fe77888d59d162
> > did re-enable booting on the box in question (an HP dl980g7). Let me
> > know what further info you need or patches to test for debugging this.
> >
> > Thanks,
> >
> > commit 50f2d7f682f9c0ed58191d0982fe77888d59d162
> > Author: Nikanth Karthikesan <knikanth@suse.de>
> > Date: Thu Sep 30 17:34:10 2010 +0530
> >
> > x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA
> >
> > commit d9c2d5ac6af87b4491bff107113aaf16f6c2b2d9 "x86, numa: Use
> > near(er) online node instead of roundrobin for NUMA" changed NUMA
> > initialization on Intel to choose the nearest online node or first node.
> > Fake NUMA would be better off with round-robin initialization, instead of
> > all CPUs on the first node. Change the choice of first node, back to
> > round-robin.
> >
> > For testing NUMA kernel behaviour without cpusets and NUMA aware
> > applications, it would be better to have cpus in different nodes,
> > rather than all in a single node. With cpusets migration of tasks
> > scenarios cannot be tested.
> >
> > I guess having it round-robin shouldn't affect the use cases for all
> > cpus on the first node.
> >
> > The code comments in arch/x86/mm/numa_64.c:759 indicate that this used
> > to be the case, which was changed by commit d9c2d5ac6. It changed from
> > roundrobin to nearer or first node. And I couldn't find any reason for
> > this change in its changelog.
> >
> > Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
> > Cc: David Rientjes <rientjes@google.com>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >
> > > Thanks
> >
>
> Can you try with this patch?

Hi Nikanth:

I won't be working today - I'm taking my daughter for a college campus
visit (she is a senior in High School this year) - but I will try out
this patch this weekend and get back to you with the results.

Myron

>
> Thanks
> Nikanth
>
> Fallback to first node, if the node is not online.
>
> Fixes regression of commit 50f2d7f682f9c0ed58191d0982fe77888d59d162
> x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA
>
> When some of the NUMA nodes are disabled, and the CPUs are assigned
> in round-robin fashion, CPUs might be assigned to disabled nodes
> resulting in the crash. While using round-robin assignment, check if the
> node is online. If the node is not online, use the first online node.
>
> Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
> Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
>
> ---
>
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index d16c2c5..f31237c 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -287,6 +287,8 @@ static void __cpuinit srat_detect_node(struct cpuinfo_x86 *c)
>  	if (node == NUMA_NO_NODE || !node_online(node)) {
>  		/* reuse the value from init_cpu_to_node() */
>  		node = cpu_to_node(cpu);
> +		if (!node_online(node))
> +			node = first_node(node_online_map);
>  	}
>  	numa_set_node(cpu, node);
>  #endif
>
--
Myron Stowe Linux Kernel Developer
Fort Collins, CO Office of Corporate Strategy and Technology
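
The fallback described in the patch above (assign round-robin, then use
the first online node if the chosen node is offline) can be sketched in
user-space C roughly as follows. This is only an illustration under
stated assumptions, not the kernel code: the online nodes are modelled
as a plain bitmask, and every name prefixed with sketch_ or SKETCH_ is
hypothetical and does not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_NODES 32	/* hypothetical limit for this sketch */
#define SKETCH_NO_NODE (-1)	/* hypothetical "no node" marker */

/* Bit n of online_mask set means node n is online (assumed model). */
static bool sketch_node_online(unsigned int online_mask, int node)
{
	return node >= 0 && node < SKETCH_MAX_NODES &&
	       (online_mask & (1u << node)) != 0;
}

/* Lowest-numbered online node, or SKETCH_NO_NODE if none is online. */
static int sketch_first_online(unsigned int online_mask)
{
	for (int node = 0; node < SKETCH_MAX_NODES; node++)
		if (sketch_node_online(online_mask, node))
			return node;
	return SKETCH_NO_NODE;
}

/*
 * Round-robin choice (cpu modulo node count), falling back to the
 * first online node when the round-robin choice is offline.
 */
static int sketch_pick_node(unsigned int online_mask, int nr_nodes, int cpu)
{
	assert(nr_nodes > 0 && cpu >= 0);	/* avoid a modulo by zero */

	int node = cpu % nr_nodes;

	if (!sketch_node_online(online_mask, node))
		node = sketch_first_online(online_mask);
	return node;
}
```

For example, with four nodes of which only nodes 0 and 2 are online
(mask 0x5), sketch_pick_node(0x5, 4, 1) picks node 0 instead of the
offline node 1, while sketch_pick_node(0x5, 4, 2) keeps node 2.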