linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Yinghai Lu <yinghai@kernel.org>, Brian Gerst <brgerst@gmail.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Shaohui Zheng <shaohui.zheng@intel.com>,
	David Rientjes <rientjes@google.com>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@linux.intel.com>
Subject: Re: [PATCH] x86-64, NUMA: reimplement cpu node map initialization for fake numa
Date: Tue, 12 Apr 2011 13:00:37 +0900	[thread overview]
Message-ID: <20110412040037.GK9673@mtj.dyndns.org> (raw)
In-Reply-To: <20110411105837.0065.A69D9226@jp.fujitsu.com>

Hey,

On Mon, Apr 11, 2011 at 10:58:21AM +0900, KOSAKI Motohiro wrote:
> 	1) revert all of your x86-64/mm chagesets
> 	2) undo only numa_emulation change (my proposal)
> 	3) make a radical improvement now and apply it without linux-next
> 	   testing phase.
> 
> I dislike 1) and 3) beucase, 1) we know the breakage is where come from.
> then we have no reason to revert all. 3) I hate untested patch simply.

Yeah, sure, we need to fix it but let's at least try to understand
what's broken and assess which is the best approach before rushing
with a quick fix.  It's not like it breaks common boot scenarios or
we're in late -rc cycles.

So, before the change, if the machine had neither ACPI nor AMD NUMA
configuration, fake_physnodes() would have assigned node 0 to all
CPUs, while new code would RR assign availabile nodes.  For !emulation
case, both behave the same because, well, there can be only one node.
With emulation, it becomes different.  CPUs are RR'd across the
emulated nodes and this breaks the siblings belong to the same node
assumption.

> A few addional explanation is here: scheduler group for MC is created based
> on cpu_llc_shared_mask(). And it was created set_cpu_sibling_map().
> Unfortunatelly, it is constructed very later against numa_init_array().
> Thus, numa_init_array() changing is no simple work and no low risk work.
> 
> In the other word, I didn't talk about which is correct (or proper) 
> algorithm, I did only talk about logic undo has least regression risk. 
> So, I still think making new RR numa assignment should be deferred 
> .40 or .41 and apply my bandaid patch now. However if you have an 
> alternative fixing patch, I can review and discuss it, of cource.

Would something like the following work?

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c2871d3..bad8a10 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -320,6 +320,18 @@ static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
 	cpumask_set_cpu(cpu2, cpu_core_mask(cpu1));
 	cpumask_set_cpu(cpu1, cpu_llc_shared_mask(cpu2));
 	cpumask_set_cpu(cpu2, cpu_llc_shared_mask(cpu1));
+
+	/*
+	 * It's assumed that sibling CPUs live on the same NUMA node, which
+	 * might not hold if NUMA configuration is broken or emulated.
+	 * Enforce it.
+	 */
+	if (early_cpu_to_node(cpu1) != early_cpu_to_node(cpu2)) {
+		pr_warning("CPU %d in node %d and CPU %d in node %d are siblings, forcing same node\n",
+			   cpu1, early_cpu_to_node(cpu1),
+			   cpu2, early_cpu_to_node(cpu2));
+		numa_set_node(cpu2, early_cpu_to_node(cpu1));
+	}
 }
 
 

  reply	other threads:[~2011-04-12  4:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20110408235739.A6B0.A69D9226@jp.fujitsu.com>
2011-04-08 16:43 ` [PATCH] x86-64, NUMA: reimplement cpu node map initialization for fake numa Tejun Heo
2011-04-11  1:58   ` KOSAKI Motohiro
2011-04-12  4:00     ` Tejun Heo [this message]
2011-04-12  4:38       ` KOSAKI Motohiro
2011-04-12  6:31         ` KOSAKI Motohiro
2011-04-12  7:13           ` Tejun Heo
2011-04-13  7:02             ` [PATCH] x86-64, NUMA: fix fakenuma boot failure KOSAKI Motohiro
2011-04-13 19:32               ` Tejun Heo
2011-04-14  0:51                 ` [PATCH v2] " KOSAKI Motohiro
2011-04-14 15:05                   ` Tejun Heo
2011-04-15 11:39                     ` [PATCH v3] " KOSAKI Motohiro
2011-04-15 15:35                       ` Tejun Heo
2011-04-15 19:24                       ` [tip:x86/urgent] x86, NUMA: Fix " tip-bot for KOSAKI Motohiro
2011-04-14  6:44                 ` [PATCH] x86-64, NUMA: fix " Ingo Molnar
2011-04-14 14:49                   ` Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110412040037.GK9673@mtj.dyndns.org \
    --to=tj@kernel.org \
    --cc=brgerst@gmail.com \
    --cc=gorcunov@gmail.com \
    --cc=hpa@linux.intel.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rientjes@google.com \
    --cc=shaohui.zheng@intel.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).