linux-arch.vger.kernel.org archive mirror
From: Yinghai Lu <yinghai@kernel.org>
To: Ingo Molnar <mingo@elte.hu>, Thomas Gleixner <tglx@linutronix.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Miller <davem@davemloft.net>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	Yinghai Lu <yinghai@kernel.org>
Subject: [PATCH 01/50] x86, numa: fix boot without RAM on node0 again
Date: Tue, 13 Jul 2010 00:09:55 -0700
Message-ID: <1279005044-24777-2-git-send-email-yinghai@kernel.org>
In-Reply-To: <1279005044-24777-1-git-send-email-yinghai@kernel.org>

|commit e534c7c5f8d6e9fc46f57fab067c7e48d8ceb172
|Author: Lee Schermerhorn <lee.schermerhorn@hp.com>
|Date:   Wed May 26 14:44:58 2010 -0700
|
|    numa: x86_64: use generic percpu var numa_node_id() implementation
|
|    x86 arch specific changes to use generic numa_node_id() based on generic
|    percpu variable infrastructure.  Back out x86's custom version of
|    numa_node_id()

broke NUMA systems that have no RAM on node 0 when MEMORY_HOTPLUG is enabled,
because cpu_up() calls cpu_to_node() before per_cpu(numa_node) is set up for
the APs.

When node 0 has no RAM, x86 has already rounded its CPUs to the nearest node
with RAM in x86_cpu_to_node_map, but per_cpu(numa_node) is not set up for the
APs until c_init.

So when cpu_up() later calls cpu_to_node(), it gets 0 again and brings node 0
online even though there is no RAM on it. As a result none of the APs can be
booted, and the kernel panics later:

[    1.611101] On node 0 totalpages: 0
.........
[    2.608558] On node 0 totalpages: 0
[    2.612065] Brought up 1 CPUs
[    2.615199] Total of 1 processors activated (3990.31 BogoMIPS).
...
[   93.225341] calling  loop_init+0x0/0x1a4 @ 1
[   93.229314] PERCPU: allocation failed, size=80 align=8, failed to populate
[   93.246539] Pid: 1, comm: swapper Tainted: G        W   2.6.35-rc4-tip-yh-04371-gd64e6c4-dirty #354
[   93.264621] Call Trace:
[   93.266533]  [<ffffffff81125e43>] pcpu_alloc+0x83a/0x8e7
[   93.270710]  [<ffffffff81125f15>] __alloc_percpu+0x10/0x12
[   93.285849]  [<ffffffff8140786c>] alloc_disk_node+0x94/0x16d
[   93.291811]  [<ffffffff81407956>] alloc_disk+0x11/0x13
[   93.306157]  [<ffffffff81503e51>] loop_alloc+0xa7/0x180
[   93.310538]  [<ffffffff8277ef48>] loop_init+0x9b/0x1a4
[   93.324909]  [<ffffffff8277eead>] ? loop_init+0x0/0x1a4
[   93.329650]  [<ffffffff810001f2>] do_one_initcall+0x57/0x136
[   93.345197]  [<ffffffff827486d0>] kernel_init+0x184/0x20e
[   93.348146]  [<ffffffff81034954>] kernel_thread_helper+0x4/0x10
[   93.365194]  [<ffffffff81c7cc3c>] ? restore_args+0x0/0x30
[   93.369305]  [<ffffffff8274854c>] ? kernel_init+0x0/0x20e
[   93.386011]  [<ffffffff81034950>] ? kernel_thread_helper+0x0/0x10
[   93.392047] loop: out of memory
...

Try to assign per_cpu(numa_node) early

Signed-off-by: Yinghai <yinghai@kernel.org>
---
 arch/x86/kernel/setup_percpu.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c
index de3b63a..b03959c 100644
--- a/arch/x86/kernel/setup_percpu.c
+++ b/arch/x86/kernel/setup_percpu.c
@@ -238,6 +238,15 @@ void __init setup_per_cpu_areas(void)
 #ifdef CONFIG_NUMA
 		per_cpu(x86_cpu_to_node_map, cpu) =
 			early_per_cpu_map(x86_cpu_to_node_map, cpu);
+		/*
+		 * make sure boot cpu numa_node is right, when boot cpu is on
+		 *  the node that doesn't have mem installed
+		 * also cpu_up() will call cpu_to_node() for APs when
+		 *  MEMORY_HOTPLUG is defined, before per_cpu(numa_node) is set
+		 *  up later with c_init aka intel_init/amd_init
+		 * So set them all (boot cpu and all APs)
+		 */
+		set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
 #endif
 #endif
 		/*
@@ -257,14 +266,6 @@ void __init setup_per_cpu_areas(void)
 	early_per_cpu_ptr(x86_cpu_to_node_map) = NULL;
 #endif
 
-#if defined(CONFIG_X86_64) && defined(CONFIG_NUMA)
-	/*
-	 * make sure boot cpu numa_node is right, when boot cpu is on the
-	 * node that doesn't have mem installed
-	 */
-	set_cpu_numa_node(boot_cpu_id, early_cpu_to_node(boot_cpu_id));
-#endif
-
 	/* Setup node to cpumask map */
 	setup_node_to_cpumask_map();
 
-- 
1.6.4.2
