public inbox for linux-kernel@vger.kernel.org
* 32bit NUMA and fakeNUMA broken for AMD CPUs
@ 2011-06-21 15:41 Conny Seidel
  2011-06-26 10:22 ` Tejun Heo
  0 siblings, 1 reply; 28+ messages in thread
From: Conny Seidel @ 2011-06-21 15:41 UTC
  To: LKML, Tejun Heo


Hi,

Commit 797390d8554b1e07aabea37d0140933b0412dba0 breaks 32-bit kernels on
AMD with both native NUMA and fakeNUMA.

With native NUMA, the kernel still boots when the parameter numa=off is
added to the kernel command line.

[    0.000000] BUG: unable to handle kernel paging request at 000012b0
[    0.000000] IP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2
[    0.000000] *pdpt = 0000000000000000 *pde = f000eef3f000ee00
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] last sysfs file:
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00164-g797390d #1 To Be Filled By O.E.M. To Be Filled By O.E.M./E350M1
[    0.000000] EIP: 0060:[<c1aa13ce>] EFLAGS: 00010012 CPU: 0
[    0.000000] EIP is at memmap_init_zone+0x6c/0xf2
[    0.000000] EAX: 00000000 EBX: 000a8000 ECX: 000a7fff EDX: f2c00b80
[    0.000000] ESI: 000a8000 EDI: f2c00800 EBP: c19ffe54 ESP: c19ffe34
[    0.000000]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    0.000000] Process swapper (pid: 0, ti=c19fe000 task=c1a07f60 task.ti=c19fe000)
[    0.000000] Stack:
[    0.000000]  00000002 00000000 0023f000 00000000 10000000 00000a00 f2c00000 f2c00b58
[    0.000000]  c19ffeb0 c1a80f24 000375fe 00000000 f2c00800 00000800 00000100 00000030
[    0.000000]  c1abb768 0000003c 00000000 00000000 00000004 00207a02 f2c00800 000375fe
[    0.000000] Call Trace:
[    0.000000]  [<c1a80f24>] free_area_init_node+0x358/0x385
[    0.000000]  [<c1a81384>] free_area_init_nodes+0x420/0x487
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c102489e>] ? memory_present+0x66/0x6f
[    0.000000]  [<c1a79326>] paging_init+0x114/0x11b
[    0.000000]  [<c101742f>] ? native_apic_mem_read+0x8/0x19
[    0.000000]  [<c1a6cb13>] setup_arch+0xb37/0xc0a
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c1a69554>] start_kernel+0x76/0x316
[    0.000000]  [<c1a690a8>] i386_start_kernel+0xa8/0xb0
[    0.000000] Code: 0a c1 e0 1d 89 45 ec 8b 45 e4 03 3c 85 e8 5b a6 c1 e9 8a 00 00 00 89 f0 89 f3 c1 e8 0e 0f be 80 a8 57 a6 c1 8b 04 85 e8 5b a6 c1 <2b> 98 b0 12 00 00 c1 e3 05 03 98 ac 12 00 00 8b 03 25 ff ff ff
[    0.000000] EIP: [<c1aa13ce>] memmap_init_zone+0x6c/0xf2 SS:ESP 0068:c19ffe34
[    0.000000] CR2: 00000000000012b0
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] Pid: 0, comm: swapper Tainted: G      D     2.6.39-rc5-00164-g797390d #1
[    0.000000] Call Trace:
[    0.000000]  [<c1637213>] panic+0x55/0x151
[    0.000000]  [<c10507c9>] ? blocking_notifier_call_chain+0x11/0x13
[    0.000000]  [<c1038340>] do_exit+0x99/0x6fa
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c10356de>] ? kmsg_dump+0x3c/0xbe
[    0.000000]  [<c163a569>] oops_end+0x97/0x9f
[    0.000000]  [<c101e9a4>] no_context+0x144/0x14e
[    0.000000]  [<c101eada>] __bad_area_nosemaphore+0x12c/0x134
[    0.000000]  [<c1a83a75>] ? memblock_add_region+0xbf/0x4af
[    0.000000]  [<c101eaf4>] bad_area_nosemaphore+0x12/0x15
[    0.000000]  [<c163beb0>] do_page_fault+0x1e8/0x3c8
[    0.000000]  [<c1a82c5e>] ? __alloc_memory_core_early+0x86/0x94
[    0.000000]  [<c163bcc8>] ? spurious_fault+0xf2/0xf2
[    0.000000]  [<c1639c6b>] error_code+0x5f/0x64
[    0.000000]  [<c163bcc8>] ? spurious_fault+0xf2/0xf2
[    0.000000]  [<c1aa13ce>] ? memmap_init_zone+0x6c/0xf2
[    0.000000]  [<c1a80f24>] free_area_init_node+0x358/0x385
[    0.000000]  [<c1a81384>] free_area_init_nodes+0x420/0x487
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c102489e>] ? memory_present+0x66/0x6f
[    0.000000]  [<c1a79326>] paging_init+0x114/0x11b
[    0.000000]  [<c101742f>] ? native_apic_mem_read+0x8/0x19
[    0.000000]  [<c1a6cb13>] setup_arch+0xb37/0xc0a
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1638f6d>] ? _raw_spin_unlock_irqrestore+0x19/0x25
[    0.000000]  [<c1637323>] ? printk+0x14/0x16
[    0.000000]  [<c1a69554>] start_kernel+0x76/0x316
[    0.000000]  [<c1a690a8>] i386_start_kernel+0xa8/0xb0



commit 797390d8554b1e07aabea37d0140933b0412dba0
Author: Tejun Heo <tj@kernel.org>
Date:   Mon May 2 14:18:52 2011 +0200

    x86-32, NUMA: use sparse_memory_present_with_active_regions()

    Instead of calling memory_present() for each region from NUMA init,
    call sparse_memory_present_with_active_regions() from paging_init()
    similarly to x86-64.

    For flat and numaq, this results in exactly the same memory_present()
    calls.  For srat, if there are multiple memory chunks for a node,
    after this change, memory_present() will be called separately for each
    chunk instead of being called once to encompass the whole range, which
    doesn't cause any harm and actually is the better behavior.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Yinghai Lu <yinghai@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
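
The difference between one encompassing memory_present() call and one call
per chunk can be modelled in userspace. The following is only a sketch, not
kernel code: model_map, model_reset() and model_memory_present() are
hypothetical names, the constants are the 32-bit x86 values from
mmzone_32.h, and the chunk boundaries in the usage note are made up.

```c
#include <assert.h>
#include <string.h>

/* Constants as defined in arch/x86/include/asm/mmzone_32.h. */
#define MAX_NR_PAGES      16777216UL
#define MAX_SECTIONS      1024UL
#define PAGES_PER_SECTION (MAX_NR_PAGES / MAX_SECTIONS)   /* 16384 pages */

/* Hypothetical userspace stand-in for physnode_map; -1 means "no node". */
static signed char model_map[MAX_SECTIONS];

/* Reset every entry to -1 (memset writes the byte 0xff into each slot). */
static void model_reset(void)
{
	memset(model_map, -1, sizeof(model_map));
}

/*
 * Walk [start, end) in section-sized strides and record nid for the
 * section each stride lands in. Both arguments are page frame numbers.
 */
static void model_memory_present(int nid, unsigned long start, unsigned long end)
{
	unsigned long pfn;

	assert(start <= end && end <= MAX_NR_PAGES);
	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		model_map[pfn / PAGES_PER_SECTION] = (signed char)nid;
}
```

Assuming a node 0 with chunks at pfns [0x0, 0x8000) and [0x10000, 0x18000),
a single call over [0x0, 0x18000) assigns sections 0 through 5 to node 0,
including the hole. Two per-chunk calls assign only sections 0, 1, 4 and 5
and leave sections 2 and 3 at -1.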


##
##################################################################
# Email : conny.seidel@amd.com            GnuPG-Key : 0xA6AB055D #
# Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D #
##################################################################
# Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach      #
# General Managers: Alberto Bozzoi                               #
# Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen #
#               HRB Nr. 43632                                    #
##################################################################


* [PATCH x86/mm 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
@ 2011-07-12  7:44 Tejun Heo
  2011-07-12  7:45 ` [PATCH x86/mm 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
  2011-07-13  5:33 ` [tip:x86/numa] x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ tip-bot for Tejun Heo
  0 siblings, 2 replies; 28+ messages in thread
From: Tejun Heo @ 2011-07-12  7:44 UTC
  To: Ingo Molnar, H. Peter Anvin, Thomas Gleixner
  Cc: Conny Seidel, x86, linux-kernel, Hans Rosenfeld

From 9f5e6296923d7cf47738dfcd38ab9e333d3fd356 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Fri, 1 Jul 2011 18:22:39 +0200

DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
the next patch to implement a mapping granularity check.

This patch is a trivial constant rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
---
This one is identical to the original posting[1]; only the second
patch is updated.  Please schedule for 3.1-rc1.  It is also available on
the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git review-x86-mm-base

Thanks.

[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583

 arch/x86/include/asm/mmzone_32.h |    6 +++---
 arch/x86/mm/numa_32.c            |    6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/mmzone_32.h b/arch/x86/include/asm/mmzone_32.h
index ffa037f..55728e1 100644
--- a/arch/x86/include/asm/mmzone_32.h
+++ b/arch/x86/include/asm/mmzone_32.h
@@ -34,15 +34,15 @@ static inline void resume_map_numa_kva(pgd_t *pgd) {}
  *    64Gb / 4096bytes/page = 16777216 pages
  */
 #define MAX_NR_PAGES 16777216
-#define MAX_ELEMENTS 1024
-#define PAGES_PER_ELEMENT (MAX_NR_PAGES/MAX_ELEMENTS)
+#define MAX_SECTIONS 1024
+#define PAGES_PER_SECTION (MAX_NR_PAGES/MAX_SECTIONS)
 
 extern s8 physnode_map[];
 
 static inline int pfn_to_nid(unsigned long pfn)
 {
 #ifdef CONFIG_NUMA
-	return((int) physnode_map[(pfn) / PAGES_PER_ELEMENT]);
+	return((int) physnode_map[(pfn) / PAGES_PER_SECTION]);
 #else
 	return 0;
 #endif
diff --git a/arch/x86/mm/numa_32.c b/arch/x86/mm/numa_32.c
index 849a975..3adebe7 100644
--- a/arch/x86/mm/numa_32.c
+++ b/arch/x86/mm/numa_32.c
@@ -41,7 +41,7 @@
  *     physnode_map[16-31] = 1;
  *     physnode_map[32- ] = -1;
  */
-s8 physnode_map[MAX_ELEMENTS] __read_mostly = { [0 ... (MAX_ELEMENTS - 1)] = -1};
+s8 physnode_map[MAX_SECTIONS] __read_mostly = { [0 ... (MAX_SECTIONS - 1)] = -1};
 EXPORT_SYMBOL(physnode_map);
 
 void memory_present(int nid, unsigned long start, unsigned long end)
@@ -52,8 +52,8 @@ void memory_present(int nid, unsigned long start, unsigned long end)
 			nid, start, end);
 	printk(KERN_DEBUG "  Setting physnode_map array to node %d for pfns:\n", nid);
 	printk(KERN_DEBUG "  ");
-	for (pfn = start; pfn < end; pfn += PAGES_PER_ELEMENT) {
-		physnode_map[pfn / PAGES_PER_ELEMENT] = nid;
+	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
+		physnode_map[pfn / PAGES_PER_SECTION] = nid;
 		printk(KERN_CONT "%lx ", pfn);
 	}
 	printk(KERN_CONT "\n");
-- 
1.7.6
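
For reference, the arithmetic behind the renamed constants can be checked
with a small userspace sketch. This is an illustration only:
PAGE_SIZE_BYTES, pfn_to_section(), section_size_mib() and
boundary_is_section_aligned() are hypothetical names, and the alignment
helper is not taken from the second patch, whose contents are not shown
here.

```c
#include <assert.h>

/* 4096-byte pages, as in the "64Gb / 4096bytes/page" comment in the header. */
#define PAGE_SIZE_BYTES   4096UL
#define MAX_NR_PAGES      16777216UL
#define MAX_SECTIONS      1024UL
#define PAGES_PER_SECTION (MAX_NR_PAGES / MAX_SECTIONS)

/* Index into a physnode_map-style table for a given page frame number. */
static unsigned long pfn_to_section(unsigned long pfn)
{
	assert(pfn < MAX_NR_PAGES);
	return pfn / PAGES_PER_SECTION;
}

/* Amount of memory covered by one table entry, in MiB. */
static unsigned long section_size_mib(void)
{
	return PAGES_PER_SECTION * PAGE_SIZE_BYTES / (1024UL * 1024UL);
}

/*
 * Hypothetical helper: nonzero if pfn lies exactly on a section boundary.
 * A node boundary that is not section aligned cannot be represented by a
 * table that holds one node id per section.
 */
static int boundary_is_section_aligned(unsigned long pfn)
{
	return pfn % PAGES_PER_SECTION == 0;
}
```

With these values each section spans 16384 pages, so section_size_mib()
returns 64, pfn 16383 maps to section 0 and pfn 16384 maps to section 1.
A boundary at pfn 0x8000 is section aligned; one at pfn 0x9000 is not.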



Thread overview: 28+ messages
2011-06-21 15:41 32bit NUMA and fakeNUMA broken for AMD CPUs Conny Seidel
2011-06-26 10:22 ` Tejun Heo
     [not found]   ` <20110626223807.47cef5c6.conny.seidel_amd.com@marah.osrc.amd.com>
2011-06-28  9:41     ` [PATCH tip:x86/urgent] x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines Tejun Heo
2011-06-28 12:35       ` Conny Seidel
2011-07-01 15:26       ` [tip:x86/urgent] " tip-bot for Tejun Heo
     [not found]     ` <20110628174613.GP478@escobedo.osrc.amd.com>
2011-06-29  9:44       ` 32bit NUMA and fakeNUMA broken for AMD CPUs Tejun Heo
2011-06-29 10:51         ` Tejun Heo
2011-06-29 12:34         ` Tejun Heo
2011-06-29 12:55           ` Hans Rosenfeld
2011-06-29 13:03             ` Tejun Heo
2011-06-29 16:15               ` Tejun Heo
2011-06-30 13:13                 ` Hans Rosenfeld
2011-06-30 15:55                   ` Tejun Heo
2011-06-30 16:32                     ` Hans Rosenfeld
2011-06-30 16:42                       ` Tejun Heo
2011-06-30 17:04                         ` Hans Rosenfeld
2011-07-01 16:22         ` [PATCH x86/urgent 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
2011-07-01 16:23           ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
2011-07-09  8:32             ` Tejun Heo
2011-07-09  8:42               ` H. Peter Anvin
2011-07-11  8:34                 ` [PATCH x86/urgent] x86: Disable AMD_NUMA for 32bit for now Tejun Heo
2011-07-11 14:01                   ` Tejun Heo
2011-07-11 18:58                   ` [tip:x86/urgent] " tip-bot for Tejun Heo
2011-07-11 14:20                 ` [PATCH x86/urgent 2/2] x86: Implement pfn -> nid mapping granularity check Hans Rosenfeld
2011-07-13  5:34       ` [tip:x86/numa] x86, numa: " tip-bot for Tejun Heo
  -- strict thread matches above, loose matches on Subject: below --
2011-07-12  7:44 [PATCH x86/mm 1/2] x86: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ Tejun Heo
2011-07-12  7:45 ` [PATCH x86/mm 2/2] x86: Implement pfn -> nid mapping granularity check Tejun Heo
2011-07-13  5:33 ` [tip:x86/numa] x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ tip-bot for Tejun Heo
