From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andre Przywara
Subject: Re: Dom0 crash with old style AMD NUMA detection
Date: Fri, 21 Sep 2012 19:49:31 +0200
Message-ID: <505CA8AB.6000808@amd.com>
References: <501BC20F.3040205@amd.com> <20120803123628.GB10670@andromeda.dapyr.net> <20120817142237.GA8467@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20120817142237.GA8467@phenom.dumpdata.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk , Jeremy Fitzhardinge , xen-devel
List-Id: xen-devel@lists.xenproject.org

On 08/17/2012 04:22 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 08:36:28AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:

Sorry Konrad, almost forgot. Comment (and Ack) below...

>>> we see Dom0 crashes due to the kernel detecting the NUMA topology not by
>>> ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
>>>
>>> This will detect the actual NUMA config of the physical machine, but
>>> will crash about the mismatch with Dom0's virtual memory. Variation of
>>> the theme: Dom0 sees what it's not supposed to see.
>>>
>>> This happens with the said config option enabled and on a machine where
>>> this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
>>>
>>> We have this dump then:
>>> [ 0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1
>>> distance=10
>>> [ 0.000000] Scanning NUMA topology in Northbridge 24
>>> [ 0.000000] Number of physical nodes 4
>>> [ 0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
>>> [ 0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
>>> [ 0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
>>> [ 0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
>>> [ 0.000000] Initmem setup node 0 0000000000000000-0000000040000000
>>> [ 0.000000] NODE_DATA [000000003ffd9000 - 000000003fffffff]
>>> [ 0.000000] Initmem setup node 1 0000000040000000-0000000138000000
>>> [ 0.000000] NODE_DATA [0000000137fd9000 - 0000000137ffffff]
>>> [ 0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
>>> [ 0.000000] NODE_DATA [00000001f095e000 - 00000001f0984fff]
>>> [ 0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
>>> [ 0.000000] Cannot find 159744 bytes in node 3
>>> [ 0.000000] BUG: unable to handle kernel NULL pointer dereference at
>>> (null)
>>> [ 0.000000] IP: [] __alloc_bootmem_node+0x43/0x96
>>> [ 0.000000] PGD 0
>>> [ 0.000000] Oops: 0000 [#1] SMP
>>> [ 0.000000] CPU 0
>>> [ 0.000000] Modules linked in:
>>> [ 0.000000]
>>> [ 0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
>>> [ 0.000000] RIP: e030:[] []
>>> __alloc_bootmem_node+0x43/0x96
>>> [ 0.000000] RSP: e02b:ffffffff81c01de8 EFLAGS: 00010046
>>> [ 0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX:
>>> 0000000000000000
>>> [ 0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI:
>>> 0000000000000000
>>> [ 0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09:
>>> 0000000000000000
>>> [ 0.000000] R10: 0000000000098000 R11: 0000000000000000 R12:
>>> 0000000000000000
>>> [ 0.000000] R13: 0000000000000000 R14: 0000000000000040 R15:
>>> 0000000000000003
>>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffff81ced000(0000)
>>> knlGS:0000000000000000
>>> [ 0.000000] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4:
>>> 0000000000000660
>>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>> 0000000000000000
>>> [ 0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7:
>>> 0000000000000000
>>> [ 0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000,
>>> task ffffffff81c0d020)
>>> [ 0.000000] Stack:
>>> [ 0.000000] 00000000000000c0 0000000000000003 0000000000000000
>>> 000000000000003f
>>> [ 0.000000] ffffffff81c01e68 ffffffff81d23024 0000000000400000
>>> 0000000000000002
>>> [ 0.000000] 0000000000080000 ffff8801f055e000 ffff8801f055e1f8
>>> 0000000000000000
>>> [ 0.000000] Call Trace:
>>> [ 0.000000] []
>>> sparse_early_usemaps_alloc_node+0x64/0x178
>>> [ 0.000000] [] sparse_init+0xe4/0x25a
>>> [ 0.000000] [] paging_init+0x13/0x22
>>> [ 0.000000] [] setup_arch+0x9c6/0xa9b
>>> [ 0.000000] [] ? printk+0x3c/0x3e
>>> [ 0.000000] [] start_kernel+0xe5/0x468
>>> [ 0.000000] [] x86_64_start_reservations+0xba/0xc1
>>> [ 0.000000] [] ? xen_setup_runstate_info+0x2c/0x36
>>> [ 0.000000] [] xen_start_kernel+0x565/0x56c
>>> [ 0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59
>>> be 2a
>>> 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f
>>> <41> 8b
>>> bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
>>> [ 0.000000] RIP [] __alloc_bootmem_node+0x43/0x96
>>> [ 0.000000] RSP
>>> [ 0.000000] CR2: 0000000000000000
>>> [ 0.000000] ---[ end trace a7919e7f17c0a725 ]---
>>> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>
>>>
>>>
>>> The obvious solution would be to explicitly deny northbridge scanning
>>> when running as Dom0, though I am not sure how to implement this without
>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>
>> Heh.
>> Is there a numa=0 option that could be used to override it to turn it
>> off?
>
> Not compile tested.. but was thinking something like this:
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 43fd630..838cc1f 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>
> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>  	disable_cpufreq();
>  	WARN_ON(set_pm_idle_to_default());
>  	fiddle_vdso();
> +#ifdef CONFIG_NUMA
> +	numa_off = 1;
> +#endif
>  }
>

Acked-by: Andre Przywara

I compiled and boot-tested this on my (single node ;-) test box.
First on bare metal, dmesg shows:
No NUMA configuration found
Then again, but with numa=off on the command line:
NUMA turned off
Then under Xen as the Dom0 kernel:
NUMA turned off

So under Xen the code behaves as if one had explicitly specified
numa=off, which is what we want.
I couldn't get hold of the test machine (an old K8 server) on which the
bug was once triggered, which is why I'm reluctant to give my Tested-by.
Will try this ASAP.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany