xen-devel.lists.xenproject.org archive mirror
* Dom0 crash with old style AMD NUMA detection
@ 2012-08-03 12:20 Andre Przywara
  2012-08-03 12:36 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Andre Przywara @ 2012-08-03 12:20 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge; +Cc: xen-devel

Hi,

we see Dom0 crashes due to the kernel detecting the NUMA topology not by 
ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).

This detects the actual NUMA configuration of the physical machine, but
the kernel then crashes because that topology does not match Dom0's
virtual memory. A variation on the theme: Dom0 sees what it's not
supposed to see.

This happens when the said config option is enabled and the machine is
one where this scanning is still performed (K8 and Fam10h, not
Bulldozer class).

We then get this dump:
[    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 4
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
[    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
[    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
[    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
[    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
[    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
[    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
[    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
[    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
[    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
[    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
[    0.000000] Cannot find 159744 bytes in node 3
[    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] PGD 0
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] CPU 0
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
[    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
[    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
[    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 0000000000000000
[    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 0000000000000000
[    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 0000000000000000
[    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000003
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000) knlGS:0000000000000000
[    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 0000000000000660
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
[    0.000000] Stack:
[    0.000000]  00000000000000c0 0000000000000003 0000000000000000 000000000000003f
[    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000 0000000000000002
[    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8 0000000000000000
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
[    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
[    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
[    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
[    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
[    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 be 2a 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f <41> 8b bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
[    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000]  RSP <ffffffff81c01de8>
[    0.000000] CR2: 0000000000000000
[    0.000000] ---[ end trace a7919e7f17c0a725 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
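
One way to read the dump above: the node ranges come from the physical northbridge, while Dom0 only owns memory up to some lower boundary, so the topmost node contains nothing Dom0 can allocate from. The following is a small user-space sketch of that reading, not kernel code. The node ranges are copied from the log; DOM0_MEM_END is an assumption, inferred from the fact that node 2's NODE_DATA ends at 0x1f0984fff instead of at the node limit; node_usable_bytes() is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One node range as printed by the northbridge scan in the log. */
struct node_range {
	uint64_t base;
	uint64_t limit;	/* exclusive upper bound */
};

/* Values copied from the "Node N MemBase ... Limit ..." lines. */
static const struct node_range nb_nodes[4] = {
	{ 0x0000000000000000ULL, 0x0000000040000000ULL },
	{ 0x0000000040000000ULL, 0x0000000138000000ULL },
	{ 0x0000000138000000ULL, 0x00000001f8000000ULL },
	{ 0x00000001f8000000ULL, 0x0000000238000000ULL },
};

/*
 * ASSUMPTION: end of Dom0's memory, guessed from where NODE_DATA for
 * node 2 was placed. The real value is not stated in the report.
 */
#define DOM0_MEM_END 0x00000001f0985000ULL

/* Hypothetical helper: bytes of a node's range that lie below mem_end. */
uint64_t node_usable_bytes(const struct node_range *n, uint64_t mem_end)
{
	assert(n->base <= n->limit);

	if (mem_end <= n->base)
		return 0;			/* node lies entirely above Dom0's memory */
	if (mem_end >= n->limit)
		return n->limit - n->base;	/* node fully backed */
	return mem_end - n->base;		/* node partially backed */
}
```

Under these assumed numbers, node_usable_bytes(&nb_nodes[3], DOM0_MEM_END) is 0, which would be consistent with the "Cannot find 159744 bytes in node 3" message preceding the oops.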



The obvious solution would be to explicitly deny northbridge scanning 
when running as Dom0, though I am not sure how to implement this without 
upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Could someone propose a fix for this? (I am out of office for the next two weeks.)

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany


* Re: Dom0 crash with old style AMD NUMA detection
  2012-08-03 12:20 Dom0 crash with old style AMD NUMA detection Andre Przywara
@ 2012-08-03 12:36 ` Konrad Rzeszutek Wilk
  2012-08-17 14:22   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-03 12:36 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Jeremy Fitzhardinge, xen-devel, Konrad Rzeszutek Wilk

On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:
> Hi,
> 
> we see Dom0 crashes due to the kernel detecting the NUMA topology not by 
> ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
> 
> This detects the actual NUMA configuration of the physical machine, but
> the kernel then crashes because that topology does not match Dom0's
> virtual memory. A variation on the theme: Dom0 sees what it's not
> supposed to see.
> 
> This happens when the said config option is enabled and the machine is
> one where this scanning is still performed (K8 and Fam10h, not
> Bulldozer class).
> 
> We have this dump then:
> [    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1
> distance=10
> [    0.000000] Scanning NUMA topology in Northbridge 24
> [    0.000000] Number of physical nodes 4
> [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
> [    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
> [    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
> [    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
> [    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
> [    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
> [    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
> [    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
> [    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
> [    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
> [    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
> [    0.000000] Cannot find 159744 bytes in node 3
> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at 
> (null)
> [    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
> [    0.000000] PGD 0
> [    0.000000] Oops: 0000 [#1] SMP
> [    0.000000] CPU 0
> [    0.000000] Modules linked in:
> [    0.000000]
> [    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
> [    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] 
> __alloc_bootmem_node+0x43/0x96
> [    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
> [    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 
> 0000000000000000
> [    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 
> 0000000000000000
> [    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 
> 0000000000000000
> [    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 
> 0000000000000000
> [    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 
> 0000000000000003
> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000) 
> knlGS:0000000000000000
> [    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 
> 0000000000000660
> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 
> 0000000000000000
> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, 
> task ffffffff81c0d020)
> [    0.000000] Stack:
> [    0.000000]  00000000000000c0 0000000000000003 0000000000000000 
> 000000000000003f
> [    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000 
> 0000000000000002
> [    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8 
> 0000000000000000
> [    0.000000] Call Trace:
> [    0.000000]  [<ffffffff81d23024>] 
> sparse_early_usemaps_alloc_node+0x64/0x178
> [    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
> [    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
> [    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
> [    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
> [    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
> [    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
> [    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
> [    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
> [    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 
> be 2a
> 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f 
> <41> 8b
> bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
> [    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
> [    0.000000]  RSP <ffffffff81c01de8>
> [    0.000000] CR2: 0000000000000000
> [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> 
> 
> 
> The obvious solution would be to explicitly deny northbridge scanning 
> when running as Dom0, though I am not sure how to implement this without 
> upsetting the other kernel folks about "that crappy Xen thing" again ;-)

Heh.
Is there a numa=0 option that could be used to override it to turn it
off?
> 
> Could someone propose a fix for this? (I am out of office for the next two weeks.)
> 
> Regards,
> Andre.
> 
> -- 
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
> 
> 


* Re: Dom0 crash with old style AMD NUMA detection
  2012-08-03 12:36 ` Konrad Rzeszutek Wilk
@ 2012-08-17 14:22   ` Konrad Rzeszutek Wilk
  2012-09-14 18:58     ` Konrad Rzeszutek Wilk
  2012-09-21 17:49     ` Andre Przywara
  0 siblings, 2 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-08-17 14:22 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andre Przywara, Jeremy Fitzhardinge, xen-devel

On Fri, Aug 03, 2012 at 08:36:28AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:
> > Hi,
> > 
> > we see Dom0 crashes due to the kernel detecting the NUMA topology not by 
> > ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
> > 
> > This detects the actual NUMA configuration of the physical machine, but
> > the kernel then crashes because that topology does not match Dom0's
> > virtual memory. A variation on the theme: Dom0 sees what it's not
> > supposed to see.
> > 
> > This happens when the said config option is enabled and the machine is
> > one where this scanning is still performed (K8 and Fam10h, not
> > Bulldozer class).
> > 
> > We have this dump then:
> > [    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1
> > distance=10
> > [    0.000000] Scanning NUMA topology in Northbridge 24
> > [    0.000000] Number of physical nodes 4
> > [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
> > [    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
> > [    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
> > [    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
> > [    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
> > [    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
> > [    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
> > [    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
> > [    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
> > [    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
> > [    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
> > [    0.000000] Cannot find 159744 bytes in node 3
> > [    0.000000] BUG: unable to handle kernel NULL pointer dereference at 
> > (null)
> > [    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
> > [    0.000000] PGD 0
> > [    0.000000] Oops: 0000 [#1] SMP
> > [    0.000000] CPU 0
> > [    0.000000] Modules linked in:
> > [    0.000000]
> > [    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
> > [    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] 
> > __alloc_bootmem_node+0x43/0x96
> > [    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
> > [    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 
> > 0000000000000000
> > [    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 
> > 0000000000000000
> > [    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 
> > 0000000000000000
> > [    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 
> > 0000000000000000
> > [    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 
> > 0000000000000003
> > [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000) 
> > knlGS:0000000000000000
> > [    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 
> > 0000000000000660
> > [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> > 0000000000000000
> > [    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 
> > 0000000000000000
> > [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, 
> > task ffffffff81c0d020)
> > [    0.000000] Stack:
> > [    0.000000]  00000000000000c0 0000000000000003 0000000000000000 
> > 000000000000003f
> > [    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000 
> > 0000000000000002
> > [    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8 
> > 0000000000000000
> > [    0.000000] Call Trace:
> > [    0.000000]  [<ffffffff81d23024>] 
> > sparse_early_usemaps_alloc_node+0x64/0x178
> > [    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
> > [    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
> > [    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
> > [    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
> > [    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
> > [    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
> > [    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
> > [    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
> > [    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 
> > be 2a
> > 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f 
> > <41> 8b
> > bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
> > [    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
> > [    0.000000]  RSP <ffffffff81c01de8>
> > [    0.000000] CR2: 0000000000000000
> > [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
> > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> > 
> > 
> > 
> > The obvious solution would be to explicitly deny northbridge scanning 
> > when running as Dom0, though I am not sure how to implement this without 
> > upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> 
> Heh.
> Is there a numa=0 option that could be used to override it to turn it
> off?

Not compile-tested, but I was thinking of something like this:

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 43fd630..838cc1f 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,6 +17,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
 	disable_cpufreq();
 	WARN_ON(set_pm_idle_to_default());
 	fiddle_vdso();
+#ifdef CONFIG_NUMA
+	numa_off = 1;
+#endif
 }
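
As a rough illustration of why setting numa_off would sidestep the crash: assuming the x86 NUMA setup tries ACPI first, then the AMD northbridge scan, and falls back to a single dummy node when both are skipped or fail, the flag simply bypasses every hardware probe. The sketch below is a user-space model of that fallback order and not the kernel's implementation; acpi_has_srat, amd_nb_visible, the two probe functions and numa_pick_source() are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel's numa_off flag set by the patch above. */
int numa_off;

/* Hypothetical knobs describing what the (virtual) machine exposes. */
bool acpi_has_srat;	/* false on Dom0: ACPI NUMA is not used there */
bool amd_nb_visible;	/* true when the physical northbridge shines through */

enum numa_source { NUMA_ACPI, NUMA_AMD_NB, NUMA_DUMMY };

/* Hypothetical probes: return 0 on success, -ENOENT otherwise. */
int acpi_numa_probe(void)
{
	return acpi_has_srat ? 0 : -ENOENT;
}

int amd_nb_probe(void)
{
	return amd_nb_visible ? 0 : -ENOENT;
}

/* Model of the fallback chain: hardware probes run only if !numa_off. */
enum numa_source numa_pick_source(void)
{
	if (!numa_off) {
		if (acpi_numa_probe() == 0)
			return NUMA_ACPI;
		if (amd_nb_probe() == 0)
			return NUMA_AMD_NB;
	}
	return NUMA_DUMMY;	/* single fake node covering all memory */
}
```

In this model, a Dom0 with no usable SRAT but a visible northbridge ends up on NUMA_AMD_NB unless numa_off is set, in which case it lands on the dummy node. The price is that any future ACPI-based topology would be skipped as well.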


* Re: Dom0 crash with old style AMD NUMA detection
  2012-08-17 14:22   ` Konrad Rzeszutek Wilk
@ 2012-09-14 18:58     ` Konrad Rzeszutek Wilk
  2012-09-17  7:29       ` Andre Przywara
  2012-09-21 17:49     ` Andre Przywara
  1 sibling, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-14 18:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andre Przywara, Jeremy Fitzhardinge, xen-devel

> > > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > > (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> > > 
> > > 
> > > 
> > > The obvious solution would be to explicitly deny northbridge scanning 
> > > when running as Dom0, though I am not sure how to implement this without 
> > > upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> > 
> > Heh.
> > Is there a numa=0 option that could be used to override it to turn it
> > off?
> 
> Not compile tested.. but was thinking something like this:

ping?
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 43fd630..838cc1f 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>  #include <asm/e820.h>
>  #include <asm/setup.h>
>  #include <asm/acpi.h>
> +#include <asm/numa.h>
>  #include <asm/xen/hypervisor.h>
>  #include <asm/xen/hypercall.h>
>  
> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>  	disable_cpufreq();
>  	WARN_ON(set_pm_idle_to_default());
>  	fiddle_vdso();
> +#ifdef CONFIG_NUMA
> +	numa_off = 1;
> +#endif
>  }


* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-14 18:58     ` Konrad Rzeszutek Wilk
@ 2012-09-17  7:29       ` Andre Przywara
  2012-09-17 19:14         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Andre Przywara @ 2012-09-17  7:29 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, Dario Faggioli,
	xen-devel

On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>
>>>>
>>>>
>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>> when running as Dom0, though I am not sure how to implement this without
>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>
>>> Heh.
>>> Is there a numa=0 option that could be used to override it to turn it
>>> off?
>>
>> Not compile tested.. but was thinking something like this:
>
> ping?

That looks good to me - at least for the time being.
I just want to check how this interacts with the upcoming Dom0 NUMA support.
It wouldn't be too clever if we deliberately disabled NUMA and a future Xen
version then allowed us to use it. So let me check if I can confine this
turn-off to the fallback K8 northbridge reading.

Thanks,
Andre.

>>
>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>> index 43fd630..838cc1f 100644
>> --- a/arch/x86/xen/setup.c
>> +++ b/arch/x86/xen/setup.c
>> @@ -17,6 +17,7 @@
>>   #include <asm/e820.h>
>>   #include <asm/setup.h>
>>   #include <asm/acpi.h>
>> +#include <asm/numa.h>
>>   #include <asm/xen/hypervisor.h>
>>   #include <asm/xen/hypercall.h>
>>
>> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>>   	disable_cpufreq();
>>   	WARN_ON(set_pm_idle_to_default());
>>   	fiddle_vdso();
>> +#ifdef CONFIG_NUMA
>> +	numa_off = 1;
>> +#endif
>>   }
>



-- 
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712


* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-17  7:29       ` Andre Przywara
@ 2012-09-17 19:14         ` Konrad Rzeszutek Wilk
  2012-09-18  9:57           ` Andre Przywara
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-17 19:14 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Dario Faggioli, Konrad Rzeszutek Wilk

On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>
> >>>>
> >>>>
> >>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>when running as Dom0, though I am not sure how to implement this without
> >>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>
> >>>Heh.
> >>>Is there a numa=0 option that could be used to override it to turn it
> >>>off?
> >>
> >>Not compile tested.. but was thinking something like this:
> >
> >ping?
> 
> That looks good to me - at least for the time being.

OK, can I have your Tested-by/Acked-by on it, please?

> I just want to check how this interacts with upcoming Dom0 NUMA
> support. It wouldn't be too clever if we deliberately disable NUMA

We can always revert this patch in future versions of Linux.
> and future Xen version will allow us to use it. So let me check if I
> can confine this turn-off to the fallback K8 northbridge reading.

This potentially could work, but I would prefer to not do it for 3.6.

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index a4790bf..b4edce4 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,6 +17,7 @@
 #include <asm/e820.h>
 #include <asm/setup.h>
 #include <asm/acpi.h>
+#include <asm/numa.h>
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 
@@ -483,7 +484,33 @@ void __cpuinit xen_enable_sysenter(void)
 	if(ret != 0)
 		setup_clear_cpu_cap(sysenter_feature);
 }
+#ifdef CONFIG_AMD_NUMA
+int __cpuinit xen_amd_k8(void)
+{
+	int num;
+
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+		return -ENOENT;
+
+	for (num = 0; num < 32; num++) {
+		u32 header;
+
+		header = read_pci_config(0, num, 0, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
+			continue;
 
+		header = read_pci_config(0, num, 1, 0x00);
+		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
+			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
+			continue;
+		return num;
+	}
+	return -ENOENT;
+}
+#endif
 void __cpuinit xen_enable_syscall(void)
 {
 #ifdef CONFIG_X86_64
@@ -542,4 +569,8 @@ void __init xen_arch_setup(void)
 	disable_cpufreq();
 	WARN_ON(set_pm_idle_to_default());
 	fiddle_vdso();
+#ifdef CONFIG_AMD_NUMA
+	if (xen_amd_k8() >= 0)
+		numa_off = 1;
+#endif
 }


* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-17 19:14         ` Konrad Rzeszutek Wilk
@ 2012-09-18  9:57           ` Andre Przywara
  2012-09-18 13:44             ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Andre Przywara @ 2012-09-18  9:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Dario Faggioli, Konrad Rzeszutek Wilk

On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>
>>>>> Heh.
>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>> off?
>>>>
>>>> Not compile tested.. but was thinking something like this:
>>>
>>> ping?
>>
>> That looks good to me - at least for the time being.
>
> OK, can I have your Tested-by/Acked-by on it, please?
>
>> I just want to check how this interacts with upcoming Dom0 NUMA
>> support. It wouldn't be too clever if we deliberately disable NUMA
>
> We can always revert this patch in future versions of Linux.

I don't like this idea. We would then have Linux kernels up to 3.5 working,
and again from, say, 3.8 on, but 3.6 and 3.7 could not use NUMA. That would
be pretty unfortunate.

I haven't checked back with Dario, but I'd suspect that we use ACPI for 
injecting NUMA topology into Dom0. Even if not, a general "numa=off" for 
Dom0 is too much of a sledgehammer for me.

>> and future Xen version will allow us to use it. So let me check if I
>> can confine this turn-off to the fallback K8 northbridge reading.
>
> This potentially could work, but I would prefer to not do it for 3.6.

Mmh, I don't get the idea of your patch below. One can always read the
NUMA topology from the AMD northbridge, but this is deprecated in favor
of ACPI. The amdtopology.c stuff was only there to enable NUMA for very
early Opterons, where BIOSes didn't provide (sane) SRAT tables.
Though we disallow ACPI for NUMA on Dom0, this northbridge scanning
unfortunately "shines through" the virtualization, actually revealing
the system's NUMA topology, which is usually very different from Dom0's.

So instead I would rather do something like this:

diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
index bfacd2c..7811c0d 100644
--- a/arch/x86/include/asm/numa.h
+++ b/arch/x86/include/asm/numa.h
@@ -20,6 +20,8 @@

  extern int numa_off;

+extern bool deny_amd_nb_numa_scan;
+
  /*
   * __apicid_to_node[] stores the raw mapping between physical apicid and
   * node and is used to initialize cpu_to_node mapping.
diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 5247d01..f223a67 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -29,6 +29,8 @@

  static unsigned char __initdata nodeids[8];

+bool deny_amd_nb_numa_scan = 0;
+
  static __init int find_northbridge(void)
  {
  	int num;
@@ -78,6 +80,9 @@ int __init amd_numa_init(void)
  	u32 nodeid, reg;
  	unsigned int bits, cores, apicid_base;

+	if (deny_amd_nb_numa_scan)
+		return -ENOENT;
+
  	if (!early_pci_allowed())
  		return -EINVAL;

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index d11ca11..6db63c0 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
  	}
  #endif

+	deny_amd_nb_numa_scan = 1;
+
  	memcpy(boot_command_line, xen_start_info->cmd_line,
  	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
  	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);

This would just turn off this one kind of NUMA discovery for Dom0.
The patch is admittedly a bit rough (not sure about the proper placement 
into #ifdef's, for instance) and not well tested yet.
Also one could think about using a more general variable name to cover 
other hardware things in the future that Dom0 shouldn't use.
So this is no longer something for 3.6, and probably not even for 3.7.
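
On the open question of #ifdef placement, one common pattern is to keep the conditional in the header and give callers an inline helper that becomes a no-op when the feature is compiled out, so the Xen code needs no #ifdef of its own. The following is a user-space sketch of that pattern under stated assumptions, not a tested kernel patch: amd_numa_deny_scan(), amd_numa_init_model() and xen_arch_setup_model() are hypothetical names, and CONFIG_AMD_NUMA is defined by hand only so the example builds standalone.

```c
#include <errno.h>
#include <stdbool.h>

/* In a kernel tree this would come from Kconfig; defined here for the sketch. */
#define CONFIG_AMD_NUMA 1

#ifdef CONFIG_AMD_NUMA
/* Would live in amdtopology.c, as in the patch above. */
bool deny_amd_nb_numa_scan;

/* Hypothetical helper a header could export to callers such as Xen. */
static inline void amd_numa_deny_scan(void)
{
	deny_amd_nb_numa_scan = true;
}

/* Model of amd_numa_init(): only the proposed early exit is shown. */
int amd_numa_init_model(void)
{
	if (deny_amd_nb_numa_scan)
		return -ENOENT;
	return 0;	/* stand-in for "northbridge scan succeeded" */
}
#else
/* Feature compiled out: the helper still exists but does nothing. */
static inline void amd_numa_deny_scan(void) { }
#endif

/* Model of the Xen setup path: no #ifdef needed at the call site. */
void xen_arch_setup_model(void)
{
	amd_numa_deny_scan();
}
```

With this layout only the northbridge scan is suppressed; an ACPI-based probe, if one ever becomes available to Dom0, would be unaffected.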

What if we drop the patch for this problem altogether for 3.6 and
recommend "numa=off" as a workaround? This is much less sticky than a
kernel patch and could be documented in the Xen wiki, for instance.
After all, this isn't strictly a regression (it appears with every 3.x
kernel, AFAICT).
Most of the time the northbridge scanning will yield bogus results, so 
the kernel eventually discards it, but sometimes it seems to slip 
through and causes trouble.
Also it does not trigger on newer (Bulldozer) class CPUs, since we 
deliberately avoided adding the new northbridge PCI-ID for this routine.
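
To make the PCI-ID point concrete: the scan only proceeds when the dword at config offset 0x00 (vendor ID in the low half, device ID in the high half) matches one of a few known northbridge IDs. The sketch below mirrors the function-0 check from Konrad's patch earlier in this thread (device IDs 0x1100, 0x1200, 0x1300) in user space. pci_id_dword() and nb_scan_would_match() are hypothetical helper names, and 0x1600 is used merely as an example of an ID that is not on the list (assumed here to be a newer-family function-0 ID, which this thread does not state).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_AMD 0x1022u

/* Compose the dword found at PCI config offset 0x00. */
uint32_t pci_id_dword(uint16_t vendor, uint16_t device)
{
	return (uint32_t)vendor | ((uint32_t)device << 16);
}

/* True if the header matches one of the function-0 IDs the scan accepts. */
bool nb_scan_would_match(uint32_t header)
{
	static const uint16_t scanned_f0_ids[] = { 0x1100, 0x1200, 0x1300 };
	size_t i;

	for (i = 0; i < sizeof(scanned_f0_ids) / sizeof(scanned_f0_ids[0]); i++) {
		if (header == pci_id_dword(PCI_VENDOR_ID_AMD, scanned_f0_ids[i]))
			return true;
	}
	return false;	/* unknown northbridge: the scan bails out */
}
```

A device whose ID is absent from the list never passes this check, which is how leaving out the newer ID keeps the routine from running on those CPUs.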

Regards,
Andre.

>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index a4790bf..b4edce4 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>   #include <asm/e820.h>
>   #include <asm/setup.h>
>   #include <asm/acpi.h>
> +#include <asm/numa.h>
>   #include <asm/xen/hypervisor.h>
>   #include <asm/xen/hypercall.h>
>
> @@ -483,7 +484,33 @@ void __cpuinit xen_enable_sysenter(void)
>   	if(ret != 0)
>   		setup_clear_cpu_cap(sysenter_feature);
>   }
> +#ifdef CONFIG_AMD_NUMA
> +int __cpuinit xen_amd_k8(void)
> +{
> +	int num;
> +
> +	if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
> +		return -ENOENT;
> +
> +	for (num = 0; num < 32; num++) {
> +		u32 header;
> +
> +		header = read_pci_config(0, num, 0, 0x00);
> +		if (header != (PCI_VENDOR_ID_AMD | (0x1100<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1200<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1300<<16)))
> +			continue;
>
> +		header = read_pci_config(0, num, 1, 0x00);
> +		if (header != (PCI_VENDOR_ID_AMD | (0x1101<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1201<<16)) &&
> +			header != (PCI_VENDOR_ID_AMD | (0x1301<<16)))
> +			continue;
> +		return num;
> +	}
> +	return -ENOENT;
> +}
> +#endif
>   void __cpuinit xen_enable_syscall(void)
>   {
>   #ifdef CONFIG_X86_64
> @@ -542,4 +569,8 @@ void __init xen_arch_setup(void)
>   	disable_cpufreq();
>   	WARN_ON(set_pm_idle_to_default());
>   	fiddle_vdso();
> +#ifdef CONFIG_AMD_NUMA
> +	if (xen_amd_k8() >= 0)
> +		numa_off = 1;
> +#endif
>   }
>



-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany


* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-18  9:57           ` Andre Przywara
@ 2012-09-18 13:44             ` Konrad Rzeszutek Wilk
  2012-09-18 16:50               ` Andre Przywara
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-18 13:44 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Dario Faggioli, Konrad Rzeszutek Wilk

On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
> On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> >On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> >>On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>>>when running as Dom0, though I am not sure how to implement this without
> >>>>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>>>
> >>>>>Heh.
> >>>>>Is there a numa=0 option that could be used to override it to turn it
> >>>>>off?
> >>>>
> >>>>Not compile tested.. but was thinking something like this:
> >>>
> >>>ping?
> >>
> >>That looks good to me - at least for the time being.
> >
> >OK, can I have your Tested-by/Acked-by on it, please?
> >
> >>I just want to check how this interacts with upcoming Dom0 NUMA
> >>support. It wouldn't be too clever if we deliberately disable NUMA
> >
> >We can always revert this patch in future versions of Linux.
> 
> I don't like this idea. Then we have Linux kernel up to 3.5 working
> and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
> would be pretty unfortunate.

Huh? v3.5 working? But it never worked? I would say turn off the NUMA
detection (keep in mind it still will set up the dummy NUMA stuff)
until there is some PV NUMA capability and then we can revert it.
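
The fallback mentioned here can be pictured with a small user-space model.
This is only a sketch: the ordering (ACPI first, then the AMD northbridge
scan, then the dummy single-node setup) is assumed from x86_numa_init() in
arch/x86/mm/numa.c of 3.x kernels, and every name in the block
(detect_state, pick_numa_source, the SRC_* constants) is made up for
illustration and is not a kernel symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Which mechanism ends up providing the NUMA layout (hypothetical names). */
enum numa_source { SRC_ACPI, SRC_AMD_NB, SRC_DUMMY };

/* Hypothetical inputs standing in for what the real probes would find. */
struct detect_state {
	int numa_off;      /* models the kernel's numa_off flag */
	int acpi_usable;   /* a usable SRAT was found (not the case for Dom0) */
	int amd_nb_usable; /* the northbridge scan found a node layout */
};

/*
 * Model of the assumed probe order: real detection methods are only
 * tried while numa_off is clear; the dummy single-node setup is the
 * fallback in every other case.
 */
static enum numa_source pick_numa_source(const struct detect_state *s)
{
	assert(s != NULL);

	if (!s->numa_off) {
		if (s->acpi_usable)
			return SRC_ACPI;
		if (s->amd_nb_usable)
			return SRC_AMD_NB;
	}
	return SRC_DUMMY;
}

/* Human-readable name, handy when printing the result. */
static const char *numa_source_name(enum numa_source src)
{
	switch (src) {
	case SRC_ACPI:
		return "ACPI";
	case SRC_AMD_NB:
		return "AMD northbridge";
	default:
		return "dummy";
	}
}
```

In this model, with numa_off set, pick_numa_source() returns SRC_DUMMY
whatever the northbridge would report, which is the effect the patch
relies on.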

> 
> I haven't checked back with Dario, but I'd suspect that we use ACPI
> for injecting NUMA topology into Dom0. Even if not, a general
> "numa=off" for Dom0 is too much of a sledgehammer for me.

How would you inject it in Dom0? It's a PV guest so the hypervisor would
have to tweak the SRAT/SLIT tables. That is not going to happen
in the very short term.. And I don't recall seeing any patches, so
the dom0 NUMA support is right now non-existent?

> 
> >>and future Xen version will allow us to use it. So let me check if I
> >>can confine this turn-off to the fallback K8 northbridge reading.
> >
> >This potentially could work, but I would prefer to not do it for 3.6.
> 
> Mmh, I don't get the idea of your patch below. One can always read
> the NUMA topology from the AMD northbridge, but this is deprecated
> in favor of ACPI. The amdtopology.c stuff was only there to enable
> NUMA for very early Opterons, where BIOSes didn't provide (sane)
> SRAT tables.
> Though we disallow ACPI for NUMA on Dom0, this northbridge scanning
> unfortunately "shines through" the virtualization, actually
> revealing the system's NUMA topology, which is usually much
> different from Dom0's one.

Right, but isn't that what you found broke? It wasn't ACPI NUMA
but the old-style K8 northbridge information? That is what we are
trying to fix.

> 
> So instead I want to do something more like this:
> 
> diff --git a/arch/x86/include/asm/numa.h b/arch/x86/include/asm/numa.h
> index bfacd2c..7811c0d 100644
> --- a/arch/x86/include/asm/numa.h
> +++ b/arch/x86/include/asm/numa.h
> @@ -20,6 +20,8 @@
> 
>  extern int numa_off;
> 
> +extern bool deny_amd_nb_numa_scan;
> +
>  /*
>   * __apicid_to_node[] stores the raw mapping between physical apicid and
>   * node and is used to initialize cpu_to_node mapping.
> diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
> index 5247d01..f223a67 100644
> --- a/arch/x86/mm/amdtopology.c
> +++ b/arch/x86/mm/amdtopology.c
> @@ -29,6 +29,8 @@
> 
>  static unsigned char __initdata nodeids[8];
> 
> +bool deny_amd_nb_numa_scan = 0;
> +
>  static __init int find_northbridge(void)
>  {
>  	int num;
> @@ -78,6 +80,9 @@ int __init amd_numa_init(void)
>  	u32 nodeid, reg;
>  	unsigned int bits, cores, apicid_base;
> 
> +	if (deny_amd_nb_numa_scan)
> +		return -ENOENT;
> +
>  	if (!early_pci_allowed())
>  		return -EINVAL;
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index d11ca11..6db63c0 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -532,6 +532,8 @@ void __init xen_arch_setup(void)
>  	}
>  #endif
> 
> +	deny_amd_nb_numa_scan = 1;
> +
>  	memcpy(boot_command_line, xen_start_info->cmd_line,
>  	       MAX_GUEST_CMDLINE > COMMAND_LINE_SIZE ?
>  	       COMMAND_LINE_SIZE : MAX_GUEST_CMDLINE);
> 
> This would just turn off this one kind of NUMA discovery for Dom0.
> The patch is admittedly a bit rough (not sure about the proper
> placement into #ifdef's, for instance) and not well tested yet.
> Also one could think about using a more general variable name to
> cover other hardware things in the future that Dom0 shouldn't use.
> So this is no longer something for 3.6, probably not even for 3.7.
> 
> How about we drop the patch for this problem altogether for 3.6 and
> recommend "numa=off" as a workaround? This is much less sticky than
> a kernel patch and could appear in the Xen wiki, for instance.

I hate workarounds. People end up using them forever and they get
codified.

> After all this isn't a strict regression (appears with every 3.x
> kernel, AFAICT).
> Most of the time the northbridge scanning will yield bogus results,
> so the kernel eventually discards it, but sometimes it seems to slip
> through and causes trouble.
> Also it does not trigger on newer (Bulldozer) class CPUs, since we
> deliberately avoided adding the new northbridge PCI-ID for this
> routine.

Right, you end up using the ACPI NUMA in them.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-18 16:50               ` Andre Przywara
@ 2012-09-18 14:55                 ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-18 14:55 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Dario Faggioli, Konrad Rzeszutek Wilk

On Tue, Sep 18, 2012 at 06:50:14PM +0200, Andre Przywara wrote:
> On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
> >>On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
> >>>On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
> >>>>On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
> >>>>>>>>[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> >>>>>>>>(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>The obvious solution would be to explicitly deny northbridge scanning
> >>>>>>>>when running as Dom0, though I am not sure how to implement this without
> >>>>>>>>upsetting the other kernel folks about "that crappy Xen thing" again ;-)
> >>>>>>>
> >>>>>>>Heh.
> >>>>>>>Is there a numa=0 option that could be used to override it to turn it
> >>>>>>>off?
> >>>>>>
> >>>>>>Not compile tested.. but was thinking something like this:
> >>>>>
> >>>>>ping?
> >>>>
> >>>>That looks good to me - at least for the time being.
> >>>
> >>>OK, can I've your Tested-by/Acked-by on it pls?
> >>>
> >>>>I just want to check how this interacts with upcoming Dom0 NUMA
> >>>>support. It wouldn't be too clever if we deliberately disable NUMA
> >>>
> >>>We can always revert this patch in future versions of Linux.
> >>
> >>I don't like this idea. Then we have Linux kernel up to 3.5 working
> >>and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
> >>would be pretty unfortunate.
> >
> >Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> >detection (keep in mind it still will set up the dummy NUMA stuff)
> >until there is some PV NUMA capability and then we can revert it.
> 
> I was under the impression that somehow the Dom0 NUMA would be made
> compatible, using some of the existing discovery mechanisms. So we
> would enable the hypervisor, and Dom0 would just magically start
> working. I am probably rooted too much in the HVM world ;-)
> 
> >>
> >>I haven't checked back with Dario, but I'd suspect that we use ACPI
> >>for injecting NUMA topology into Dom0. Even if not, a general
> >>"numa=off" for Dom0 is too much of a sledgehammer for me.
> >
> >How would you inject it in Dom0? It's a PV guest so the hypervisor would
> >have to tweak the SRAT/SLIT tables. That is not going to happen
> >in the very short term.. And I don't recall seeing any patches, so
> >the dom0 NUMA support is right now non-existent?
> 
> Right, I just didn't want to slam the door deliberately. Thinking
> more about this, we probably need some kind of PV enablement in
> Dom0, even if we could somehow use the ACPI tables (and thus the
> ACPI parsing code).
> If this is the case, we could at the same time remove this "force
> numa off" patch.
> 
> I am almost convinced by now.
> Just waiting for Dario's opinion for a few more hours and will send
> my final opinion later today. If you cannot wait, tell me.

Couple of days is OK with me. My deadline is Friday as I would like
to send a git pull to Linus and include this patch if it makes sense.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-18 13:44             ` Konrad Rzeszutek Wilk
@ 2012-09-18 16:50               ` Andre Przywara
  2012-09-18 14:55                 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Andre Przywara @ 2012-09-18 16:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Dario Faggioli, Konrad Rzeszutek Wilk

On 09/18/2012 03:44 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, Sep 18, 2012 at 11:57:33AM +0200, Andre Przywara wrote:
>> On 09/17/2012 09:14 PM, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Sep 17, 2012 at 09:29:22AM +0200, Andre Przywara wrote:
>>>> On 09/14/2012 08:58 PM, Konrad Rzeszutek Wilk wrote:
>>>>>>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>>>>>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> The obvious solution would be to explicitly deny northbridge scanning
>>>>>>>> when running as Dom0, though I am not sure how to implement this without
>>>>>>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>>>>>>
>>>>>>> Heh.
>>>>>>> Is there a numa=0 option that could be used to override it to turn it
>>>>>>> off?
>>>>>>
>>>>>> Not compile tested.. but was thinking something like this:
>>>>>
>>>>> ping?
>>>>
>>>> That looks good to me - at least for the time being.
>>>
>>> OK, can I've your Tested-by/Acked-by on it pls?
>>>
>>>> I just want to check how this interacts with upcoming Dom0 NUMA
>>>> support. It wouldn't be too clever if we deliberately disable NUMA
>>>
>>> We can always revert this patch in future versions of Linux.
>>
>> I don't like this idea. Then we have Linux kernel up to 3.5 working
>> and say from 3.8 on again, but 3.6 and 3.7 cannot use NUMA. That
>> would be pretty unfortunate.
>
> Huh? v3.5 working? But it never worked? I would say turn off the NUMA
> detection (keep in mind it still will set up the dummy NUMA stuff)
> until there is some PV NUMA capability and then we can revert it.

I was under the impression that somehow the Dom0 NUMA would be made 
compatible, using some of the existing discovery mechanisms. So we would 
enable the hypervisor, and Dom0 would just magically start working. I am 
probably rooted too much in the HVM world ;-)

>>
>> I haven't checked back with Dario, but I'd suspect that we use ACPI
>> for injecting NUMA topology into Dom0. Even if not, a general
>> "numa=off" for Dom0 is too much of a sledgehammer for me.
>
> How would you inject it in Dom0? It's a PV guest so the hypervisor would
> have to tweak the SRAT/SLIT tables. That is not going to happen
> in the very short term.. And I don't recall seeing any patches, so
> the dom0 NUMA support is right now non-existent?

Right, I just didn't want to slam the door deliberately. Thinking more
about this, we probably need some kind of PV enablement in Dom0, even if 
we could somehow use the ACPI tables (and thus the ACPI parsing code).
If this is the case, we could at the same time remove this "force numa 
off" patch.

I am almost convinced by now.
Just waiting for Dario's opinion for a few more hours and will send my 
final opinion later today. If you cannot wait, tell me.


Andre.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-21 17:49     ` Andre Przywara
@ 2012-09-21 17:48       ` Konrad Rzeszutek Wilk
  2012-09-21 23:46         ` Andre Przywara
  0 siblings, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-21 17:48 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Konrad Rzeszutek Wilk

> Acked-by: Andre Przywara <andre.przywara@amd.com>
> 
> I compiled and boot-tested this on my (single node ;-) test box.
> First bare-metal, dmesg: No NUMA configuration found
> Then again, but with numa=off on the cmd-line: NUMA turned off
> Then under Xen as Dom0 kernel: NUMA turned off
> 
> So the code behaves under Xen as one would have explicitly specified
> numa=off, which is what we want.

Right.
> I couldn't get hold of the test machine (old K8 server) on which the bug
> was once triggered, which is why I'm reluctant to give my Tested-by.
> Will try this ASAP.

OK, will wait with this - it would be a bit silly if the patch did not
fix the issue :-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Dom0 crash with old style AMD NUMA detection
  2012-08-17 14:22   ` Konrad Rzeszutek Wilk
  2012-09-14 18:58     ` Konrad Rzeszutek Wilk
@ 2012-09-21 17:49     ` Andre Przywara
  2012-09-21 17:48       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 14+ messages in thread
From: Andre Przywara @ 2012-09-21 17:49 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel

On 08/17/2012 04:22 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 03, 2012 at 08:36:28AM -0400, Konrad Rzeszutek Wilk wrote:
>> On Fri, Aug 03, 2012 at 02:20:31PM +0200, Andre Przywara wrote:

Sorry Konrad, almost forgot.
Comment (and Ack) below...

>>> we see Dom0 crashes due to the kernel detecting the NUMA topology not by
>>> ACPI, but directly from the northbridge (CONFIG_AMD_NUMA).
>>>
>>> This will detect the actual NUMA config of the physical machine, but
>>> will crash about the mismatch with Dom0's virtual memory. Variation of
>>> the theme: Dom0 sees what it's not supposed to see.
>>>
>>> This happens with the said config option enabled and on a machine where
>>> this scanning is still enabled (K8 and Fam10h, not Bulldozer class)
>>>
>>> We have this dump then:
>>> [    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1
>>> distance=10
>>> [    0.000000] Scanning NUMA topology in Northbridge 24
>>> [    0.000000] Number of physical nodes 4
>>> [    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
>>> [    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
>>> [    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
>>> [    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
>>> [    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
>>> [    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
>>> [    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
>>> [    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
>>> [    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
>>> [    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
>>> [    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
>>> [    0.000000] Cannot find 159744 bytes in node 3
>>> [    0.000000] BUG: unable to handle kernel NULL pointer dereference at
>>> (null)
>>> [    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [    0.000000] PGD 0
>>> [    0.000000] Oops: 0000 [#1] SMP
>>> [    0.000000] CPU 0
>>> [    0.000000] Modules linked in:
>>> [    0.000000]
>>> [    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
>>> [    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>]
>>> __alloc_bootmem_node+0x43/0x96
>>> [    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
>>> [    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX:
>>> 0000000000000000
>>> [    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI:
>>> 0000000000000000
>>> [    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09:
>>> 0000000000000000
>>> [    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12:
>>> 0000000000000000
>>> [    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15:
>>> 0000000000000003
>>> [    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000)
>>> knlGS:0000000000000000
>>> [    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4:
>>> 0000000000000660
>>> [    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>>> 0000000000000000
>>> [    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7:
>>> 0000000000000000
>>> [    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000,
>>> task ffffffff81c0d020)
>>> [    0.000000] Stack:
>>> [    0.000000]  00000000000000c0 0000000000000003 0000000000000000
>>> 000000000000003f
>>> [    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000
>>> 0000000000000002
>>> [    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8
>>> 0000000000000000
>>> [    0.000000] Call Trace:
>>> [    0.000000]  [<ffffffff81d23024>]
>>> sparse_early_usemaps_alloc_node+0x64/0x178
>>> [    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
>>> [    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
>>> [    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
>>> [    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
>>> [    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
>>> [    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
>>> [    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
>>> [    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
>>> [    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59
>>> be 2a
>>> 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f
>>> <41>  8b
>>> bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
>>> [    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
>>> [    0.000000]  RSP <ffffffff81c01de8>
>>> [    0.000000] CR2: 0000000000000000
>>> [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
>>> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
>>> (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
>>>
>>>
>>>
>>> The obvious solution would be to explicitly deny northbridge scanning
>>> when running as Dom0, though I am not sure how to implement this without
>>> upsetting the other kernel folks about "that crappy Xen thing" again ;-)
>>
>> Heh.
>> Is there a numa=0 option that could be used to override it to turn it
>> off?
>
> Not compile tested.. but was thinking something like this:
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 43fd630..838cc1f 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -17,6 +17,7 @@
>   #include <asm/e820.h>
>   #include <asm/setup.h>
>   #include <asm/acpi.h>
> +#include <asm/numa.h>
>   #include <asm/xen/hypervisor.h>
>   #include <asm/xen/hypercall.h>
>
> @@ -528,4 +529,7 @@ void __init xen_arch_setup(void)
>   	disable_cpufreq();
>   	WARN_ON(set_pm_idle_to_default());
>   	fiddle_vdso();
> +#ifdef CONFIG_NUMA
> +	numa_off = 1;
> +#endif
>   }
>

Acked-by: Andre Przywara <andre.przywara@amd.com>

I compiled and boot-tested this on my (single node ;-) test box.
First bare-metal, dmesg: No NUMA configuration found
Then again, but with numa=off on the cmd-line: NUMA turned off
Then under Xen as Dom0 kernel: NUMA turned off

So the code behaves under Xen as one would have explicitly specified 
numa=off, which is what we want.
I couldn't get hold of the test machine (old K8 server) on which the bug
was once triggered, which is why I'm reluctant to give my Tested-by.
Will try this ASAP.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-21 17:48       ` Konrad Rzeszutek Wilk
@ 2012-09-21 23:46         ` Andre Przywara
  2012-09-24 13:48           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 14+ messages in thread
From: Andre Przywara @ 2012-09-21 23:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Konrad Rzeszutek Wilk

On 09/21/2012 07:48 PM, Konrad Rzeszutek Wilk wrote:
>> Acked-by: Andre Przywara <andre.przywara@amd.com>
>>
>> I compiled and boot-tested this on my (single node ;-) test box.
>> First bare-metal, dmesg: No NUMA configuration found
>> Then again, but with numa=off on the cmd-line: NUMA turned off
>> Then under Xen as Dom0 kernel: NUMA turned off
>>
>> So the code behaves under Xen as one would have explicitly specified
>> numa=off, which is what we want.
>
> Right.
>> I couldn't get hold of the test machine (old K8 server) on which the bug
>> was once triggered, which is why I'm reluctant to give my Tested-by.
>> Will try this ASAP.
>
> OK, will wait with this - it would be a bit silly if the patch did not
> fix the issue :-)

Thanks for your patience. I tried some machines; it affects not only K8s,
but also Barcelonas and Magny-Cours.
Boot those with a Xen HV and restrict Dom0's memory to something well 
below the first node's size (say dom0_mem=512M). If the 3.x Dom0 kernel 
has CONFIG_AMD_NUMA compiled in, the box will crash, because the 
hardware's NUMA info read from the northbridge does not match Dom0's
understanding of its memory.
With your fix the box booted fine, NUMA is turned off and everyone is happy.
Double checked by commenting out the numa_off=1 line in your patch: crash
again. So this line definitely fixes this.
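
The mismatch described above can be illustrated with a short user-space
sketch. It takes the node ranges from the "Node N MemBase ... Limit ..."
lines of the original crash dump and computes how much of each node lies
inside Dom0's memory, which is assumed here to be one contiguous range
starting at 0 (a simplification; the real kernel works from the E820 map).
The names nb_node, node_bytes_in_dom0, count_empty_nodes and dump_nodes are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One node as reported by the northbridge scan: [base, limit). */
struct nb_node {
	uint64_t base;
	uint64_t limit;
};

/* Bytes of a node's range that fall inside Dom0's memory [0, dom0_end). */
static uint64_t node_bytes_in_dom0(const struct nb_node *n, uint64_t dom0_end)
{
	uint64_t end = n->limit < dom0_end ? n->limit : dom0_end;

	return end > n->base ? end - n->base : 0;
}

/* Count nodes that the scan reports but that hold no Dom0 memory at all. */
static int count_empty_nodes(const struct nb_node *nodes, int nr,
			     uint64_t dom0_end)
{
	int i, empty = 0;

	assert(nr >= 0);
	for (i = 0; i < nr; i++) {
		uint64_t bytes = node_bytes_in_dom0(&nodes[i], dom0_end);

		printf("node %d: %llu MiB visible to Dom0\n", i,
		       (unsigned long long)(bytes >> 20));
		if (bytes == 0)
			empty++;
	}
	return empty;
}

/* Node ranges copied from the MemBase/Limit lines of the crash dump. */
static const struct nb_node dump_nodes[4] = {
	{ 0x0000000000000000ULL, 0x0000000040000000ULL },
	{ 0x0000000040000000ULL, 0x0000000138000000ULL },
	{ 0x0000000138000000ULL, 0x00000001f8000000ULL },
	{ 0x00000001f8000000ULL, 0x0000000238000000ULL },
};
```

Calling count_empty_nodes(dump_nodes, 4, 512ULL << 20) models
dom0_mem=512M: only node 0 overlaps Dom0's memory and the other three
nodes come out empty, so an early per-node allocation for them has
nothing to allocate from.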

Tested-by: Andre Przywara <andre.przywara@amd.com>

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Dom0 crash with old style AMD NUMA detection
  2012-09-21 23:46         ` Andre Przywara
@ 2012-09-24 13:48           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-09-24 13:48 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Konrad Rzeszutek Wilk, Jeremy Fitzhardinge, xen-devel,
	Konrad Rzeszutek Wilk

On Sat, Sep 22, 2012 at 01:46:57AM +0200, Andre Przywara wrote:
> On 09/21/2012 07:48 PM, Konrad Rzeszutek Wilk wrote:
> >>Acked-by: Andre Przywara <andre.przywara@amd.com>
> >>
> >>I compiled and boot-tested this on my (single node ;-) test box.
> >>First bare-metal, dmesg: No NUMA configuration found
> >>Then again, but with numa=off on the cmd-line: NUMA turned off
> >>Then under Xen as Dom0 kernel: NUMA turned off
> >>
> >>So the code behaves under Xen as one would have explicitly specified
> >>numa=off, which is what we want.
> >
> >Right.
> >>I couldn't get hold of the test machine (old K8 server) on which the bug
> >>was once triggered, which is why I'm reluctant to give my Tested-by.
> >>Will try this ASAP.
> >
> >OK, will wait with this - it would be a bit silly if the patch did not
> >fix the issue :-)
> 
> Thanks for your patience. I tried some machines; it affects not only
> K8s, but also Barcelonas and Magny-Cours.
> Boot those with a Xen HV and restrict Dom0's memory to something
> well below the first node's size (say dom0_mem=512M). If the 3.x
> Dom0 kernel has CONFIG_AMD_NUMA compiled in, the box will crash,
> because the hardware's NUMA info read from the northbridge does not
> match Dom0's understanding of its memory.
> With your fix the box booted fine, NUMA is turned off and everyone is happy.
> Double checked by commenting out the numa_off=1 line in your patch:
> crash again. So this line definitely fixes this.
> 
> Tested-by: Andre Przywara <andre.przywara@amd.com>

OK, send out a git pull for it today. If Linus doesn't take it, I will just have
to do it in v3.7 time-frame and do the stable kernel backport.

Thanks again for testing and reporting this!
> 
> Regards,
> Andre.
> 
> -- 
> Andre Przywara
> AMD-Operating System Research Center (OSRC), Dresden, Germany
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-09-24 13:48 UTC | newest]

Thread overview: 14+ messages
2012-08-03 12:20 Dom0 crash with old style AMD NUMA detection Andre Przywara
2012-08-03 12:36 ` Konrad Rzeszutek Wilk
2012-08-17 14:22   ` Konrad Rzeszutek Wilk
2012-09-14 18:58     ` Konrad Rzeszutek Wilk
2012-09-17  7:29       ` Andre Przywara
2012-09-17 19:14         ` Konrad Rzeszutek Wilk
2012-09-18  9:57           ` Andre Przywara
2012-09-18 13:44             ` Konrad Rzeszutek Wilk
2012-09-18 16:50               ` Andre Przywara
2012-09-18 14:55                 ` Konrad Rzeszutek Wilk
2012-09-21 17:49     ` Andre Przywara
2012-09-21 17:48       ` Konrad Rzeszutek Wilk
2012-09-21 23:46         ` Andre Przywara
2012-09-24 13:48           ` Konrad Rzeszutek Wilk
