From: Andre Przywara <andre.przywara@amd.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Jeremy Fitzhardinge <jeremy@goop.org>
Cc: xen-devel <xen-devel@lists.xen.org>
Subject: Dom0 crash with old style AMD NUMA detection
Date: Fri, 3 Aug 2012 14:20:31 +0200
Message-ID: <501BC20F.3040205@amd.com>

Hi,

we are seeing Dom0 crashes caused by the kernel detecting the NUMA
topology not via ACPI but directly from the northbridge (CONFIG_AMD_NUMA).
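For context, here is a standalone sketch of what "detecting the topology from the northbridge" amounts to on K8/Fam10h. The kernel reads DRAM Base/Limit register pairs from function 1 of the northbridge's PCI configuration space (device 24, i.e. 0x18, as in the log below) and turns each enabled pair into a node range. The register layout used here is assumed from the AMD BKDG for these families and should be checked against it. `struct dram_range` and `decode_dram_range` are hypothetical names, not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder for one DRAM Base/Limit register pair read from
 * function 1 of an AMD K8/Fam10h northbridge. Layout ASSUMED from the BKDG:
 *   base  reg: bits 31:16 = DRAM base  address bits 39:24, bit 0 = read enable
 *   limit reg: bits 31:16 = DRAM limit address bits 39:24, bits 2:0 = dest node
 */
struct dram_range {
	int      enabled; /* read-enable bit set in the base register */
	int      node;    /* destination node id */
	uint64_t base;    /* first byte of the range */
	uint64_t limit;   /* one past the last byte (exclusive end) */
};

static struct dram_range decode_dram_range(uint32_t base_reg, uint32_t limit_reg)
{
	struct dram_range r;

	r.enabled = base_reg & 1;
	r.node    = limit_reg & 7;
	/* Address bits 39:24 sit in register bits 31:16, so shift left by 8. */
	r.base    = (uint64_t)(base_reg & 0xffff0000u) << 8;
	/* The limit names the last 16 MiB block: fill in the low 24 bits and
	 * add one to turn it into an exclusive end address. */
	r.limit   = (((uint64_t)(limit_reg & 0xffff0000u) << 8) | 0xffffffu) + 1;
	assert(!r.enabled || r.base < r.limit);
	return r;
}
```

Feeding it register values constructed to encode node 1 from the dump (base 0x00400003, limit 0x01370001; these are illustrative, not values read from the affected machine) gives base 0x40000000 and an exclusive limit of 0x138000000, which matches how the "Limit" column in the dump appears to be printed. Nothing in these registers knows about Dom0: they describe the host's physical memory.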

This detects the actual NUMA configuration of the physical machine, and
the kernel then crashes because that configuration does not match Dom0's
virtual memory layout. A variation on the theme: Dom0 sees what it's not
supposed to see.

This happens when the config option above is enabled and the machine
still uses this kind of scanning (K8 and Fam10h, not Bulldozer class).

We then get this dump:
[    0.000000] NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10
[    0.000000] Scanning NUMA topology in Northbridge 24
[    0.000000] Number of physical nodes 4
[    0.000000] Node 0 MemBase 0000000000000000 Limit 0000000040000000
[    0.000000] Node 1 MemBase 0000000040000000 Limit 0000000138000000
[    0.000000] Node 2 MemBase 0000000138000000 Limit 00000001f8000000
[    0.000000] Node 3 MemBase 00000001f8000000 Limit 0000000238000000
[    0.000000] Initmem setup node 0 0000000000000000-0000000040000000
[    0.000000]   NODE_DATA [000000003ffd9000 - 000000003fffffff]
[    0.000000] Initmem setup node 1 0000000040000000-0000000138000000
[    0.000000]   NODE_DATA [0000000137fd9000 - 0000000137ffffff]
[    0.000000] Initmem setup node 2 0000000138000000-00000001f8000000
[    0.000000]   NODE_DATA [00000001f095e000 - 00000001f0984fff]
[    0.000000] Initmem setup node 3 00000001f8000000-0000000238000000
[    0.000000] Cannot find 159744 bytes in node 3
[    0.000000] BUG: unable to handle kernel NULL pointer dereference at (null)
[    0.000000] IP: [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] PGD 0
[    0.000000] Oops: 0000 [#1] SMP
[    0.000000] CPU 0
[    0.000000] Modules linked in:
[    0.000000]
[    0.000000] Pid: 0, comm: swapper Not tainted 3.3.6 #1 AMD Dinar/Dinar
[    0.000000] RIP: e030:[<ffffffff81d220e6>]  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000] RSP: e02b:ffffffff81c01de8  EFLAGS: 00010046
[    0.000000] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000
[    0.000000] RDX: 0000000000000040 RSI: 00000000000000c0 RDI: 0000000000000000
[    0.000000] RBP: ffffffff81c01e08 R08: 0000000000000000 R09: 0000000000000000
[    0.000000] R10: 0000000000098000 R11: 0000000000000000 R12: 0000000000000000
[    0.000000] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000003
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff81ced000(0000) knlGS:0000000000000000
[    0.000000] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 0000000000000000 CR3: 0000000001c05000 CR4: 0000000000000660
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[    0.000000] Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
[    0.000000] Stack:
[    0.000000]  00000000000000c0 0000000000000003 0000000000000000 000000000000003f
[    0.000000]  ffffffff81c01e68 ffffffff81d23024 0000000000400000 0000000000000002
[    0.000000]  0000000000080000 ffff8801f055e000 ffff8801f055e1f8 0000000000000000
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81d23024>] sparse_early_usemaps_alloc_node+0x64/0x178
[    0.000000]  [<ffffffff81d23348>] sparse_init+0xe4/0x25a
[    0.000000]  [<ffffffff81d16840>] paging_init+0x13/0x22
[    0.000000]  [<ffffffff81d07fbb>] setup_arch+0x9c6/0xa9b
[    0.000000]  [<ffffffff81683954>] ? printk+0x3c/0x3e
[    0.000000]  [<ffffffff81d01a38>] start_kernel+0xe5/0x468
[    0.000000]  [<ffffffff81d012cf>] x86_64_start_reservations+0xba/0xc1
[    0.000000]  [<ffffffff81007153>] ? xen_setup_runstate_info+0x2c/0x36
[    0.000000]  [<ffffffff81d050ee>] xen_start_kernel+0x565/0x56c
[    0.000000] Code: 79 bc 3e ff 85 c0 74 23 80 3d 19 e9 21 00 00 75 59 be 2a 01 00 00 48 c7 c7 d0 55 a8 81 e8 b6 dc 31 ff c6 05 ff e8 21 00 01 eb 3f <41> 8b bc 24 60 60 02 00 49 83 c8 ff 4c 89 e9 4c 89 f2 48 89 de
[    0.000000] RIP  [<ffffffff81d220e6>] __alloc_bootmem_node+0x43/0x96
[    0.000000]  RSP <ffffffff81c01de8>
[    0.000000] CR2: 0000000000000000
[    0.000000] ---[ end trace a7919e7f17c0a725 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
(XEN) Domain 0 crashed: 'noreboot' set - not rebooting.
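The failure can be followed on paper. The northbridge hands out node ranges for the whole physical machine, but Dom0's memory map does not cover all of them, so the per-node allocation for node 3 finds nothing ("Cannot find 159744 bytes in node 3"; 159744 bytes is 0x27000, the same size as the NODE_DATA areas placed for nodes 0 to 2). That presumably leaves node 3 without its node data, which the later __alloc_bootmem_node call then dereferences. The sketch below models this with the node ranges from the dump and an ASSUMED Dom0 RAM range. The Dom0 end address is a guess inferred from where node 2's NODE_DATA landed (not at the top of node 2), not something the log states; all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A [start, end) physical address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Node ranges as reported by the northbridge scan in the dump. */
static const struct range nb_nodes[] = {
	{ 0x000000000ull, 0x040000000ull },
	{ 0x040000000ull, 0x138000000ull },
	{ 0x138000000ull, 0x1f8000000ull },
	{ 0x1f8000000ull, 0x238000000ull },
};

/* ASSUMPTION: RAM actually usable by Dom0. The end address is guessed from
 * where NODE_DATA for node 2 was placed (ending at 0x1f0984fff). */
static const struct range dom0_ram[] = {
	{ 0x000000000ull, 0x1f0985000ull },
};

/* Bytes of Dom0-usable RAM that fall inside one node's range. */
static uint64_t node_backed_bytes(struct range node, const struct range *ram, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t lo = node.start > ram[i].start ? node.start : ram[i].start;
		uint64_t hi = node.end < ram[i].end ? node.end : ram[i].end;
		if (lo < hi)
			total += hi - lo;
	}
	return total;
}

/* 1 if node `nid` has at least `size` bytes of Dom0-backed RAM
 * (a necessary condition only; fragmentation is ignored). */
static int node_can_hold(int nid, uint64_t size)
{
	assert(nid >= 0 && (size_t)nid < sizeof(nb_nodes) / sizeof(nb_nodes[0]));
	return node_backed_bytes(nb_nodes[nid], dom0_ram,
	                         sizeof(dom0_ram) / sizeof(dom0_ram[0])) >= size;
}
```

Under this assumption nodes 0 to 2 can hold a 159744-byte allocation, while node 3 lies entirely above the end of Dom0's RAM and has zero backed bytes, which is consistent with the "Cannot find ... in node 3" message.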



The obvious solution would be to explicitly disable northbridge scanning
when running as Dom0, though I am not sure how to implement this without
upsetting the other kernel folks about "that crappy Xen thing" again ;-)
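A minimal model of that idea, ASSUMING the detection order is "ACPI first, then the northbridge, then a single fake node" (this is how the x86 NUMA init fallback is understood here; treat it as an assumption). The only change is one extra condition on the northbridge step. `struct platform`, `pick_numa_source` and the enum are hypothetical names; in the kernel the Dom0 test would have to be something like `xen_initial_domain()`, and whether that information is available early enough is itself an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the NUMA detection fallback; names are illustrative. */
enum numa_source {
	NUMA_SRC_ACPI,   /* ACPI SRAT table */
	NUMA_SRC_AMD_NB, /* K8/Fam10h northbridge scan (CONFIG_AMD_NUMA) */
	NUMA_SRC_DUMMY,  /* one fake node covering all memory */
};

struct platform {
	bool has_acpi_srat;    /* firmware provides NUMA info via ACPI */
	bool amd_nb_scannable; /* northbridge scanning is possible on this CPU */
	bool is_xen_dom0;      /* ASSUMPTION: known this early during boot */
};

/* Return the first usable NUMA source. Compared with the unguarded order,
 * the only change is the !is_xen_dom0 test on the northbridge step: the
 * northbridge describes the host's memory, not Dom0's. */
static enum numa_source pick_numa_source(const struct platform *p)
{
	assert(p != NULL);
	if (p->has_acpi_srat)
		return NUMA_SRC_ACPI;
	if (p->amd_nb_scannable && !p->is_xen_dom0)
		return NUMA_SRC_AMD_NB;
	return NUMA_SRC_DUMMY;
}
```

Keeping the guard local to the northbridge step leaves bare-metal behaviour unchanged; a Dom0 on K8/Fam10h without ACPI NUMA information simply falls back to a single node.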

Could someone propose a fix for this? I am out of the office (OoO) for
the next two weeks.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
