* MCE boot crash in qemu
@ 2009-06-15 11:59 Vegard Nossum
2009-06-15 12:01 ` Pekka Enberg
2009-06-15 12:52 ` Andi Kleen
0 siblings, 2 replies; 7+ messages in thread
From: Vegard Nossum @ 2009-06-15 11:59 UTC
To: Ingo Molnar, Andi Kleen; +Cc: Pekka Enberg, LKML
[-- Attachment #1: Type: text/plain, Size: 3269 bytes --]
Hi,
I get an MCE-related crash like this in latest linus tree:
[ 0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.116396] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.120570] mce: CPU supports 0 MCE banks
[ 0.124870] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
[ 0.128001] PGD 0
[ 0.128001] Thread overran stack, or stack corrupted
[ 0.128001] Oops: 0002 [#1] PREEMPT SMP
[ 0.128001] last sysfs file:
[ 0.128001] CPU 0
[ 0.128001] Modules linked in:
[ 0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
[ 0.128001] RIP: 0010:[<ffffffff813b98ad>] [<ffffffff813b98ad>] mcheck_init+0x278/0x320
[ 0.128001] RSP: 0018:ffffffff81595e38 EFLAGS: 00000246
[ 0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
[ 0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
[ 0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
[ 0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
[ 0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
[ 0.128001] FS: 0000000000000000(0000) GS:ffff880002288000(0000) knlGS:0000000000000000
[ 0.128001] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
[ 0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffffff8152a4a0)
[ 0.128001] Stack:
[ 0.128001] 0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f914
[ 0.128001] ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b869c
[ 0.128001] 5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddbe6e
[ 0.128001] Call Trace:
[ 0.128001] [<ffffffff813b869c>] identify_cpu+0x331/0x392
[ 0.128001] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
[ 0.128001] [<ffffffff815a14ac>] check_bugs+0x1c/0x60
[ 0.128001] [<ffffffff8159c075>] start_kernel+0x403/0x46e
[ 0.128001] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
[ 0.128001] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
[ 0.128001] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
[ 0.128001] Code: c7 48 89 05 9e 71 40 00 74 2a 48 63 15 91 71 40 00 be ff 00 00 00 48 c1 e2 03 e8 bf a1 e2 ff e9 3f fe ff ff 48 8b 05 7b 71 40 00 <48> c7 00 00 00 00 00 eb 84 c7 05 40 71 40 00 01 00 00 00 e9 2b
[ 0.128001] RIP [<ffffffff813b98ad>] mcheck_init+0x278/0x320
[ 0.128001] RSP <ffffffff81595e38>
[ 0.128001] CR2: 0000000000000010
[ 0.129306] ---[ end trace a7919e7f17c0a725 ]---
It's this:
	/*
	 * Various K7s with broken bank 0 around. Always disable
	 * by default.
	 */
	if (c->x86 == 6)
		bank[0] = 0;
in mce_cpu_quirks() in arch/x86/kernel/cpu/mcheck/mce.c around line
1217. Strange that it thinks this is an AMD CPU, though?
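As a cross-check on the trace, the number after "Oops:" is the x86 page-fault error code, and 0002 decodes to a kernel-mode write to a non-present page. The marked bytes in the Code: line, 48 c7 00 00 00 00 00, encode movq $0x0,(%rax), and RAX holds 0x10, the same value as CR2, which is consistent with the bank[0] = 0 store. The following is a minimal sketch of the decoding, assuming only the low three bits matter here; decode_pf_error(), print_pf_error() and struct pf_info are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code (the number after "Oops:"). */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access,      1: write access */
#define PF_USER  (1u << 2) /* 0: kernel mode,      1: user mode */

static_assert((PF_PROT | PF_WRITE | PF_USER) == 0x7u,
	      "the sketch decodes exactly the low three bits");

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct pf_info {
	bool protection;
	bool write;
	bool user;
};

/* Split an error code into the three flags above. */
struct pf_info decode_pf_error(unsigned int code)
{
	struct pf_info info = {
		.protection = (code & PF_PROT) != 0,
		.write      = (code & PF_WRITE) != 0,
		.user       = (code & PF_USER) != 0,
	};
	return info;
}

/* Print a one-line, human-readable summary of an error code. */
void print_pf_error(unsigned int code)
{
	struct pf_info info = decode_pf_error(code);

	printf("Oops %04x: %s, %s access, %s mode\n", code,
	       info.protection ? "protection violation" : "page not present",
	       info.write ? "write" : "read",
	       info.user ? "user" : "kernel");
}
```

Calling print_pf_error(0x0002) from a small main() reports a write access to a non-present page in kernel mode, which is what a store through an invalid low address looks like.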
Attached full boot log.
Vegard
[-- Attachment #2: mce.txt --]
[-- Type: text/plain, Size: 16338 bytes --]
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Linux version 2.6.30 (vegard@damson) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #426 SMP PREEMPT Mon Jun 15 13:34:20 CEST 2009
[ 0.000000] Command line: root=/dev/sda1 console=ttyS0 console=tty0 ignore_loglevel acpi=force no_console_suspend resume=/dev/sdb1 init=/getty.sh kmemcheck=0
[ 0.000000] KERNEL supported cpus:
[ 0.000000] Intel GenuineIntel
[ 0.000000] AMD AuthenticAMD
[ 0.000000] Centaur CentaurHauls
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 0000000007ff0000 (usable)
[ 0.000000] BIOS-e820: 0000000007ff0000 - 0000000008000000 (ACPI data)
[ 0.000000] BIOS-e820: 00000000fffc0000 - 0000000100000000 (reserved)
[ 0.000000] debug: ignoring loglevel setting.
[ 0.000000] last_pfn = 0x7ff0 max_arch_pfn = 0x400000000
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] init_memory_mapping: 0000000000000000-0000000007ff0000
[ 0.000000] 0000000000 - 0007ff0000 page 4k
[ 0.000000] kernel direct mapping tables up to 7ff0000 @ 8000-4a000
[ 0.000000] (5 early reservations) ==> bootmem [0000000000 - 0007ff0000]
[ 0.000000] #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
[ 0.000000] #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
[ 0.000000] #2 [0001000000 - 00022819fc] TEXT DATA BSS ==> [0001000000 - 00022819fc]
[ 0.000000] #3 [000009fc00 - 0000100000] BIOS reserved ==> [000009fc00 - 0000100000]
[ 0.000000] #4 [0000008000 - 0000048000] PGTABLE ==> [0000008000 - 0000048000]
[ 0.000000] found SMP MP-table at [ffff8800000fa6d0] fa6d0
[ 0.000000] [ffffea0000000000-ffffea00003fffff] PMD -> [ffff880002800000-ffff880002bfffff] on node 0
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0x00000000 -> 0x00001000
[ 0.000000] DMA32 0x00001000 -> 0x00100000
[ 0.000000] Normal 0x00100000 -> 0x00100000
[ 0.000000] Movable zone start PFN for each node
[ 0.000000] early_node_map[2] active PFN ranges
[ 0.000000] 0: 0x00000000 -> 0x0000009f
[ 0.000000] 0: 0x00000100 -> 0x00007ff0
[ 0.000000] On node 0 totalpages: 32655
[ 0.000000] DMA zone: 112 pages used for memmap
[ 0.000000] DMA zone: 163 pages reserved
[ 0.000000] DMA zone: 3724 pages, LIFO batch:0
[ 0.000000] DMA32 zone: 784 pages used for memmap
[ 0.000000] DMA32 zone: 27872 pages, LIFO batch:7
[ 0.000000] Intel MultiProcessor Specification v1.4
[ 0.000000] MPTABLE: OEM ID: QEMUCPU
[ 0.000000] MPTABLE: Product ID: 0.1
[ 0.000000] MPTABLE: APIC at: 0xFEE00000
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] Processor #1
[ 0.000000] I/O APIC #2 Version 17 at 0xFEC00000.
[ 0.000000] Processors: 2
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
[ 0.000000] nr_irqs_gsi: 24
[ 0.000000] Allocating PCI resources starting at 8000000 (gap: 8000000:f7fc0000)
[ 0.000000] NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:2 nr_node_ids:1
[ 0.000000] PERCPU: Embedded 474 pages at ffff880002288000, static data 1909152 bytes
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 31596
[ 0.000000] Kernel command line: root=/dev/sda1 console=ttyS0 console=tty0 ignore_loglevel acpi=force no_console_suspend resume=/dev/sdb1 init=/getty.sh kmemcheck=0
[ 0.000000] PID hash table entries: 512 (order: 9, 4096 bytes)
[ 0.000000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.000000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Memory: 103132k/131008k available (3866k kernel code, 388k absent, 27348k reserved, 1818k data, 2176k init)
[ 0.000000] Experimental hierarchical RCU implementation.
[ 0.000000] Experimental hierarchical RCU init done.
[ 0.000000] NR_IRQS:4352 nr_irqs:424
[ 0.000000] Fast TSC calibration using PIT
[ 0.000000] Detected 1462.925 MHz processor.
[ 0.004000] Console: colour VGA+ 80x25
[ 0.004000] console [tty0] enabled
[ 0.004000] console [ttyS0] enabled
[ 0.004000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.004000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.004000] ... MAX_LOCK_DEPTH: 48
[ 0.004000] ... MAX_LOCKDEP_KEYS: 8191
[ 0.004000] ... CLASSHASH_SIZE: 4096
[ 0.004000] ... MAX_LOCKDEP_ENTRIES: 16384
[ 0.004000] ... MAX_LOCKDEP_CHAINS: 32768
[ 0.004000] ... CHAINHASH_SIZE: 16384
[ 0.004000] memory used by lock dependency info: 6207 kB
[ 0.004000] per task-struct memory footprint: 2688 bytes
[ 0.004000] ------------------------
[ 0.004000] | Locking API testsuite:
[ 0.004000] ----------------------------------------------------------------------------
[ 0.004000] | spin |wlock |rlock |mutex | wsem | rsem |
[ 0.004000] --------------------------------------------------------------------------
[ 0.004000] A-A deadlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] A-B-B-A deadlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] A-B-B-C-C-A deadlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] A-B-C-A-B-C deadlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] A-B-B-C-C-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] A-B-C-D-B-D-D-A deadlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] A-B-C-D-B-C-D-A deadlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] double unlock: ok | ok | ok | ok | ok | ok |
[ 0.004000] initialize held: ok | ok | ok | ok | ok | ok |
[ 0.004000] bad unlock order: ok | ok | ok | ok | ok | ok |
[ 0.004000] --------------------------------------------------------------------------
[ 0.004000] recursive read-lock: | ok | | ok |
[ 0.004000] recursive read-lock #2: | ok | | ok |
[ 0.004000] mixed read-write-lock: | ok | | ok |
[ 0.004000] mixed write-read-lock: | ok | | ok |
[ 0.004000] --------------------------------------------------------------------------
[ 0.004000] hard-irqs-on + irq-safe-A/12: ok | ok | ok |
[ 0.004000] soft-irqs-on + irq-safe-A/12: ok | ok | ok |
[ 0.004000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.004000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
[ 0.004000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
[ 0.004000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
[ 0.004000] hard-safe-A + irqs-on/12: ok | ok | ok |
[ 0.004000] soft-safe-A + irqs-on/12: ok | ok | ok |
[ 0.004000] hard-safe-A + irqs-on/21: ok | ok | ok |
[ 0.004000] soft-safe-A + irqs-on/21: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #1/132: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #1/132: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #1/213: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #1/213: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #1/231: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #1/231: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #1/312: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #1/312: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #1/321: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #1/321: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #2/123: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #2/123: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #2/132: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #2/132: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #2/213: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #2/213: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #2/231: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #2/231: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #2/312: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #2/312: ok | ok | ok |
[ 0.004000] hard-safe-A + unsafe-B #2/321: ok | ok | ok |
[ 0.004000] soft-safe-A + unsafe-B #2/321: ok | ok | ok |
[ 0.004000] hard-irq lock-inversion/123: ok | ok | ok |
[ 0.004000] soft-irq lock-inversion/123: ok | ok | ok |
[ 0.004000] hard-irq lock-inversion/132: ok | ok | ok |
[ 0.004000] soft-irq lock-inversion/132: ok | ok | ok |
[ 0.004000] hard-irq lock-inversion/213: ok | ok | ok |
[ 0.004000] soft-irq lock-inversion/213: ok | ok | ok |
[ 0.004000] hard-irq lock-inversion/231: ok | ok | ok |
[ 0.004000] soft-irq lock-inversion/231: ok | ok | ok |
[ 0.004000] hard-irq lock-inversion/312: ok | ok | ok |
[ 0.004000] soft-irq lock-inversion/312: ok | ok | ok |
[ 0.004000] hard-irq lock-inversion/321: ok | ok | ok |
[ 0.004000] soft-irq lock-inversion/321: ok | ok | ok |
[ 0.004000] hard-irq read-recursion/123: ok |
[ 0.004000] soft-irq read-recursion/123: ok |
[ 0.004000] hard-irq read-recursion/132: ok |
[ 0.004000] soft-irq read-recursion/132: ok |
[ 0.004000] hard-irq read-recursion/213: ok |
[ 0.004000] soft-irq read-recursion/213: ok |
[ 0.004000] hard-irq read-recursion/231: ok |
[ 0.004000] soft-irq read-recursion/231: ok |
[ 0.004000] hard-irq read-recursion/312: ok |
[ 0.004000] soft-irq read-recursion/312: ok |
[ 0.004000] hard-irq read-recursion/321: ok |
[ 0.004000] soft-irq read-recursion/321: ok |
[ 0.004000] -------------------------------------------------------
[ 0.004000] Good, all 218 testcases passed! |
[ 0.004000] ---------------------------------
[ 0.004000] allocated 1310720 bytes of page_cgroup
[ 0.004000] please try cgroup_disable=memory option if you don't want
[ 0.008988] Calibrating delay loop (skipped), value calculated using timer frequency.. 2925.85 BogoMIPS (lpj=5851700)
[ 0.030071] Security Framework initialized
[ 0.033281] TOMOYO Linux initialized
[ 0.045394] Mount-cache hash table entries: 256
[ 0.097008] Initializing cgroup subsys debug
[ 0.098789] Initializing cgroup subsys ns
[ 0.100572] Initializing cgroup subsys cpuacct
[ 0.105127] Initializing cgroup subsys memory
[ 0.110394] Initializing cgroup subsys freezer
[ 0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.116396] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.120570] mce: CPU supports 0 MCE banks
[ 0.124870] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
[ 0.128001] PGD 0
[ 0.128001] Thread overran stack, or stack corrupted
[ 0.128001] Oops: 0002 [#1] PREEMPT SMP
[ 0.128001] last sysfs file:
[ 0.128001] CPU 0
[ 0.128001] Modules linked in:
[ 0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
[ 0.128001] RIP: 0010:[<ffffffff813b98ad>] [<ffffffff813b98ad>] mcheck_init+0x278/0x320
[ 0.128001] RSP: 0018:ffffffff81595e38 EFLAGS: 00000246
[ 0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
[ 0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
[ 0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
[ 0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
[ 0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
[ 0.128001] FS: 0000000000000000(0000) GS:ffff880002288000(0000) knlGS:0000000000000000
[ 0.128001] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
[ 0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
[ 0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffffff8152a4a0)
[ 0.128001] Stack:
[ 0.128001] 0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f914
[ 0.128001] ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b869c
[ 0.128001] 5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddbe6e
[ 0.128001] Call Trace:
[ 0.128001] [<ffffffff813b869c>] identify_cpu+0x331/0x392
[ 0.128001] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
[ 0.128001] [<ffffffff815a14ac>] check_bugs+0x1c/0x60
[ 0.128001] [<ffffffff8159c075>] start_kernel+0x403/0x46e
[ 0.128001] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
[ 0.128001] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
[ 0.128001] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
[ 0.128001] Code: c7 48 89 05 9e 71 40 00 74 2a 48 63 15 91 71 40 00 be ff 00 00 00 48 c1 e2 03 e8 bf a1 e2 ff e9 3f fe ff ff 48 8b 05 7b 71 40 00 <48> c7 00 00 00 00 00 eb 84 c7 05 40 71 40 00 01 00 00 00 e9 2b
[ 0.128001] RIP [<ffffffff813b98ad>] mcheck_init+0x278/0x320
[ 0.128001] RSP <ffffffff81595e38>
[ 0.128001] CR2: 0000000000000010
[ 0.129306] ---[ end trace a7919e7f17c0a725 ]---
[ 0.132404] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.140325] Pid: 0, comm: swapper Tainted: G D 2.6.30 #426
[ 0.142882] Call Trace:
[ 0.144307] [<ffffffff813bf004>] panic+0x94/0x170
[ 0.146152] [<ffffffff81051951>] do_exit+0x6c1/0x870
[ 0.148287] [<ffffffff81010e8e>] oops_end+0xce/0xe0
[ 0.150202] [<ffffffff8102f7c8>] no_context+0x108/0x290
[ 0.152298] [<ffffffff8102fabd>] __bad_area_nosemaphore+0x16d/0x210
[ 0.154691] [<ffffffff813c2953>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 0.156289] [<ffffffff8107c004>] ? tick_nohz_stop_sched_tick+0x154/0x3e0
[ 0.160282] [<ffffffff8100c4d4>] ? restore_args+0x0/0x30
[ 0.162375] [<ffffffff8102ff75>] ? do_page_fault+0x1b5/0x310
[ 0.164286] [<ffffffff8102fb81>] bad_area_nosemaphore+0x21/0x40
[ 0.168316] [<ffffffff8107dce7>] ? trace_hardirqs_off_caller+0xa7/0xe0
[ 0.172326] [<ffffffff8103001e>] do_page_fault+0x25e/0x310
[ 0.176305] [<ffffffff813c2992>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 0.178663] [<ffffffff813c42f5>] page_fault+0x25/0x30
[ 0.180296] [<ffffffff813b98ad>] ? mcheck_init+0x278/0x320
[ 0.184295] [<ffffffff813b98a1>] ? mcheck_init+0x26c/0x320
[ 0.186424] [<ffffffff813b869c>] identify_cpu+0x331/0x392
[ 0.188322] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
[ 0.190602] [<ffffffff815a14ac>] check_bugs+0x1c/0x60
[ 0.192287] [<ffffffff8159c075>] start_kernel+0x403/0x46e
[ 0.196301] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
[ 0.198909] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
[ 0.200338] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
* Re: MCE boot crash in qemu
2009-06-15 11:59 MCE boot crash in qemu Vegard Nossum
@ 2009-06-15 12:01 ` Pekka Enberg
2009-06-15 12:52 ` Andi Kleen
1 sibling, 0 replies; 7+ messages in thread
From: Pekka Enberg @ 2009-06-15 12:01 UTC
To: Vegard Nossum; +Cc: Ingo Molnar, Andi Kleen, LKML
On Mon, 2009-06-15 at 13:59 +0200, Vegard Nossum wrote:
> Hi,
>
> I get an MCE-related crash like this in latest linus tree:
>
> [ 0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [ 0.116396] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.120570] mce: CPU supports 0 MCE banks
> [ 0.124870] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [ 0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] PGD 0
> [ 0.128001] Thread overran stack, or stack corrupted
> [ 0.128001] Oops: 0002 [#1] PREEMPT SMP
> [ 0.128001] last sysfs file:
> [ 0.128001] CPU 0
> [ 0.128001] Modules linked in:
> [ 0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
> [ 0.128001] RIP: 0010:[<ffffffff813b98ad>] [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] RSP: 0018:ffffffff81595e38 EFLAGS: 00000246
> [ 0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
> [ 0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
> [ 0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
> [ 0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
> [ 0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
> [ 0.128001] FS: 0000000000000000(0000) GS:ffff880002288000(0000) knlGS:0000000000000000
> [ 0.128001] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
> [ 0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> [ 0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffffff8152a4a0)
> [ 0.128001] Stack:
> [ 0.128001] 0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f914
> [ 0.128001] ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b869c
> [ 0.128001] 5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddbe6e
> [ 0.128001] Call Trace:
> [ 0.128001] [<ffffffff813b869c>] identify_cpu+0x331/0x392
> [ 0.128001] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
> [ 0.128001] [<ffffffff815a14ac>] check_bugs+0x1c/0x60
> [ 0.128001] [<ffffffff8159c075>] start_kernel+0x403/0x46e
> [ 0.128001] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
> [ 0.128001] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
> [ 0.128001] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
> [ 0.128001] Code: c7 48 89 05 9e 71 40 00 74 2a 48 63 15 91 71 40 00 be ff 00 00 00 48 c1 e2 03 e8 bf a1 e2 ff e9 3f fe ff ff 48 8b 05 7b 71 40 00 <48> c7 00 00 00 00 00 eb 84 c7 05 40 71 40 00 01 00 00 00 e9 2b
> [ 0.128001] RIP [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] RSP <ffffffff81595e38>
> [ 0.128001] CR2: 0000000000000010
> [ 0.129306] ---[ end trace a7919e7f17c0a725 ]---
>
> It's this:
>
> 	/*
> 	 * Various K7s with broken bank 0 around. Always disable
> 	 * by default.
> 	 */
> 	if (c->x86 == 6)
> 		bank[0] = 0;
>
> in mce_cpu_quirks() in arch/x86/kernel/cpu/mcheck/mce.c around line
> 1217. Strange that it thinks this is an AMD CPU, though?
>
> Attached full boot log.
I saw something like this too with qemu/x86_64.
Pekka
* Re: MCE boot crash in qemu
2009-06-15 11:59 MCE boot crash in qemu Vegard Nossum
2009-06-15 12:01 ` Pekka Enberg
@ 2009-06-15 12:52 ` Andi Kleen
2009-06-15 13:22 ` Pekka Enberg
2009-06-17 10:32 ` [tip:x86/urgent] x86: mce: Handle banks == 0 case in K7 quirk tip-bot for Andi Kleen
1 sibling, 2 replies; 7+ messages in thread
From: Andi Kleen @ 2009-06-15 12:52 UTC
To: Vegard Nossum; +Cc: Ingo Molnar, Andi Kleen, Pekka Enberg, LKML
On Mon, Jun 15, 2009 at 01:59:04PM +0200, Vegard Nossum wrote:
> Hi,
>
> I get an MCE-related crash like this in latest linus tree:
>
> [ 0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [ 0.116396] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.120570] mce: CPU supports 0 MCE banks
> [ 0.124870] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [ 0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] PGD 0
> [ 0.128001] Thread overran stack, or stack corrupted
> [ 0.128001] Oops: 0002 [#1] PREEMPT SMP
> [ 0.128001] last sysfs file:
> [ 0.128001] CPU 0
> [ 0.128001] Modules linked in:
> [ 0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
> [ 0.128001] RIP: 0010:[<ffffffff813b98ad>] [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] RSP: 0018:ffffffff81595e38 EFLAGS: 00000246
> [ 0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
> [ 0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
> [ 0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
> [ 0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
> [ 0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
> [ 0.128001] FS: 0000000000000000(0000) GS:ffff880002288000(0000) knlGS:0000000000000000
> [ 0.128001] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
> [ 0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> [ 0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffffff8152a4a0)
> [ 0.128001] Stack:
> [ 0.128001] 0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f914
> [ 0.128001] ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b869c
> [ 0.128001] 5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddbe6e
> [ 0.128001] Call Trace:
> [ 0.128001] [<ffffffff813b869c>] identify_cpu+0x331/0x392
> [ 0.128001] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
> [ 0.128001] [<ffffffff815a14ac>] check_bugs+0x1c/0x60
> [ 0.128001] [<ffffffff8159c075>] start_kernel+0x403/0x46e
> [ 0.128001] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
> [ 0.128001] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
> [ 0.128001] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
> [ 0.128001] Code: c7 48 89 05 9e 71 40 00 74 2a 48 63 15 91 71 40 00 be ff 00 00 00 48 c1 e2 03 e8 bf a1 e2 ff e9 3f fe ff ff 48 8b 05 7b 71 40 00 <48> c7 00 00 00 00 00 eb 84 c7 05 40 71 40 00 01 00 00 00 e9 2b
> [ 0.128001] RIP [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] RSP <ffffffff81595e38>
> [ 0.128001] CR2: 0000000000000010
> [ 0.129306] ---[ end trace a7919e7f17c0a725 ]---
>
> It's this:
>
> 	/*
> 	 * Various K7s with broken bank 0 around. Always disable
> 	 * by default.
> 	 */
> 	if (c->x86 == 6)
> 		bank[0] = 0;
>
> in mce_cpu_quirks() in arch/x86/kernel/cpu/mcheck/mce.c around line
> 1217. Strange that it thinks this is an AMD CPU, though?
QEMU probably fakes that. You can check in /proc/cpuinfo after
it has booted.
It should really clear the MCA CPUID flag if it doesn't have any MCA banks,
but OK.
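One way to do the /proc/cpuinfo check from inside the booted guest is sketched below. It prints the first vendor_id line and whether the mce and mca feature flags are advertised. This is only an illustration: has_cpu_flag() and report_cpuinfo() are hypothetical helper names, and the 8192-byte line buffer is an assumed upper bound on the length of the flags line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` occurs as a whole whitespace-separated word in `line`. */
bool has_cpu_flag(const char *line, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = line;

	assert(len > 0);
	while ((p = strstr(p, flag)) != NULL) {
		bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

		if (starts && ends)
			return true;
		p++; /* keep searching after a partial match such as "mce_foo" */
	}
	return false;
}

/*
 * Print the vendor_id line of the first CPU and whether its flags line
 * advertises mce and mca. Returns -1 if the file cannot be opened.
 */
int report_cpuinfo(const char *path)
{
	char line[8192]; /* assumed to be long enough for the flags line */
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "vendor_id", 9) == 0)
			fputs(line, stdout);
		if (strncmp(line, "flags", 5) == 0) {
			printf("mce: %d, mca: %d\n",
			       has_cpu_flag(line, "mce"),
			       has_cpu_flag(line, "mca"));
			break; /* the first CPU is enough for this check */
		}
	}
	fclose(f);
	return 0;
}
```

Calling report_cpuinfo("/proc/cpuinfo") from a small main() in the guest would show which vendor string QEMU presents and whether it advertises MCA despite reporting zero banks.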
Here's an untested patch (sorry, I'm not able to test any patches currently).
Does it fix the problem?
A workaround, if you don't want to apply the patch, is to boot with mce=off.
-Andi
---
x86: mce: Handle banks == 0 case in K7 quirk
This happens on QEMU, which reports MCA capability but no banks.
Without this patch there is a buffer overrun and a boot oops because the code
would try to initialize element 0 of a zero-length kmalloc()
buffer.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
--- linux-2.6.30-git8/arch/x86/kernel/cpu/mcheck/mce.c-o 2009-06-15 14:45:52.000000000 +0200
+++ linux-2.6.30-git8/arch/x86/kernel/cpu/mcheck/mce.c 2009-06-15 14:46:40.000000000 +0200
@@ -1245,7 +1245,7 @@
* Various K7s with broken bank 0 around. Always disable
* by default.
*/
- if (c->x86 == 6)
+ if (c->x86 == 6 && banks > 0)
bank[0] = 0;
}
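
For context on the fault address: in the kernel, a zero-size kmalloc() returns the sentinel ZERO_SIZE_PTR, defined as (void *)16, rather than NULL, which matches CR2 = 0x10 in the oops. The sketch below is a userspace model of that behaviour and of the guarded quirk, not kernel code. model_kmalloc() and apply_k7_quirk() are hypothetical names; only the `banks > 0` condition mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same value as the kernel's ZERO_SIZE_PTR in include/linux/slab.h. */
#define ZERO_SIZE_PTR ((void *)16)

/* Hypothetical userspace stand-in for kmalloc(): size 0 yields the sentinel. */
void *model_kmalloc(size_t size)
{
	return size ? malloc(size) : ZERO_SIZE_PTR;
}

/*
 * Model of the patched K7 quirk. Returns 1 if bank 0 was disabled,
 * 0 if the quirk was skipped (wrong family or no banks).
 */
int apply_k7_quirk(uint64_t *bank, int banks, int family)
{
	if (family == 6 && banks > 0) {
		assert(bank != NULL && bank != ZERO_SIZE_PTR);
		bank[0] = 0;
		return 1;
	}
	return 0;
}
```

With banks == 0 the model allocator hands back address 0x10 and the guarded quirk never touches it; without the guard, the store to bank[0] would land on that address.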
* Re: MCE boot crash in qemu
2009-06-15 12:52 ` Andi Kleen
@ 2009-06-15 13:22 ` Pekka Enberg
2009-06-17 5:50 ` Pekka Enberg
2009-06-17 10:32 ` [tip:x86/urgent] x86: mce: Handle banks == 0 case in K7 quirk tip-bot for Andi Kleen
1 sibling, 1 reply; 7+ messages in thread
From: Pekka Enberg @ 2009-06-15 13:22 UTC
To: Andi Kleen; +Cc: Vegard Nossum, Ingo Molnar, LKML
On Mon, 2009-06-15 at 14:52 +0200, Andi Kleen wrote:
> x86: mce: Handle banks == 0 case in K7 quirk
>
> This happens on QEMU which reports MCA capability, but no banks.
> Without this patch there is a buffer overrun and boot ops because the code
> would try to initialize the 0 element of a zero length kmalloc()
> buffer.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
This fixes the bug for me!
Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Pekka
* Re: MCE boot crash in qemu
2009-06-15 13:22 ` Pekka Enberg
@ 2009-06-17 5:50 ` Pekka Enberg
2009-06-17 6:57 ` Ingo Molnar
0 siblings, 1 reply; 7+ messages in thread
From: Pekka Enberg @ 2009-06-17 5:50 UTC
To: Andi Kleen; +Cc: Vegard Nossum, Ingo Molnar, LKML
On Mon, 2009-06-15 at 16:22 +0300, Pekka Enberg wrote:
> On Mon, 2009-06-15 at 14:52 +0200, Andi Kleen wrote:
> > x86: mce: Handle banks == 0 case in K7 quirk
> >
> > This happens on QEMU which reports MCA capability, but no banks.
> > Without this patch there is a buffer overrun and boot ops because the code
> > would try to initialize the 0 element of a zero length kmalloc()
> > buffer.
> >
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> This fixes the bug for me!
>
> Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Ingo, I hit this again in my testing after rebasing to linus/master so I
really would like this in mainline.
Pekka
* Re: MCE boot crash in qemu
2009-06-17 5:50 ` Pekka Enberg
@ 2009-06-17 6:57 ` Ingo Molnar
0 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2009-06-17 6:57 UTC
To: Pekka Enberg, Hidetoshi Seto, H. Peter Anvin
Cc: Andi Kleen, Vegard Nossum, LKML
* Pekka Enberg <penberg@cs.helsinki.fi> wrote:
> On Mon, 2009-06-15 at 16:22 +0300, Pekka Enberg wrote:
> > On Mon, 2009-06-15 at 14:52 +0200, Andi Kleen wrote:
> > > x86: mce: Handle banks == 0 case in K7 quirk
> > >
> > > This happens on QEMU which reports MCA capability, but no banks.
> > > Without this patch there is a buffer overrun and boot ops because the code
> > > would try to initialize the 0 element of a zero length kmalloc()
> > > buffer.
> > >
> > > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> >
> > This fixes the bug for me!
> >
> > Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
>
> Ingo, I hit this again in my testing after rebasing to
> linus/master so I really would like this in mainline.
yep, i've tidied up the changelog and have committed it to
x86/urgent.
But the bank[] code is quirky and butt-ugly and that needs to be
cleaned up - it's no wonder that bugs like this slip in.
- There's zero description about the hw model it represents
and how it relates to the bank[] array - what do the banks mean,
how are they organized.
- It's full of magic constants and implicitly-assumed size
calculations with little explanation and little extensibility:
	...
	if (c->x86 == 15 && banks > 4) {
		/*
		 * disable GART TBL walk error reporting, which
		 * trips off incorrectly with the IOMMU & 3ware
		 * & Cerberus:
		 */
		clear_bit(10, (unsigned long *)&bank[4]);
	}
	...
	bank = kmalloc(banks * sizeof(u64), GFP_KERNEL);
	...
	memset(bank, 0xff, banks * sizeof(u64));
	...
- There's lots of bitmaps, arrays, flags interacting, creating a
maze of logic.
Instead of this messy code, the proper approach is to introduce an
abstract data structure representing the attributes of an MCE bank
register:
	struct mce_bank_register {
		int enabled;
		int polled;
		int dont_init;
		int msr_idx;
	};
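
A rough userspace sketch of how such a per-bank descriptor could replace the open-coded bank[] accesses follows. It makes several assumptions that the proposal above does not spell out: that msr_idx holds the bank's MCi_CTL MSR number, that a bank count of zero yields no array at all, and that quirks go through a bounds-checked helper. mce_banks_alloc() and mce_bank_disable() are hypothetical names, and the single `enabled` flag simplifies the per-bank u64 bit mask used by the existing code.

```c
#include <assert.h>
#include <stdlib.h>

/* Field names follow the structure proposed above. */
struct mce_bank_register {
	int enabled;
	int polled;
	int dont_init;
	int msr_idx;
};

#define MSR_IA32_MC0_CTL 0x400 /* MCi_CTL of bank i is at 0x400 + 4 * i */

/*
 * Allocate one descriptor per bank, all enabled.
 * Returns NULL for nbanks <= 0, so there is never a zero-length buffer.
 */
struct mce_bank_register *mce_banks_alloc(int nbanks)
{
	struct mce_bank_register *banks;
	int i;

	if (nbanks <= 0)
		return NULL;
	banks = calloc((size_t)nbanks, sizeof(*banks));
	if (!banks)
		return NULL;
	for (i = 0; i < nbanks; i++) {
		banks[i].enabled = 1;
		banks[i].msr_idx = MSR_IA32_MC0_CTL + 4 * i; /* assumed meaning */
	}
	return banks;
}

/* Bounds-checked replacement for open-coded writes such as bank[0] = 0. */
int mce_bank_disable(struct mce_bank_register *banks, int nbanks, int idx)
{
	if (!banks || idx < 0 || idx >= nbanks)
		return -1;
	banks[idx].enabled = 0;
	return 0;
}
```

With a helper like this, the K7 quirk would reduce to mce_bank_disable(banks, nbanks, 0), which is a no-op when the CPU reports zero banks.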
( There's lots of other structural problems with the MCE code too -
but now that it's unified let's first fix the most obvious ones... )
Ingo
* [tip:x86/urgent] x86: mce: Handle banks == 0 case in K7 quirk
2009-06-15 12:52 ` Andi Kleen
2009-06-15 13:22 ` Pekka Enberg
@ 2009-06-17 10:32 ` tip-bot for Andi Kleen
1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Andi Kleen @ 2009-06-17 10:32 UTC
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, andi, penberg, vegard.nossum, ak, tglx,
mingo
Commit-ID: 203abd67b75f7714ce98ab0cdbd6cfd7ad79dec4
Gitweb: http://git.kernel.org/tip/203abd67b75f7714ce98ab0cdbd6cfd7ad79dec4
Author: Andi Kleen <andi@firstfloor.org>
AuthorDate: Mon, 15 Jun 2009 14:52:01 +0200
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 17 Jun 2009 08:59:45 +0200
x86: mce: Handle banks == 0 case in K7 quirk
Vegard Nossum reported:
> I get an MCE-related crash like this in latest linus tree:
>
> [ 0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
> [ 0.116396] CPU: L2 Cache: 512K (64 bytes/line)
> [ 0.120570] mce: CPU supports 0 MCE banks
> [ 0.124870] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> [ 0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] PGD 0
> [ 0.128001] Thread overran stack, or stack corrupted
> [ 0.128001] Oops: 0002 [#1] PREEMPT SMP
> [ 0.128001] last sysfs file:
> [ 0.128001] CPU 0
> [ 0.128001] Modules linked in:
> [ 0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
> [ 0.128001] RIP: 0010:[<ffffffff813b98ad>] [<ffffffff813b98ad>] mcheck_init+0x278/0x320
> [ 0.128001] RSP: 0018:ffffffff81595e38 EFLAGS: 00000246
> [ 0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
> [ 0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
> [ 0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
> [ 0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
> [ 0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
> [ 0.128001] FS: 0000000000000000(0000) GS:ffff880002288000(0000) knlGS:0000000000000000
> [ 0.128001] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
> [ 0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
> [ 0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffffff8152a4a0)
> [ 0.128001] Stack:
> [ 0.128001] 0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f914
> [ 0.128001] ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b869c
> [ 0.128001] 5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddbe6e
> [ 0.128001] Call Trace:
> [ 0.128001] [<ffffffff813b869c>] identify_cpu+0x331/0x392
> [ 0.128001] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
> [ 0.128001] [<ffffffff815a14ac>] check_bugs+0x1c/0x60
> [ 0.128001] [<ffffffff8159c075>] start_kernel+0x403/0x46e
> [ 0.128001] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
> [ 0.128001] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
> [ 0.128001] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
This happens on QEMU, which reports MCA capability but no banks.
Without this patch there is a buffer overrun and a boot oops because
the code would try to initialize element 0 of a zero-length
kmalloc() buffer.
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20090615125200.GD31969@one.firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/cpu/mcheck/mce.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index fabba15..d9d77cf 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1245,7 +1245,7 @@ static void mce_cpu_quirks(struct cpuinfo_x86 *c)
* Various K7s with broken bank 0 around. Always disable
* by default.
*/
- if (c->x86 == 6)
+ if (c->x86 == 6 && banks > 0)
bank[0] = 0;
}