* (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
@ 2010-05-07 1:45 Luke S Crawford
2010-05-07 7:14 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Luke S Crawford @ 2010-05-07 1:45 UTC (permalink / raw)
To: xen-devel
I get this on bootup, right after USB detection, using the
2.6.18.8-xen dom0 kernel. I got the same result with the 3.4.3-rc6 Xen
hypervisor.
Any ideas on what the problem might be? Looking at amd_nonfatal.c, it seems
that the MCE code is in an impossible state.
(XEN) Xen BUG at amd_nonfatal.c:165
(XEN) ----[ Xen-3.4.2 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c801778f9>] mce_amd_work_fn+0x1d9/0x1f0
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 0000000000000ffe rbx: ffff828c8024ff28 rcx: 0000000000000000
(XEN) rdx: c0080ffe01000000 rsi: 0000000000000413 rdi: 0000000000000000
(XEN) rbp: 000000025f13f8e0 rsp: ffff828c8024fe60 r8: ffff828c8028f800
(XEN) r9: 0000000000000000 r10: 0000000000000005 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: ffff828c80177720 r14: ffff83081fd7b190
(XEN) r15: ffff83081fd7b190 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 00000004ca4a6000 cr2: 000000000083c770
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c8024fe60:
(XEN) 0000000000000000 c0080ffe01000000 ffff828c80221180 ffff828c8011a12c
(XEN) ffff8300dfc2c060 ffff828c80221180 ffff83081fd7b198 ffff828c8011a20d
(XEN) 000000024ab06880 0000000000000000 ffff828c8024ff28 ffff828c80267900
(XEN) ffff828c80266900 0000000000000000 ffff828c80221100 ffff828c801185b8
(XEN) 000000000000e008 ffff828c8024ff28 ffff828c80266900 ffff828c802215b0
(XEN) 000000025e3b7f20 ffff828c80138fcc 0000000000000000 ffff8300dfafc000
(XEN) ffff8300dfc2c000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000246
(XEN) 0000000000000008 00000000ffff8e54 0000000000000054 0000000000000000
(XEN) ffffffff802053aa 0000000000000001 0000000000000000 0000000000000001
(XEN) 0000010000000000 ffffffff802053aa 000000000000e033 0000000000000246
(XEN) ffffffff80511f50 000000000000e02b 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff8300dfafc000
(XEN) Xen call trace:
(XEN) [<ffff828c801778f9>] mce_amd_work_fn+0x1d9/0x1f0
(XEN) [<ffff828c8011a12c>] execute_timer+0x2c/0x50
(XEN) [<ffff828c8011a20d>] timer_softirq_action+0xbd/0x2e0
(XEN) [<ffff828c801185b8>] do_softirq+0x58/0x80
(XEN) [<ffff828c80138fcc>] idle_loop+0x4c/0xa0
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at amd_nonfatal.c:165
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
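A side note on the register dump above, offered as a sketch rather than a diagnosis. If rdx (c0080ffe01000000) is assumed to hold an MSR value laid out like AMD's MC4_MISC threshold register, with a 12-bit error counter in bits 43:32 and a counter-enable flag in bit 51, then the counter field decodes to 0xffe, which is the value shown in rax. The thread does not confirm that this is what the code at amd_nonfatal.c:165 is looking at; the helper names below are hypothetical and are not taken from Xen.

```c
#include <stdint.h>

/* Hypothetical helper, not Xen code: extract bits 43:32 of a 64-bit
 * MSR value, the position assumed here for the error-counter field. */
static inline uint32_t threshold_error_count(uint64_t msr_value)
{
    return (uint32_t)((msr_value >> 32) & 0xFFFu);
}

/* Hypothetical helper, not Xen code: test bit 51, assumed here to be
 * the counter-enable flag. */
static inline int threshold_counter_enabled(uint64_t msr_value)
{
    return (int)((msr_value >> 51) & 1u);
}
```

Feeding the rdx value from the dump into threshold_error_count() gives the same number as rax, which at least suggests the work function had just read a threshold counter when it hit the BUG.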
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-07 1:45 (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board Luke S Crawford
@ 2010-05-07 7:14 ` Keir Fraser
2010-05-07 21:10 ` Luke S Crawford
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2010-05-07 7:14 UTC (permalink / raw)
To: Luke S Crawford, xen-devel@lists.xensource.com
Try 'no-mce' on xen-4.0 or xen-unstable command line, or 'nomce' on xen-3.4
command line. Looks like MCE support playing up. You probably didn't want
the MCE goop enabled anyway. :-)
-- Keir
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-07 7:14 ` Keir Fraser
@ 2010-05-07 21:10 ` Luke S Crawford
2010-05-07 21:45 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Luke S Crawford @ 2010-05-07 21:10 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel@lists.xensource.com
Keir Fraser <keir.fraser@eu.citrix.com> writes:
> Try 'no-mce' on xen-4.0 or xen-unstable command line, or 'nomce' on xen-3.4
> command line. Looks like MCE support playing up. You probably didn't want
> the MCE goop enabled anyway. :-)
nomce, no-mce, and mce=off all appear to do nothing (I'm putting them
right after "kernel xen.gz"); I get the same error.
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-07 21:10 ` Luke S Crawford
@ 2010-05-07 21:45 ` Keir Fraser
2010-05-12 8:10 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2010-05-07 21:45 UTC (permalink / raw)
To: Luke S Crawford; +Cc: Christoph Egger, xen-devel@lists.xensource.com
On 07/05/2010 22:10, "Luke S Crawford" <lsc@prgmr.com> wrote:
> Keir Fraser <keir.fraser@eu.citrix.com> writes:
>
>> Try 'no-mce' on xen-4.0 or xen-unstable command line, or 'nomce' on xen-3.4
>> command line. Looks like MCE support playing up. You probably didn't want
>> the MCE goop enabled anyway. :-)
>
> nomce no-mce and mce=off all appear to do nothing (I'm putting them
> right after kernel xen.gz) I get the same error.
Ah, looks like half the MCE stuff is not even hooked up to the mce boot
parameter. Well, I expect Christoph Egger can help: he implemented a lot of
the MCE mechanism, and especially the AMD parts.
I think the mce boot parameter should, when disabled, cause the MCE feature
bits to be removed from Xen's copy of CPUID feature flags. That would easily
disable all MCE logic throughout Xen.
-- Keir
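A minimal sketch of the idea described above, assuming a simplified model rather than Xen's actual data structures. On x86, CPUID leaf 1 reports MCE in EDX bit 7 and MCA in EDX bit 14; if the hypervisor's cached copy of that word has both bits cleared, every check that consults the cached flags sees "no MCE" without needing its own test of the boot parameter. The struct and function names here are hypothetical.

```c
#include <stdint.h>

/* CPUID leaf 1, EDX: bit 7 = MCE, bit 14 = MCA. */
#define FEAT_MCE (1u << 7)
#define FEAT_MCA (1u << 14)

/* Hypothetical stand-in for the hypervisor's cached copy of the
 * boot CPU's feature word; not Xen's real structure. */
struct cpu_features {
    uint32_t edx;
};

/* Hypothetical: apply an "MCE disabled" boot option by masking both
 * bits out of the cached copy. The hardware itself is untouched. */
static void apply_mce_option(struct cpu_features *f, int mce_disabled)
{
    if (mce_disabled)
        f->edx &= ~(FEAT_MCE | FEAT_MCA);
}

/* MCE logic is treated as usable only if both bits are present. */
static int mce_usable(const struct cpu_features *f)
{
    return (f->edx & FEAT_MCE) && (f->edx & FEAT_MCA);
}
```

The appeal of this design is that there is a single point of control: callers keep asking "is MCE available?" and the boot option changes the answer for all of them at once.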
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-07 21:45 ` Keir Fraser
@ 2010-05-12 8:10 ` Keir Fraser
2010-05-20 9:13 ` Luke S Crawford
2010-07-15 5:40 ` Luke S Crawford
0 siblings, 2 replies; 11+ messages in thread
From: Keir Fraser @ 2010-05-12 8:10 UTC (permalink / raw)
To: Luke S Crawford; +Cc: xen-devel@lists.xensource.com
[-- Attachment #1: Type: text/plain, Size: 836 bytes --]
On 07/05/2010 22:45, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> On 07/05/2010 22:10, "Luke S Crawford" <lsc@prgmr.com> wrote:
>
>> Keir Fraser <keir.fraser@eu.citrix.com> writes:
> Ah, looks like half the MCE stuff is not even hooked up to the mce boot
> parameter. Well, I expect Christoph Egger can help: he implemented a lot of
> the MCE mechanism, and especially the AMD parts.
>
> I think the mce boot parameter should, when disabled, cause the MCE feature
> bits to be removed from Xen's copy of CPUID feature flags. That would easily
> disable all MCE logic throughout Xen.
Actually I think the attached patch should work, in conjunction with
specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let
me know if it works okay for you.
I've applied the patch to xen-unstable as c/s 21360.
-- Keir
[-- Attachment #2: 00-mce --]
[-- Type: application/octet-stream, Size: 387 bytes --]
diff -r 0079f76e906f xen/arch/x86/cpu/mcheck/non-fatal.c
--- a/xen/arch/x86/cpu/mcheck/non-fatal.c	Wed May 12 08:53:27 2010 +0100
+++ b/xen/arch/x86/cpu/mcheck/non-fatal.c	Wed May 12 09:06:37 2010 +0100
@@ -91,7 +91,7 @@
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	/* Check for MCE support */
-	if (!mce_available(c))
+	if (mce_disabled || !mce_available(c))
 		return -ENODEV;
 	/*
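For readers skimming the patch, here is a hedged, self-contained model of what the one-line change does. It assumes the init function is what arms the periodic polling timer (the call trace shows mce_amd_work_fn being run from execute_timer), so returning early would mean the work function containing the BUG never gets scheduled. Everything below except the shape of the condition and the -ENODEV return is a hypothetical stand-in, not Xen code.

```c
#include <errno.h>

static int mce_disabled;  /* hypothetical: set when the boot option is parsed */
static int timer_armed;   /* hypothetical: stands in for arming the poll timer */

/* Hypothetical stub: pretend the CPU reports MCE/MCA support. */
static int mce_available_stub(void)
{
    return 1;
}

/* Model of the patched check: bail out before any polling is set up
 * when MCE is disabled by the user or unsupported by the CPU. */
static int init_nonfatal_checker(void)
{
    if (mce_disabled || !mce_available_stub())
        return -ENODEV;

    timer_armed = 1;
    return 0;
}
```

With mce_disabled left at 0 the model arms the timer and returns 0; with it set to 1 the function returns -ENODEV and the timer stays unarmed, which mirrors the behaviour reported later in the thread.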
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-12 8:10 ` Keir Fraser
@ 2010-05-20 9:13 ` Luke S Crawford
2010-05-20 12:54 ` Keir Fraser
2010-07-15 5:40 ` Luke S Crawford
1 sibling, 1 reply; 11+ messages in thread
From: Luke S Crawford @ 2010-05-20 9:13 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel@lists.xensource.com
Keir Fraser <keir.fraser@eu.citrix.com> writes:
> > Ah, looks like half the MCE stuff is not even hooked up to the mce boot
> > parameter. Well, I expect Christoph Egger can help: he implemented a lot of
> > the MCE mechanism, and especially the AMD parts.
> >
> > I think the mce boot parameter should, when disabled, cause the MCE feature
> > bits to be removed from Xen's copy of CPUID feature flags. That would easily
> > disable all MCE logic throughout Xen.
>
> Actually I think the attached patch should work, in conjunction with
> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let
> me know if it works okay for you.
>
> I've applied the patch to xen-unstable as c/s 21360.
This patch has now been applied to 3.4-testing, and it works beautifully.
I can repeatably remove nomce from the command line, and I get the error.
I re-add nomce to the command line, and everything works great.
Thanks.
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-20 9:13 ` Luke S Crawford
@ 2010-05-20 12:54 ` Keir Fraser
2010-05-20 14:01 ` Christoph Egger
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2010-05-20 12:54 UTC (permalink / raw)
To: Luke S Crawford; +Cc: Christoph Egger, xen-devel@lists.xensource.com
On 20/05/2010 10:13, "Luke S Crawford" <lsc@prgmr.com> wrote:
>> Actually I think the attached patch should work, in conjunction with
>> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let
>> me know if it works okay for you.
>>
>> I've applied the patch to xen-unstable as c/s 21360.
>
> So this patch was applied to 3.4-testing now, and it works beautifully.
> I can repeatably remove nomce from the command line, and i get the error.
> I re-add nomce to the command line, and everything works great.
That'll do then, until someone who understands the MCE stuff implements a
proper fix.
Thanks,
Keir
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-20 12:54 ` Keir Fraser
@ 2010-05-20 14:01 ` Christoph Egger
0 siblings, 0 replies; 11+ messages in thread
From: Christoph Egger @ 2010-05-20 14:01 UTC (permalink / raw)
To: Keir Fraser; +Cc: Luke S Crawford, xen-devel@lists.xensource.com
On Thursday 20 May 2010 14:54:14 Keir Fraser wrote:
> On 20/05/2010 10:13, "Luke S Crawford" <lsc@prgmr.com> wrote:
> >> Actually I think the attached patch should work, in conjunction with
> >> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter.
> >> Let me know if it works okay for you.
> >>
> >> I've applied the patch to xen-unstable as c/s 21360.
> >
> > So this patch was applied to 3.4-testing now, and it works beautifully.
> > I can repeatably remove nomce from the command line, and i get the error.
> > I re-add nomce to the command line, and everything works great.
>
> That'll do then, until someone who understands the MCE stuff implements a
> proper fix.
Keir: Thanks for fixing it. I am currently busy with nested virtualization.
--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-05-12 8:10 ` Keir Fraser
2010-05-20 9:13 ` Luke S Crawford
@ 2010-07-15 5:40 ` Luke S Crawford
2010-07-15 7:04 ` Keir Fraser
1 sibling, 1 reply; 11+ messages in thread
From: Luke S Crawford @ 2010-07-15 5:40 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel@lists.xensource.com
Keir Fraser <keir.fraser@eu.citrix.com> writes:
> > I think the mce boot parameter should, when disabled, cause the MCE feature
> > bits to be removed from Xen's copy of CPUID feature flags. That would easily
> > disable all MCE logic throughout Xen.
>
> Actually I think the attached patch should work, in conjunction with
> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let
> me know if it works okay for you.
>
> I've applied the patch to xen-unstable as c/s 21360.
This still works swimmingly on xen 3.4... but I'm starting to
flirt with xen 4.0/pvops and while it looks like your patch is in there,
nomce, no-mce, mce=off and mce=no all appear to do nothing, and my box
reboots in the same place it did before:
(XEN) Panic on CPU 0:
(XEN) Xen BUG at amd_nonfatal.c:162
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-07-15 5:40 ` Luke S Crawford
@ 2010-07-15 7:04 ` Keir Fraser
2010-07-15 8:34 ` Luke S Crawford
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2010-07-15 7:04 UTC (permalink / raw)
To: Luke S Crawford; +Cc: xen-devel@lists.xensource.com
On 15/07/2010 06:40, "Luke S Crawford" <lsc@prgmr.com> wrote:
> This still works swimmingly on xen 3.4... but I'm starting to
> flirt with xen 4.0/pvops and while it looks like your patch is in there,
> nomce, no-mce, mce=off and mce=no all appear to do nothing, and my box
> reboots in the same place it did before:
>
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at amd_nonfatal.c:162
The bug is unavoidable with Xen 4.0.0 release. If you use tip of
xen-4.0-testing.hg, or one of the 4.0.1 release candidates, then the no-mce
boot parameter will do the right thing.
-- Keir
* Re: (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
2010-07-15 7:04 ` Keir Fraser
@ 2010-07-15 8:34 ` Luke S Crawford
0 siblings, 0 replies; 11+ messages in thread
From: Luke S Crawford @ 2010-07-15 8:34 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel@lists.xensource.com
Keir Fraser <keir.fraser@eu.citrix.com> writes:
> The bug is unavoidable with Xen 4.0.0 release. If you use tip of
> xen-4.0-testing.hg, or one of the 4.0.1 release candidates, then the no-mce
> boot parameter will do the right thing.
Compiling xen-4.0-testing.hg and adding no-mce works great. Thanks.