* Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ?
@ 2010-02-15 14:19 Ashwin Pankaj
2010-02-16 9:05 ` Jan Beulich
0 siblings, 1 reply; 3+ messages in thread
From: Ashwin Pankaj @ 2010-02-15 14:19 UTC (permalink / raw)
To: Xen-devel
Hi,
I am using Xen 3.4.1. Sometimes, when an MCE occurs, Xen panics due to a
page fault with the following stack trace:
http://pastebin.com/f30f67342
After some digging, the probable culprit seems to be smp_cmci_interrupt():
> if (bs.errcnt && mctc != NULL) {
>     if (guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {   /* <-- here */
>         mctelem_commit(mctc);
>         printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
>         send_guest_global_virq(dom0, VIRQ_MCA);
>     } else {
>         x86_mcinfo_dump(mctelem_dataptr(mctc));
>         mctelem_dismiss(mctc);
>     }
It looks like dom0 is NULL here (the offset of vcpu[0] is 0x468). Is this possible?
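The inference above rests on how a member access through a NULL pointer faults: the CPU touches address 0 plus the member's offset, so a fault at 0x468 is consistent with dom0 == NULL if the vcpu member sits at that offset. A minimal sketch of that arithmetic follows. It assumes a hypothetical stand-in struct (fake_domain) padded so that vcpu lands at 0x468; it is not Xen's real struct domain layout, which varies by version and build.

```c
#include <stddef.h>
#include <stdint.h>

struct vcpu;

/* Hypothetical stand-in for struct domain. The padding is chosen only so
 * that vcpu lands at 0x468, the offset reported in the trace. */
struct fake_domain {
    unsigned char other_fields[0x468];
    struct vcpu *vcpu[1];
};

/* Address that base->vcpu[0] would be read from, computed without
 * dereferencing base (so it is safe to call with base == 0). */
static uintptr_t vcpu0_field_address(uintptr_t base)
{
    return base + offsetof(struct fake_domain, vcpu);
}
```

With base == 0 the helper returns 0x468, which is the kind of small faulting address a NULL dom0 would produce on this assumed layout.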
Other functions, such as mce_softirq(), perform a NULL check on dom0 before
accessing its members:
> /* Step2: Send Log to DOM0 through vIRQ */
> if (dom0 && guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {
>     printk(KERN_DEBUG "MCE: send MCE# to DOM0 through virq\n");
>     send_guest_global_virq(dom0, VIRQ_MCA);
> }
Also note that this system printed the MCE warning message ("(XEN) MCE:
The hardware reports a non fatal, correctable incident occured on CPU 0")
twice before panicking.
So this code worked properly and entered x86_mcinfo_dump() at least twice
before the panic.
- Regards,
Ashwin
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ?
2010-02-15 14:19 Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ? Ashwin Pankaj
@ 2010-02-16 9:05 ` Jan Beulich
2010-02-16 10:37 ` Jiang, Yunhong
0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2010-02-16 9:05 UTC (permalink / raw)
To: Yunhong Jiang, Ashwin Pankaj; +Cc: Xen-devel
>>> Ashwin Pankaj <ashwin.pankaj@lsi.com> 15.02.10 15:19 >>>
> After some digging, probable culprit seems to be smp_cmci_interrupt
>
>> if (bs.errcnt && mctc != NULL) {
>>     if (guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {   /* <-- here */
>>         mctelem_commit(mctc);
>>         printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
>>         send_guest_global_virq(dom0, VIRQ_MCA);
>>     } else {
>>         x86_mcinfo_dump(mctelem_dataptr(mctc));
>>         mctelem_dismiss(mctc);
>>     }
>
>
>Looks like dom0 is NULL here ( vcpu[0] offset is 0x468). Is this possible?
Yes, your call trace confirms this.
>Other functions like mce_softirq() perform a NULL check on dom0 before
>accessing its members ....
The majority of uses don't seem to do that check, yet it is essential
if CMCIs occur while Xen is booting. Moreover, it is not only dom0 that
should be checked against NULL, but also dom0->vcpu (or
dom0->max_vcpus) and dom0->vcpu[0].
Jan
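The chain of checks described in the message above can be sketched as a small helper. This is only an illustration under stated assumptions: struct domain and struct vcpu here are simplified, hypothetical stand-ins rather than Xen's real definitions, and first_vcpu_or_null() and dom0_vcpu0_usable() are made-up names, not existing Xen functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Xen's real definitions. */
struct vcpu { int vcpu_id; };
struct domain {
    unsigned int max_vcpus;
    struct vcpu **vcpu;      /* may still be NULL early in boot */
};

/* Returns vcpu 0 of d, or NULL if any link in the chain is not set up. */
static struct vcpu *first_vcpu_or_null(const struct domain *d)
{
    if (d == NULL || d->vcpu == NULL || d->max_vcpus == 0)
        return NULL;
    return d->vcpu[0];       /* may itself be NULL; callers must check */
}

/* True only when d, its vcpu array, and vcpu 0 are all present. */
static bool dom0_vcpu0_usable(const struct domain *d)
{
    return first_vcpu_or_null(d) != NULL;
}
```

In a handler such as smp_cmci_interrupt(), a guard of this kind could sit in front of the guest_enabled_event() call, so that the x86_mcinfo_dump()/mctelem_dismiss() branch is taken while dom0 is not yet ready.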
* RE: Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ?
2010-02-16 9:05 ` Jan Beulich
@ 2010-02-16 10:37 ` Jiang, Yunhong
0 siblings, 0 replies; 3+ messages in thread
From: Jiang, Yunhong @ 2010-02-16 10:37 UTC (permalink / raw)
To: Jan Beulich, Ashwin Pankaj; +Cc: Xen-devel@lists.xensource.com
Ashwin and Jan, thanks for pointing this out.
All our development/test machines are powered off during the Chinese New Year holiday, so I can't access any system right now (I can't even run vim).
Jan, as the error is quite straightforward, could you please cook up a patch for it? (I couldn't even smoke-test a patch if I wrote one.) If needed, I will verify it after the CNY.
Thanks
--jyh